add epa land suitability data

aldivi commented 4 years ago

The task is to replace the current protected vs non-protected land designations with a set of suitability classes. This requires:

Read in the epa data and get in onto the working grid - this will require addition of the gdal library and the creation of aggregated data arrays representing fractions of land area in each working grid cell assigned to each suitability class. These fractional data should be determined so that each class represents an exclusive fraction and that the fractions add up to 1 for land area. These fractional data can be stored in one grid array for each class, in place of the single protected area thematic array. The current read function to be replaced is read_protected.c

The epa data are organized differently than desired. Each file has binary values denoting whether the pixel meets the criteria or not. The base layer L1 simply denotes suitable or not-suitable, regardless of any other designation. The other layers subtract or add pixels from this layer, so to get to the desired format the epa layers will have to be differenced in appropriate ways. Note that the protected area data are not available for the not-suitable class; to get these data we would have to get the actual IUCN protected area data and preprocess it and read it in. I discussed this with Page and we decided to just go with the available epa data for now.

There will be five distinct classes, rather than the two current ones. The function proc_land_type_area.c is where these classes are assigned to areas that are aggregated to land types within countryXglu boundaries. There are 10 possible values for classes (0-9).Currently, non-protected (i.e., suitable) is assigned a value of 2, and protected is assigned a value of 1. The new classes constitute a mutually exclusive set and will be as follows: 1 - not suitable 2 - suitable and not protected 3 - suitable and iucn protected categories 1a, 1b, and 2 that have not been deforested 4 - suitable and iucn protected categories 1a, 1b, and 2 that have been deforested 5 - suitable and iucn protected categories 3, 4, and 5 We will recommend that the default suitability for GCAM is class 2, which is the highest level of protection (least amount of suitable land). The default EPA suitable layer is class 2 plus class 4 plus class 5 because they don't expect iucn categories 3-5 to really be protected and they add in the deforested protected area as suitable because it has been deforested.

The areas for output will have to be calculated slightly differently because now there will be fractional land area values for suitability categories instead of the current whole pixel assignment. In other words, rather than just adding whole pixel land area to one or the other for a given land type, each land type will be distributed to the 5 suitability classes based on the fractional values for each pixel. The output structure will stay the same, as these classes fit into the single digit code, and so the category mapping and number of output records will be expanded.

Note that proc_land_type_area.c takes at least half the run time of the total program, so this function will be the real efficiency test for porting to R.

aldivi commented 4 years ago

update on the structure proposed above: Let's also read in the original protected area to get one additional category: 6 - not suitable and protected Then category one is: 1 - not suitable and not protected

This allows for addressing land available (or not) for management in other land types, which may not be suitable for agriculture.

This is also a place holder for more up to date protected area data for determining this distinction. If EPA does not have these data available, the shapefiles would have to be downloaded from IUCN and then preprocessed for use my moirai. Although with gdal, it may be possible to read in the shapefiles and convert and preprocess them within the moirai system.

aldivi commented 4 years ago

EPA has also provided their IUCN protected area data, which are in two files: one has categories 1a, 1b, and 2; and the other has all five categories. So now we can add these two categories instead of the one mentioned above: 6 - not suitable and protected 1a,1b,2 7 - not suitable and protected 3,4,5

I will have to contact their source data providers to inquire about redistribution permissions.

aldivi commented 4 years ago

I forgot to mention: it appears the the IUCN protected area layers enumerate each protected area, and ocean protected areas are included. all other values are nodata, which i am not entirely clear on at the moment. the geotiff supposedly has a nodata mask rather than a value. r recognizes it through rgdal, and so should c gdal.

aldivi commented 4 years ago

the current epa raster layers to not line up. L1, L2, and L3 are consistent, and all_iucn is consistent with L4 (which indirectly makes it consistent with L1). Aaron Sobel is looking into this in case they have another version. In the meantime, through the aggregation to 5 arcmin as fractions, we can still create the desired categories (see below), although it will require some error checking for negative fractions. A check for summing to 1 should also be included, both before and after adjusting for negative fractions (it should always sum to 1 before due to the defined calculations). It appears that the negative fractions may occur only in categories 6 or 7, and that the fractions can be adjusted between these two categories only to fix the issue (by adding the negative value of one to the other). Which is fine, as these will likely be the least used ones, especially separately (the sum of 6 and 7 does not change in this adjustment). But I have tried only a few limiting cases. I think we may also need an unknown=0 category in case there are moirai land cells where no protected area data exist. Here are the combinations of the aggregated epa layers to get the moirai exclusive categories/classes fractions: 0 = moirai land where no protected area data exist 1 = 1.0 - all_iucn - L4 2 = L4 3 = L1 - L3 4 = L3 - L2 5 = L2 - L4 6 = iucn_1a_1b_2 - L1 + L2 7 = all_iucn - L2 + L4

Recall that these are fractions of grid cell. Categories 2-5 represent land only (these are the ag suitable classes), while 1, 6, and 7 may include water, ice, rock, etc. The total suitable area (sum 2-5) needs to be checked against the land area in a cell. If there is not enough land area, categories 2-5 fractions need to be scaled down (by ratio), and then the difference for each category needs to be added respectively to 1, 6, and 7 (transfer differences: 2->1, 3->6, 4->6, 5->7), and the new total ensured to be 1. Then, when calculating area the fractions for 2-5 can be applied to grid cell area and the values will be consistent with land area. When calculating area for 1, 6, and 7, the remaining land area needs to be distributed among these proportionally to their fractions, with any remainder assumed to be water and not included. One uncertainty with this last distribution for 1, 6, and 7 is that we don't know whether the protected area is on land or water, so we assume it is distributed evenly (this is mainly an issue along coasts/shorelines).

kanishkan91 commented 4 years ago

@aldivi @crvernon Status update on this issue as of 17 Feb 2020,

We have now processed the EPA rasters using gdal (Processing done using command line code but also adding in commented out gdal code in the C file.
Computed fractions of land cells for each category and ensured that the final file can be effectively brought onto the working grid.
Checking that total fractions add up to 1 and don't go negative.
The current read_protected.c is equipped to read in the new processed rasters. But adding in gdal code as well so that we can track metadata.
Should be able to complete the updates to the read_protected.c function this week.

aldivi commented 4 years ago

Note the equation for category 7 above is incorrect. The correct equation is: 7=all_iucn - iucn_1a_1b_2 - L2 + L4

aldivi commented 4 years ago

Additional corrections for inconsistencies in the source epa data. These adjustments focus on iucn_1_2 data inconsistencies among the epa files (see above).

Once processed to the working grid with grid cell fractions for the six epa input files, two unphysical cases arise in some grid cells: 1) iucn_1_2 > all_iucn To correct for this case, the following adjustments are made: new_iucn_1_2 = all_iucn new_L2 = L2 + (iucn_1_2 - all_iucn) new_L3 = L3 + (iucn_1_2 - all_iucn)

2) L4 > L2 and iucn_1_2 <= all_iucn To correct for this case, the following adjustments are made: new_iucn_1_2 = iucn_1_2 - (L4 - L2) new_L2 = L2 + (L4 - L2) new_L3 = L3 + (L4 - L2)

Following these corrections, negative fractions should occur only in the processed categories 6 and 7, and these are corrected for by adjusting the category 6 and 7 fractions accordingly (set the negative value to zero and then subtract the absolute value of the negative fraction from the other category)

aldivi commented 4 years ago

I suggest that the fourth column in MOIRAI_land_types.csv be labelled: Status Status labels for the categories above: 0 = Unknown 1 = UnsuitableUnprotected 2 = SuitableUnprotected 3 = SuitableHighProtectionIntact 4 = SuitbaleHighProtectionDeforested 5 = SuitableLow Protection 6 = UnsuitableHighProtection 7 = UnsuitableLowProtection

JGCRI / moirai

add epa land suitability data #18