USGS-R / drb-do-ml

Code repo for Delaware River Basin machine learning models that predict dissolved oxygen
Creative Commons Zero v1.0 Universal

Initial catchment attribute data #6

Closed jsadler2 closed 2 years ago

jsadler2 commented 3 years ago

We need to catalog which catchment attributes we have available in the DRB

jsadler2 commented 3 years ago

Closing b/c we already have a discussion on this in #4.

jsadler2 commented 3 years ago

Okay. I'm reopening so the discussion can be more focused on each type of data haha. IDK which is better, but let's go with focused

jsadler2 commented 3 years ago

PRMS included some static attributes for each HRU (catchment). These are combined and matched to PRMS segments in the https://github.com/USGS-R/delaware-model-prep pipeline. Below is a metadata file for those attributes:

combined_metadata.csv

jsadler2 commented 3 years ago

These attributes are on ScienceBase and can be found through this site: https://wwwbrr.cr.usgs.gov/projects/SW_MoWS/GeospatialFabric.html

jsadler2 commented 3 years ago

Here are those attributes in table format (same as in the above CSV file):

attribute_label attribute_definition attribute_units
soil_moist_max The maximum possible depth of moisture, expressed in inches, that can be held in the feature's upper soil compartment. This moisture can seep downward, but can also be extracted by both evaporation and transpiration. For every cell in the supplied raster map of soils data, the available water holding capacity (a rate expressed as inches of water per inch of soil) is multiplied by the rooting depth. The soil_moist_max value is the average of this depth for the feature. Inches
soil_rechr_max The maximum possible depth of moisture available for recharge, expressed in inches, that can be held in the feature's upper soil compartment. This moisture can seep downward, but can also be extracted by both evaporation and transpiration. For every cell in the supplied raster map of soils data, the available water holding capacity (a rate expressed as inches of water per inch of soil) is multiplied by the rooting depth or 18 inches, which ever is smaller. The soil_rechr_max value is the average of this depth for the feature. Inches
soil_type Soil type designation. Texture is first designated for each cell in the input raster map of soils data. 3 is assigned if clay content exceeds 40 percent. 1 is then assigned to any cell with less than 40% clay but more than 50% sand content. The remaining cells are assigned a code of 2. soil_type for the feature indicates the most commonly occurring (per-cell) soil type. nan
flulc Fraction of the feature area that is underlain by the land cover data layer used to derive the soil type parameters. Value can range from 0.0-1.0, with a flag of -1 for no overlap at all. Percent
cov_type Land cover type designation. Cover type is first designated for each cell in the input raster map of land cover data. This is done by reclassifying the input land cover raster data set. As a quality assurance measure, any cells whose elevation exceeds timberline (set to 11,500 ft) and have shrub or tree designations are reset to grass (code 2). cov_type is set to 0 for a feature if the cover type cells within the feature are more than 90% bare. If more than 20% of the feature is covered by tree cells, then cov_type is set to 3. For features left without a cov_type value by the preceding steps, a code of 2 is assigned to any feature with more than 20% shrub cover. Features that are still unassigned but with a combination of shrub and tree cover that exceeds 35% are then assigned a value. cov_type code 3 is assigned to features within this set with more tree than shrub area, with the remainder being assigned a value of 2. Any features still not designated with a cov_type are assigned according to the more commonly occurring (per-cell) cov_type within the feature. The final step is to re-assign the cov_type for any feature whose hru_elev value exceeds timberline (11,500 feet) and whose cov_type value is 2 or 3 to 1. nan
covden_sum The percentage of the land surface within the feature that is shaded by vegetation when illuminated from directly above. This is normally derived from remotely sensed images of canopy density. Percent
covden_win The percentage of the land surface within the feature that is shaded by vegetation when illuminated from directly above. This is normally derived from remotely sensed images of canopy density, but with a reduction based on type of tree (deciduous or coniferous) to account for leaf loss in the winter months. Percent
rad_trncf Radiation transmission coefficient. A per-cell surface indicating the winter vegetation density where the cov_type value is 3 ("tree") is created. The winter vegetation density is the summer vegetation density reduced by a "leaf keep" factor based on the prms cov_type. The following list shows pairings of cov_type and the percent of summer vegetation density to maintain. [[0, 0], [1, 80], [2, 70], [3, 60], [4, 100]]. nan
srain_intcp Canopy interception of summer precipitation. Expressed as a depth in inches. A land cover layer, defined in the "The Data_Bin" section of (p. 74, Viger and Leavesley, 2007), is reclassified according to the following scheme, which shows pairings of land cover codes and an integer representing how many hundredths of an inch of precip can be stored for that land cover. Inches
[[0, 0], [1, 5], [2, 5], [3, 5], [4, 5],
[5, 5], [6, 5], [7, 5], [8, 5], [9, 5], [10, 5], [11, 5],
[12, 5], [13, 5], [14, 5], [15, 5], [16, 5], [17, 5],
[18, 5], [19, 5], [20, 5], [21, 5], [22, 5], [23, 0],
[101, 0], [102, 2], [103, 2], [104, 2], [105, 2],
[106, 5], [107, 5], [108, 5], [109, 5], [110, 5],
[111, 5], [112, 5], [113, 5], [114, 5], [115, 5],
[116, 5], [117, 5], [118, 0], [119, 5], [120, 5],
[121, 0], [122, 5], [123, 5], [124, 0], [125, 2],
[126, 2], [127, 0]]
wrain_intcp Canopy interception of winter precipitation. Expressed as a depth in inches. A land cover layer, defined in the "The Data_Bin" section of (p. 74, Viger and Leavesley, 2007), is reclassified according to the following scheme, which shows pairings of land cover codes and an integer representing how many hundredths of an inch of precip can be stored for that land cover. Inches
[[0, 0], [1, 5], [2, 5], [3, 5], [4, 5], [5, 3],
[6, 2], [7, 2], [8, 2], [9, 2], [10, 2], [11, 5], [12, 5],
[13, 5], [14, 5], [15, 5], [16, 2], [17, 5], [18, 5], [19, 2],
[20, 5], [21, 2], [22, 2], [23, 0], [101, 0], [102, 2],
[103, 2], [104, 2], [105, 2], [106, 5], [107, 5], [108, 5],
[109, 5], [110, 5], [111, 5], [112, 2], [113, 5], [114, 5],
[115, 3], [116, 2], [117, 3], [118, 0], [119, 5], [120, 5],
[121, 0], [122, 5], [123, 5], [124, 0], [125, 2], [126, 2], [127, 0]]
snow_intcp Canopy interception of snow. Expressed as a depth in inches. A land cover layer, defined in the "The Data_Bin" section of (p. 74, Viger and Leavesley, 2007), is reclassified according to the following scheme, which shows pairings of land cover codes and an integer representing how many hundredths of an inch of precip can be stored for that land cover. Inches
[[0, 0], [1, 10], [2, 10], [3, 10], [4, 10],
[5, 7], [6, 2], [7, 2], [8, 2], [9, 2], [10, 2], [11, 10],
[12, 10], [13, 10], [14, 10], [15, 10], [16, 2], [17, 10],
[18, 10], [19, 5], [20, 10], [21, 2], [22, 2], [23, 0],
[101, 0], [102, 0], [103, 0], [104, 0], [105, 0], [106, 3],
[107, 0], [108, 2], [109, 2], [110, 2], [111, 2], [112, 2],
[113, 10], [114, 10], [115, 7], [116, 2], [117, 3], [118, 0],
[119, 2], [120, 10], [121, 0], [122, 2], [123, 2], [124, 0],
[125, 0], [126, 0], [127, 0]]
hru_percent_imperv Percentage of feature area that is covered by impervious surfaces. Percent
felev Fraction of the feature area that is underlain by the elevation data layer used to derive the soil type parameters. Value can range from 0.0-1.0, with a flag of -1 for no overlap at all. Percent
hru_elev The median elevation of the feature, expressed in meters, determined from an overlay analysis of the cells in a digital elevation model. Feet
hru_slope The mean slope of the feature, expressed as percent rise (over run), determined from an overlay analysis of the cells in a digital elevation model. Slope for the individual cells was determined as described in http://resources.arcgis.com/en/help/main/10.1/index.html#//009z000000v2000000. Percent
hru_aspect The mean orientation (aspect) of the predominant downslope direction of the feature, expressed as (0-360) degrees clockwise from north. For all cells in the Digital Elevation Model, aspect is calculated as described in http://resources.arcgis.com/en/help/main/10.1/index.html#/Aspect/009z000000tr000000/. Then the trigonometric sine and cosine of each cell's aspect are derived to create two new rasters. The average value for both of these figures is determined for each feature. The hru_aspect value is then set to the inverse tangent of these two figures (atan2(sin(aspect), cos(aspect))). Degrees of orientation clockwise from north
jh_coef_hru Jensen-Haise coefficient, expressed in degrees Fahrenheit, is calculated using formula specified in Leavesley, G.M., PRMS User's Manual. Defaults taken from N.J. Rosenberg, B.L. Blad, S.B. Verma, Microclimate: The Biological Environment, John Wiley & Sons, Inc, 1983, p. 170. Calculation of saturation vapor pressures for this process is based on the two temperatures, 7 and 25 degrees Fahrenheit. The coefficient assumes a lapse rate of 3.280840 degrees Fahrenheit for every 1000 feet of elevation gain. Degrees Fahrenheit
tmin_adj An adjustment factor for the minimum daily temperature associated with the feature, expressed in Fahrenheit. Aspect cells in the DEM are reclassified into adjustment values. tmin_adj is calculated as the mean of the cell-based adjustment values for the feature. Degrees Fahrenheit
tmax_adj An adjustment factor for the maximum daily temperature associated with the feature, expressed in degrees Fahrenheit. Aspect cells in the DEM are reclassified into adjustment values. tmax_adj is calculated as the mean of the cell-based adjustment values for the feature. Degrees Fahrenheit
snarea_thresh Snow area threshold, expressed as a ratio, for a feature is computed as the quantity of the feature hru_elev less the region minimum hru_elev, multiplied by 5 and expressed as feet. This value is then divided by 1000 feet. Ratio
hru_deplcrv Snow depletion curve number, an identifier, corresponds to one of two empirically derived curves. For features with hru_elev exceeding timber line, which is set at 11,500 feet above sea level, curve number 2 is designated. All other features are assigned to curve number 1. nan
hru_long Longitude coordinate of the feature centroid. Note that centroid is always within the feature boundaries and not necessarily at the geometric centroid of the feature. Decimal Degrees
hru_x X coordinate of the feature centroid. This value is expressed using the internal coordinate system of the Geospatial Fabric Features, which is USA_Contiguous_Albers_Equal_Area_Conic_USGS_version (WKID: 102039 Authority: ESRI). Note that centroid is always within the feature boundaries and not necessarily at the geometric centroid of the feature. Meters
hru_y Y coordinate of the feature centroid. This value is expressed using the internal coordinate system of the Geospatial Fabric Features, which is USA_Contiguous_Albers_Equal_Area_Conic_USGS_version (WKID: 102039 Authority: ESRI). Note that centroid is always within the feature boundaries and not necessarily at the geometric centroid of the feature. Meters
hru_lat Latitude coordinate of the feature centroid. Note that centroid is always within the feature boundaries and not necessarily at the geometric centroid of the feature. Decimal Degrees
hru_area The area of the feature, expressed in acres. Acres
dprst_area Depression storage area, expressed in acres, is the area within the feature that is covered by a waterbody. Acres
sro_to_dprst Percent of hru_area with pervious land cover whose runoff flows into surface depressions prior to leaving the HRU. This is determined by topographic analysis of the DEM and a GIS map of waterbodies, as well as a map of impervious land cover. Percent
soil2gw_max Maximum value of soil-water excess routed directly to PRMS ground-water reservoir (Markstrom and others, 2008). Initially derived as the cubed power of the feature average hydraulic conductivity of near surface geology. Values for all features are then linearly scaled to fit within a range (specified below). Inches
fastcoef_lin Linear flow-routing coefficient for fast interflow (Markstrom and others, 2008). Initially derived as twice the slowcoef_lin value. Values for all features are then linearly scaled to fit within a range (specified below). Per day
slowcoef_lin Linear flow-routing coefficient for slow interflow (Markstrom and others, 2008). Initially derived as the hru_slope times the feature average hydraulic conductivity of near surface geology, divided by the hru_area. Values for all features are then linearly scaled to fit within a range (specified below). Per day
gwflow_coef Linear coefficient to route water in ground-water reservoir to streams (Markstrom and others, 2008). Initially derived as the hru_slope times the feature average hydraulic conductivity of near surface geology, divided by the hru_area. Values for all features are then linearly scaled to fit within a range (specified below). Per day
dprst_seep_rate_open Depression storage seepage rate. Initially set to equal the slowcoef_lin value. Values for all features are then linearly scaled to fit within a range (specified below). Per day
dprst_se_rate_closed Depression storage seepage rate. Initially set to equal the slowcoef_lin value. Values for all features are then linearly scaled to fit within a range (specified below). Per day
dprst_flow_coef Depression storage seepage rate. Initially set to equal the slowcoef_lin value. Values for all features are then linearly scaled to fit within a range (specified below). nan
fflux Fraction of the feature area that is underlain by the hydrogeology data layer used to derive the groundwater flux-type parameters. Value can range from 0.0-1.0, with a flag of -1 for no overlap at all. Percent
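
A few of the derivations described in the table can be sketched in Python to make them concrete. This is only an illustration of the written definitions for hru_aspect, soil_type, and snarea_thresh, not the Geospatial Fabric code itself. The function names and input layouts are hypothetical, and the snarea_thresh sketch assumes elevations are passed in feet.

```python
import math
from collections import Counter


def mean_aspect_degrees(cell_aspects_deg):
    """hru_aspect sketch: average the sine and cosine of each cell's aspect,
    then take atan2 of the two means (degrees clockwise from north)."""
    n = len(cell_aspects_deg)
    sin_mean = sum(math.sin(math.radians(a)) for a in cell_aspects_deg) / n
    cos_mean = sum(math.cos(math.radians(a)) for a in cell_aspects_deg) / n
    # atan2 returns an angle between -180 and 180; wrap negatives into 0-360.
    return math.degrees(math.atan2(sin_mean, cos_mean)) % 360.0


def cell_soil_type(clay_pct, sand_pct):
    """Per-cell texture code as worded in the soil_type definition."""
    if clay_pct > 40:
        return 3
    if clay_pct < 40 and sand_pct > 50:
        return 1
    return 2


def feature_soil_type(cells):
    """soil_type sketch: most common per-cell code.
    cells is an iterable of (clay_pct, sand_pct) pairs (hypothetical layout)."""
    counts = Counter(cell_soil_type(clay, sand) for clay, sand in cells)
    return counts.most_common(1)[0][0]


def snarea_thresh(hru_elev_ft, region_min_elev_ft):
    """snarea_thresh sketch: (hru_elev - regional minimum) * 5 / 1000.
    Assumes both elevations are in feet."""
    return (hru_elev_ft - region_min_elev_ft) * 5 / 1000
```

Averaging sine and cosine separately is what keeps aspects near north from cancelling out: cells facing 350 and 10 degrees average to roughly north rather than 180.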
aappling-usgs commented 3 years ago

I think the pipeline for those attributes is 20_catchment_attributes/ and the Snakemake file in that repo (delaware-model-prep), right? I tried it once while you were out, Jeff, and was stymied by references to 10_spatial_data/out/sntemp_subset_ids.csv and other files in that folder, which don't exist and lack Snakemake instructions in this repo. Might this be a good time to put a little more into making those prepared attributes fully reproducible?

jsadler2 commented 3 years ago

Yes. They are in that. And sounds like a good idea. Thanks for the motivation.

jsadler2 commented 3 years ago

The other big one that has been brought up is StreamCat: https://www.epa.gov/national-aquatic-resource-surveys/streamcat-metrics-and-definitions. The catch here is that these are on the NHD reaches, whereas the prepped met data (see https://github.com/USGS-R/drb-do-ml/issues/5#issuecomment-955067932) and adjacency data are on the PRMS reaches.

jsadler2 commented 3 years ago

Blodgett introduced me to this one: https://www.sciencebase.gov/catalog/item/5669a79ee4b08895842a1d47 which seems similar to StreamCat. Not sure if they relate.

lekoenig commented 3 years ago

> Blodgett introduced me to this one: https://www.sciencebase.gov/catalog/item/5669a79ee4b08895842a1d47 which seems similar to StreamCat. Not sure if they relate.

This dataset looks really useful in that it provides a bunch of covariates processed and paired to NHDV2 flowline reaches (similar to EPA StreamCat). These data would require a crosswalk from NHDV2 reaches to the PRMS framework, though.
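
One way such a crosswalk could be applied is sketched below in Python. It rolls an NHD-reach attribute up to PRMS segments with an area-weighted mean. The crosswalk layout (COMID, PRMS segment id, catchment area) and the function name are hypothetical, and area weighting is an assumption that would only suit attributes expressed as fractions or averages; totals and counts would need to be summed instead.

```python
from collections import defaultdict


def aggregate_to_prms(nhd_values, crosswalk):
    """Area-weighted mean of an NHD-reach attribute for each PRMS segment.

    nhd_values: dict mapping COMID -> attribute value
    crosswalk: iterable of (comid, prms_seg_id, catchment_area) tuples
               (hypothetical layout, not an existing file format)
    """
    weighted_sum = defaultdict(float)
    total_area = defaultdict(float)
    for comid, seg_id, area in crosswalk:
        if comid not in nhd_values:
            continue  # skip reaches with no value for this attribute
        weighted_sum[seg_id] += nhd_values[comid] * area
        total_area[seg_id] += area
    return {
        seg_id: weighted_sum[seg_id] / total_area[seg_id]
        for seg_id in weighted_sum
        if total_area[seg_id] > 0
    }


# Made-up example: two NHD reaches map to segment "A", one with data maps to "B".
example_values = {101: 10.0, 102: 30.0, 103: 50.0}
example_crosswalk = [(101, "A", 1.0), (102, "A", 3.0), (103, "B", 2.0), (104, "B", 5.0)]
example_result = aggregate_to_prms(example_values, example_crosswalk)
```

Reaches missing from the attribute table are dropped from both the numerator and the denominator, so a segment's value reflects only the reaches that actually have data.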