USGS-R / regional-hydrologic-forcings-ml

Repo for machine learning models for regional prediction of hydrologic forcing functions. Includes probabilistic seasonal high flow regions for CONUS, and prediction of high flow metrics for selected regions.
Creative Commons Zero v1.0 Universal

Create long-term average land cover features from SOHL and NLCD datasets #61

Closed: jds485 closed this issue 2 years ago

jds485 commented 2 years ago

Because we're working with long-term average flow statistics, I think it also makes sense to work with a single set of long-term average land cover data as features. The SOHL historical land cover years and NLCD years cover the range of years in our flow dataset, so an average should work well. Using an average will significantly reduce the number of land cover features from ~200 to 16.

SOHL data cover 1940 to 1990 in 10-year increments, and NLCD data cover 2001 to 2019 in roughly 2- to 3-year increments. To be consistent with SOHL, we can use 2001, 2011, and 2019 for an approximately decadal time series.
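A minimal sketch of the long-term average described above, assuming land cover is stored as hypothetical `(COMID, year, class, percent)` records with every class recorded for every year (zeros included). The function name and record layout are illustrative, not the repo's code.

```python
from collections import defaultdict
from collections.abc import Iterable
from statistics import mean

# Hypothetical record layout: (COMID, year, land cover class, percent of area).
Record = tuple[int, int, str, float]

# Years proposed in this issue for an approximately decadal NLCD series.
NLCD_DECADAL_YEARS = {2001, 2011, 2019}


def long_term_average(records: Iterable[Record], years: set[int]) -> dict[tuple[int, str], float]:
    """Average each (COMID, class) percentage across the selected years.

    Records from years not in `years` are ignored. Assumes a class that is
    absent in a given year is still recorded with a 0 percent value; otherwise
    the mean would be taken over fewer years and be biased upward.
    """
    by_key: dict[tuple[int, str], list[float]] = defaultdict(list)
    for comid, year, lc_class, pct in records:
        if year in years:
            by_key[(comid, lc_class)].append(pct)
    return {key: mean(vals) for key, vals in by_key.items()}
```

Collapsing the year dimension this way is what reduces the feature count: one column per class instead of one per class and year.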

Steps:

cstillwellusgs commented 2 years ago

Note to self - include functions for impervious cover averages (buffers and total) as well.

cstillwellusgs commented 2 years ago

Note to self - only 2011 NLCD data are included in 'p1_sb_data_g2_csv' (CAT/ACC/TOT, with and without the 50 m riparian buffer). No other years are in the dataset.

What about impervious percentages by year? Check as well.

cstillwellusgs commented 2 years ago

Note to self - NWALT not included - should it be added?

cstillwellusgs commented 2 years ago

Consider combining SOHL and NLCD (2001 and 2011); this would require reclassifying the land cover classes.

https://www.usgs.gov/special-topics/land-use-land-cover-modeling/land-cover-modeling-methodology-fore-sce-model

Refer to the PUMP reclassification tables from the Delaware River Basin project: https://github.com/USGS-R/drb-inland-salinity-ml/blob/main/1_fetch/in/Legend_FORESCE_Land_Cover.csv, https://github.com/USGS-R/drb-inland-salinity-ml/blob/main/1_fetch/in/Legend_NLCD_Land_Cover.csv.

cstillwellusgs commented 2 years ago

Notes so far for @jds485 (and potentially @ajsekell):

jds485 commented 2 years ago

Thanks for the update!

wetlands classes not included in FORESCE

Those classes are included in the more recent decadal FORESCE product we're using for PUMP. Strange that they wouldn't be in this product, too. We could reclassify as water as a quick fix.

COMIDs with non-zero values for NLCD class 12 (Perennial Ice/Snow)

That's okay to classify as NA given the low percent coverage.

COMIDs do not have land covers that sum to 100% (+/- 1% for rounding)

We saw this in PUMP for NLCD for some COMIDs. Is this for NLCD?

reclassification

Did you use the PUMP reclassification table or a different method? I'm curious what you find here, because one of our next steps is comparing the 2000 FORESCE and 2001 NLCD datasets for the DRB.

ajsekell commented 2 years ago

Strange that the Sohl dataset isn't including the wetlands classes. I'll have a look.

But on that note, I would not use both the Sohl and NLCD land cover datasets together, at least in terms of making direct comparisons or "continuing the timeline." I was not aware that we were doing that right now.

I think it's okay to use the mining class from Sohl since that's not a class in NLCD.

The different methods and the 250 m resolution for Sohl vs 30 m for NLCD are why there are drastic changes between the two from 1990 to 2001. Technically, you're not even supposed to compare between different releases of NLCD products (though the differences are much smaller).

Andrew Sekellick, Physical Scientist, USGS MD-DE-DC Water Science Center, 443-498-5580


From: Charles Stillwell. Sent: Friday, April 22, 2022 4:10 PM. To: USGS-R/regional-hydrologic-forcings-ml. Cc: Sekellick, Andrew J; Mention. Subject: [EXTERNAL] Re: [USGS-R/regional-hydrologic-forcings-ml] Create long-term average land cover features from SOHL and NLCD datasets (Issue #61)




jds485 commented 2 years ago

Okay, in that case we can create separate sets of averages for the reclassified SOHL data and the NLCD data. They could still be weighted by water year, but that gets complicated if a site has no data before or after 2000. For now, I suggest we use a simple average and revisit weighting later. How does that sound?
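To make the trade-off concrete, here is a sketch contrasting the simple average with one possible water-year weighting. The weighting rule is a hypothetical choice, not something settled in this thread: each water year in a site's flow record counts toward the nearest land cover year (ties go to the earlier year), water years outside the product's span are ignored, and the function falls back to the simple average when none remain. That fallback is exactly the "no data pre-2000 or post-2000" case.

```python
from collections import Counter
from collections.abc import Iterable


def simple_average(pct_by_year: dict[int, float]) -> float:
    """Unweighted mean of a class percentage over all land cover years."""
    return sum(pct_by_year.values()) / len(pct_by_year)


def water_year_weighted_average(pct_by_year: dict[int, float],
                                water_years: Iterable[int]) -> float:
    """Hypothetical weighting: weight each land cover year by the number of
    the site's water years closest to it.

    Water years outside [first, last] land cover year are ignored, e.g.,
    pre-2000 water years when averaging NLCD. If no water years remain,
    return the simple average instead.
    """
    lc_years = sorted(pct_by_year)
    first, last = lc_years[0], lc_years[-1]
    # min() returns the first minimum in sorted order, so ties go to the earlier year.
    counts = Counter(
        min(lc_years, key=lambda y: abs(y - wy))
        for wy in water_years
        if first <= wy <= last
    )
    if not counts:
        return simple_average(pct_by_year)
    total = sum(counts.values())
    return sum(pct_by_year[y] * n for y, n in counts.items()) / total
```

With `nlcd = {2001: 10.0, 2011: 20.0, 2019: 30.0}`, the simple average is 20.0. A site with water years 1990 to 2005 gets 10.0 (only 2001 to 2005 fall in the NLCD span, all nearest to 2001). A site with water years 1950 to 1989 has none in the span and falls back to 20.0.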

cstillwellusgs commented 2 years ago

Per our discussion today, I will add the year 2000 from this dataset (https://www.sciencebase.gov/catalog/item/58cbeef2e4b0849ce97dcd61) to the 1940-1990 decadal datasets from here (https://www.sciencebase.gov/catalog/item/5a5406bee4b01e7be2308855). These data have the same resolution and land cover classes. Still need a better solution for capturing the entire period of record, to be handled later in the project.

cstillwellusgs commented 2 years ago

All of the COMIDs without full land cover data (sum of classes in a given year falls short of 100%) have upstream area outside of CONUS (in the ten cases here, all in Canada). @jds485 how do you think we should handle these sites? I think the easiest solution is to drop these sites since our data are incomplete, but there may be a lot of these types of sites if/when we get to CONUS-wide predictions.
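A minimal sketch of the check and the "drop these sites" option, assuming the same hypothetical `(COMID, year, class, percent)` record layout used above and the +/- 1% rounding tolerance mentioned earlier in the thread. Function names are illustrative.

```python
from collections import defaultdict
from collections.abc import Sequence

# Allowed shortfall (percentage points) attributed to rounding.
TOLERANCE = 1.0

Record = tuple[int, int, str, float]  # hypothetical (COMID, year, class, percent)


def incomplete_comids(records: Sequence[Record], tolerance: float = TOLERANCE) -> set[int]:
    """Return COMIDs whose class percentages sum to less than 100 - tolerance
    in at least one year (e.g., upstream area extending into Canada)."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for comid, year, _lc_class, pct in records:
        totals[(comid, year)] += pct
    return {comid for (comid, _year), total in totals.items()
            if total < 100.0 - tolerance}


def drop_incomplete(records: Sequence[Record], tolerance: float = TOLERANCE) -> list[Record]:
    """Remove every record belonging to an incomplete COMID.

    `records` must be a sequence (not a one-shot iterator) because it is
    traversed twice.
    """
    bad = incomplete_comids(records, tolerance)
    return [r for r in records if r[0] not in bad]
```

Returning the set of flagged COMIDs separately makes it easy to count how many sites would be lost before deciding whether dropping them is acceptable at CONUS scale.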