akrherz / iem

Code that makes the Iowa Environmental Mesonet run, or run into the ground.
MIT License

Curate CONUS404 #564

Closed akrherz closed 1 month ago

akrherz commented 1 year ago

Some initial reviews of ERA5Land, curated via #321, have gone up in horrible flames, with a soil temperature/moisture product that seems to have major issues over Iowa :( :( :( So I am going to try CONUS404 and prepare to be disappointed.

akrherz commented 1 year ago

So the initial pain is to resolve what the 4 soil levels are in this dataset. I have a helpdesk ticket in with NCAR RDA on this topic.

akrherz commented 1 year ago

RDA support fixed the website documentation and pointed me toward the wrfconstants.nc. The soil layers are

akrherz commented 1 year ago

Replicating the ERA5Land variables, we come up with netCDF files totaling 256 GB per year. Life choices need to be made here.

akrherz commented 1 year ago

Thinking aloud about why I am even attempting to curate this dataset...

I am at a combined 15 variables + levels, so any removals would only be saving ~12 GB per. I have space for 256 GB per year, which is about 10 TB total.
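As a rough check on the storage figures above, here is a minimal sketch of the arithmetic. The 40-year record length is an assumption chosen because it matches the ~10 TB estimate; it is not stated in this thread, and `archive_size_tb` is only an illustrative helper.

```python
def archive_size_tb(gb_per_year: float, n_years: int) -> float:
    """Return the total archive size in TB (decimal, 1 TB = 1000 GB)."""
    return gb_per_year * n_years / 1000.0


GB_PER_YEAR = 256  # per-year netCDF volume quoted above
N_YEARS = 40       # assumption: record length implied by the ~10 TB total

total_tb = archive_size_tb(GB_PER_YEAR, N_YEARS)
print(f"{N_YEARS} years at {GB_PER_YEAR} GB/year is about {total_tb:.2f} TB")
```

Changing `N_YEARS` to the actual record length gives the real footprint; the point is only that the per-year figure dominates the budget.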

  1. I really want something to produce a soil moisture/temperature climatology, so as to QC the ISUSM network and to generate plots to answer researcher/public questions. Hourly data for the deep layers makes little sense, but likely would be of value for shallow depths (+8).
  2. Evaporation is on the present list, but I likely have no immediate usage for it.
  3. Wind data as well; I don't really have an immediate need for it.
  4. Precipitation data likely has downstream use and should stay (+1).
  5. Solar radiation should be useful (+1).
  6. Temperature and Dew Point are likely needed, but again could perhaps just be on a daily time step.

So by trimming, I would save about 80 GB per year. Is that enough to fuss over? Erm. So I am likely to:

akrherz commented 11 months ago

The processing script is currently up to 1998. I should run a quick diagnostic to ensure that I am not getting my hopes up for nuttin!

akrherz commented 11 months ago

We need to spitball what to compute for a soil moisture / temperature climatology, so as to be able to place a current value into some historical context.

The first, and most pressing, issue is that we need to put context on the reported soil moisture value so that we are comparing apples to apples. For example, a given sensor may have observed a moisture range (wilting to saturation) of 0.09 to 0.40, while the model has a range of 0.19 to 0.44, so the two raw values are not directly comparable.
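One possible way to put both values on a common footing is to rescale each to its own wilting-to-saturation range. The sketch below is only an illustration of that idea, not a method settled on in this thread; `relative_wetness` and the example reading of 0.25 are hypothetical, while the two ranges are the ones quoted above.

```python
def relative_wetness(theta: float, wilting: float, saturation: float) -> float:
    """Rescale a volumetric soil moisture value to 0..1 within its own range.

    0 corresponds to the wilting value and 1 to saturation; values outside
    the observed range are clipped.
    """
    if saturation <= wilting:
        raise ValueError("saturation must be greater than wilting")
    frac = (theta - wilting) / (saturation - wilting)
    return min(1.0, max(0.0, frac))


# The same raw value (hypothetical reading of 0.25) means different things
# for the sensor range (0.09 to 0.40) and the model range (0.19 to 0.44).
sensor_rel = relative_wetness(0.25, wilting=0.09, saturation=0.40)
model_rel = relative_wetness(0.25, wilting=0.19, saturation=0.44)
print(f"sensor: {sensor_rel:.2f}, model: {model_rel:.2f}")
```

A raw 0.25 sits about halfway up the sensor's range but only about a quarter of the way up the model's range, which is why the raw numbers cannot be compared directly.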

Having percentiles would be very nice, but the question is storage/speed. If we did this at a weekly time step, with ~40 percentile thresholds on the native grid, that's 12 billion data points. I need to look at some actual maps of data and see what type of variability is found in this model...
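The ~12 billion figure can be roughly reproduced with the sketch below. The grid dimensions (1015 x 1367) are an assumption about the CONUS404 native grid and are not stated in this thread; the four soil levels come from the discussion above, and counting a single variable stored as 4-byte floats is likewise an assumption.

```python
# Back-of-envelope count of values in a weekly percentile climatology.
NY, NX = 1015, 1367   # assumption: CONUS404 native grid dimensions
WEEKS = 52            # weekly time step
THRESHOLDS = 40       # ~40 percentile thresholds
SOIL_LEVELS = 4       # the 4 soil levels in the dataset
BYTES_PER_VALUE = 4   # assumption: values stored as float32

n_values = NY * NX * WEEKS * THRESHOLDS * SOIL_LEVELS
size_gb = n_values * BYTES_PER_VALUE / 1e9

print(f"{n_values / 1e9:.1f} billion values, about {size_gb:.1f} GB uncompressed")
```

Under these assumptions the count comes to roughly 11.5 billion values per variable, so covering both moisture and temperature would double it.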

akrherz commented 11 months ago

Up to 2012 now, but an initial plot of soil moisture climatology was a debbie downer.


akrherz commented 11 months ago

Dataset fully downloaded now (thru 2021). Still not much optimism on the soil moisture product.