USACE / cumulus

Cumulus project issue tracking and project planning
MIT License

[PRODUCT]: Analysis of Record for Calibration (AORC) #377

Open FHanbali opened 1 year ago

FHanbali commented 1 year ago

Product Source (URL Preferred)

6GB+ hard drive to mail soon.

Product Description

https://hydrology.nws.noaa.gov/aorc-historic/Documents/AORC-Version1.1-SourcesMethodsandVerifications.pdf

"The Analysis of Record for Calibration (AORC) is a gridded record of near-surface weather conditions covering the continental United States and Alaska and their hydrologically contributing areas. It is defined on a latitude/longitude spatial grid with a mesh length of ~800 m (30 arc seconds), and a temporal resolution of one hour. Elements include hourly total precipitation, temperature, specific humidity, terrain-level pressure, downward longwave and shortwave radiation, and west-east and south-north wind components. It spans the period from 1979 at Continental U.S. (CONUS) locations / 1981 in Alaska, to the near-present (at all locations). This suite of eight variables is sufficient to drive most land-surface and hydrologic models and is used to force the calibration run of the National Water Model (NWM). "

Product Format

Format Description

Follows Climate and Forecast (CF) metadata conventions https://cfconventions.org/
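
For reference, a minimal sketch of what checking the CF metadata on one of the AORC NetCDF files could look like with xarray; the file name here is illustrative, not an actual product file:

```python
# Minimal sketch: inspect CF-convention metadata on an AORC NetCDF file.
# "AORC_sample.nc" is an illustrative name, not an actual product file.
import xarray as xr

ds = xr.open_dataset("AORC_sample.nc")

# CF-compliant files declare the convention version at the dataset level...
print(ds.attrs.get("Conventions"))

# ...and carry physical meaning in per-variable standard_name / units attributes.
for name, var in ds.data_vars.items():
    print(name, var.attrs.get("standard_name"), var.attrs.get("units"))
```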

adamscarberry commented 7 months ago

Email from Fauwaz on 4/9/2024 with a link to the HEC drive with 1979 - 2021 data.

jbkolze commented 6 months ago

@Enovotny The AORC data is split up by RFC. Do we want to merge these products into a single CONUS product?

Enovotny commented 6 months ago

I am not sure. Is that just what Fauwaz has? I see that they also have it available in S3 (https://aws.amazon.com/marketplace/pp/prodview-m2sp7gsk5ts6s#resources), along with examples for working with it (https://nbviewer.org/github/NOAA-OWP/AORC-jupyter-notebooks/blob/master/jupyter_notebooks/AORC_Zarr_notebook.ipynb). It might be worth exploring that more.
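
For anyone poking at that, here is a minimal sketch of opening the AORC Zarr data from S3 with xarray/s3fs, in the spirit of the linked notebook; the bucket path and variable name below are assumptions to be confirmed against the marketplace listing, not verified values:

```python
# Sketch: open an AORC Zarr store on S3 with anonymous access.
# The bucket/prefix and variable name are assumptions -- confirm them
# against the AWS Marketplace listing / NOAA-OWP notebook before use.
import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
store = s3fs.S3Map(root="s3://noaa-nws-aorc-v1-1-1km/1979.zarr", s3=fs, check=False)

ds = xr.open_zarr(store, consolidated=True)
print(ds)                      # dimensions, coordinates, variables, chunking
precip = ds["APCP_surface"]    # assumed precipitation variable name
```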

jbkolze commented 6 months ago

I was planning to pull from here: https://hydrology.nws.noaa.gov/pub/AORC/V1.1/

That source more closely resembles our sources for most of the other products (as it's just zipped NetCDF files), and it appears to update more frequently -- the S3 bucket's latest data is from 2023 (and it looks like it might only update at the end of each year). That being said, if we have a reason to prefer the S3 source I can try to figure it out from that angle.

I haven't seen Fauwaz's data, so I'm not sure of its format -- my assumption was that it's in the zipped NetCDF format, but I could be wrong. I was planning to set up the Airflow / Cumulus routines for present / future data and then load in Fauwaz's archived data once the processing is set up.

Enovotny commented 6 months ago

Hmmm, I am not sure about the workflow for this one; you might have to play around with it. I am guessing that Airflow would grab the zip file, pull out the individual files inside, and ship those to Cumulus. You might have to have Airflow run every day but just check whether a file with a new timestamp is present, because it looks like multiple months of data were loaded at once. I think I did something similar with one of the APRFC datasets. Play around with it and let me know if you have questions. It might be hard to merge the RFCs together: when the geoprocessor runs to merge and save the .tiff, it would have to have all of the files present for that timestamp for every RFC, which might not be the case -- if just one is missing it would fail.
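
To make that shape concrete, here is a rough sketch of a daily "check for new zips, extract, ship to Cumulus" routine; the timestamp bookkeeping and the upload helper are placeholders, not the project's actual Airflow / Cumulus plumbing:

```python
# Rough sketch of a daily "check for new files, unzip, ship to Cumulus" task.
# upload_to_cumulus and the Last-Modified bookkeeping are placeholders, not
# the project's actual Airflow / Cumulus code.
import io
import zipfile
import requests

BASE_URL = "https://hydrology.nws.noaa.gov/pub/AORC/V1.1/"  # proposed source above

def sync_archive(zip_name: str, last_seen: dict) -> list:
    """Download the zip only if its server timestamp changed, then extract
    the member NetCDF files and hand each one off for upload."""
    url = BASE_URL + zip_name
    modified = requests.head(url, timeout=30).headers.get("Last-Modified")
    if modified == last_seen.get(zip_name):
        return []  # nothing new for this archive

    resp = requests.get(url, timeout=300)
    resp.raise_for_status()
    shipped = []
    with zipfile.ZipFile(io.BytesIO(resp.content)) as zf:
        for member in zf.namelist():
            upload_to_cumulus(member, zf.read(member))  # placeholder upload
            shipped.append(member)
    last_seen[zip_name] = modified
    return shipped

def upload_to_cumulus(name: str, payload: bytes) -> None:
    """Placeholder: in the real DAG this would push to the Cumulus acquirable bucket."""
    ...
```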

FHanbali commented 5 months ago

Hey all, what I have is an archive of the 800m CONUS AORC netCDF files (CSU source), and I sent you the link for those.

The AWS source is very interesting indeed, especially that it includes Alaska and more variables of interest than just precip. Would be good to cross check these two sources.

As for the https://hydrology.nws.noaa.gov/pub/AORC/V1.1/ source, it's broken up by RFC and has 4km resolution, so the other two sources would be better, please.

jbkolze commented 5 months ago

Whoops. This shouldn't be closed... May have to look into the nuances of our issue branching / merging.

jbkolze commented 5 months ago

@Enovotny The AORC CSU archive data uses a feature that, based on my brief searching, doesn't appear to have been encountered in any other Cumulus-processed product: scaling. The grids are saved as integer data with an accompanying scale factor of 0.1 to convert the values to the real mm (float) values. This works well for the COG files, but the DSS packager (and, I assume, DSS itself) doesn't support scaling. I tried two approaches to resolve this: unscale the grids upon initial processing and convert to float32, or keep the scaling and manually unscale the grids as part of the dss7 packaging routine.

Initial unscale:
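
(The attachment originally posted here isn't reproduced; below is only a rough sketch of the idea, using gdal.Translate's unscale option, which may differ from what the linked branch actually does.)

```python
# Sketch of the "unscale at initial processing" approach: apply the band's
# scale_factor once and write a float32 COG, so downstream consumers never
# see the packed integers. Illustrative only, not the code on the branch.
from osgeo import gdal

def unscale_to_float32_cog(src_path: str, dst_path: str) -> None:
    gdal.Translate(
        dst_path,
        src_path,
        format="COG",
        outputType=gdal.GDT_Float32,
        unscale=True,  # multiply by scale_factor and add offset on read
        creationOptions=["COMPRESS=DEFLATE"],
    )
```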

DSS Unscale:
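
(Likewise not the original attachment; a minimal sketch of manually applying the scale factor inside the packaging step, with a hypothetical helper name.)

```python
# Sketch of the "unscale during DSS packaging" approach: read the band's
# scale/offset from the scaled COG and apply them to the array before the
# values go to the DSS writer. Helper name is hypothetical.
import numpy as np
from osgeo import gdal

def read_unscaled_array(cog_path: str) -> np.ndarray:
    ds = gdal.Open(cog_path)
    band = ds.GetRasterBand(1)
    scale = band.GetScale() or 1.0
    offset = band.GetOffset() or 0.0
    # Packed integer values -> real float32 mm values
    return band.ReadAsArray().astype("float32") * scale + offset
```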

DSS unscaling can probably be optimized some by moving a few things around (e.g. scaling before the gdal.Warp() call rather than after), but that was my quick attempt to get it to work. Grids are negligibly different numerically in my opinion, though the DSS-unscaled grids do notably drop values < 0.05 and appear "blockier" due to the precision limit of 0.1mm.

The best of both worlds would seemingly be for DSS to support scaling, but I don't know if that's feasible.

Linked branches are currently up-to-date with the "Initial Unscaling" methodology.

Next step is to explore the AWS AORC repository, which could potentially make all of this moot if it ends up being a better source for this data. Will likely be a good idea to set up a separate issue/branches for that effort.

adamscarberry commented 5 months ago

> I tried two approaches to resolve this: unscale the grids upon initial processing and convert to float32, or keep the scaling and manually unscale the grids as part of the dss7 packaging routine.

Keep in mind that if a user chooses the download option of geotiffs (in a tar.gz) or DSS, the data in the results should be the same. Avoid putting special code in the packager's DSS writer for certain products.

jbkolze commented 5 months ago

> I tried two approaches to resolve this: unscale the grids upon initial processing and convert to float32, or keep the scaling and manually unscale the grids as part of the dss7 packaging routine.
>
> Keep in mind that if a user chooses the download option of geotiffs (in a tar.gz) or DSS, the data in the results should be the same. Avoid putting special code in the packager's DSS writer for certain products.

They are the same. A COG supports scaling inherently, and DSS seemingly doesn't, so you have to do it manually if the COG is scaled. They'll be different if you don't update the DSS writer.

FHanbali commented 2 months ago

#242

The original issue above may have dealt with how to treat the AORC data.