azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Add notebook recording NetCDF to Zarr conversion #105

Closed rajadain closed 2 years ago

rajadain commented 2 years ago

Overview

Adds a Notebook that pulls down NWM Predictions Short Term Channel Routing data and converts it to Zarr.

This is a simple conversion of a single snapshot. The next step is to do this for multiple snapshots and append the data to the same Zarr store.
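The multi-snapshot step might look roughly like the sketch below. It is not code from this PR: the function name `append_snapshots` and its arguments are hypothetical, and it assumes every snapshot is an xarray Dataset with the same variables and a `time` dimension, written to a local path. It relies on `Dataset.to_zarr` with `mode="a"` and `append_dim`.

```python
from pathlib import Path


def append_snapshots(datasets, store, append_dim="time"):
    """Write the first dataset to `store`, then append the rest along `append_dim`.

    `datasets` is any iterable of xarray.Dataset objects that share variables
    and differ only along `append_dim` (an assumption about the snapshots).
    `store` is a local directory path; an S3 target would need a different
    existence check than Path.exists().
    """
    store = Path(store)
    for ds in datasets:
        if store.exists():
            # The store already holds data: extend it along the append dimension.
            ds.to_zarr(store, mode="a", append_dim=append_dim)
        else:
            # The first snapshot creates the store.
            ds.to_zarr(store, mode="w")
    return store
```

Called as `append_snapshots(snapshots, f'{PREDICTIONS_DATADIR}-channel_rt.zarr')`, this would leave one store holding all snapshots, provided their `time` values do not overlap.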

Checklist

Testing Instructions

Closes #102

vlulla commented 2 years ago

This looks great! The only thing I would recommend is adding an assert at the end to ensure that the two datasets are indeed the same. Maybe this?

# Reopen the converted store and compare every non-scalar variable with the source
dsz = xr.open_dataset(f'{PREDICTIONS_DATADIR}-channel_rt.zarr', engine='zarr')
assert all(np.allclose(ds[v].to_numpy(), dsz[v].to_numpy(), equal_nan=True)
           for v in ds.data_vars.keys() if len(ds[v].shape)>0)

rajadain commented 2 years ago

Great suggestion! Added in 06f8951

vlulla commented 2 years ago

Excellent!

On a separate note, I was wondering if we ought to have this whole thing as a function that gets the data for a particular day? Maybe something like get_geo_data from this example notebook? If so, I propose:

def get_short_range_forecast_data(date: str) -> xr.Dataset:
  ....
  return ds

## Called like
ds = get_short_range_forecast_data((datetime.datetime.now() - datetime.timedelta(days=1)).strftime('%Y%m%d'))

I have been unable to determine whether the predictions data are available on S3, so we might be unable to pass a glob string to xr.open_mfdataset. Regardless, I believe that encapsulating your notebook steps in a function would be helpful. Whatever you decide is perfectly fine with me!

rajadain commented 2 years ago

The data is available on S3 as described here: https://docs.opendata.aws/noaa-nwm-pds/readme.html

aws s3 ls noaa-nwm-pds/nwm.20221019/
                           PRE analysis_assim/
                           PRE analysis_assim_extend/
                           PRE analysis_assim_extend_no_da/
                           PRE analysis_assim_hawaii/
                           PRE analysis_assim_hawaii_no_da/
                           PRE analysis_assim_long/
                           PRE analysis_assim_long_no_da/
                           PRE analysis_assim_no_da/
                           PRE analysis_assim_puertorico/
                           PRE analysis_assim_puertorico_no_da/
                           PRE forcing_analysis_assim/
                           PRE forcing_analysis_assim_extend/
                           PRE forcing_analysis_assim_hawaii/
                           PRE forcing_analysis_assim_puertorico/
                           PRE forcing_medium_range/
                           PRE forcing_short_range/
                           PRE forcing_short_range_hawaii/
                           PRE forcing_short_range_puertorico/
                           PRE long_range_mem1/
                           PRE long_range_mem2/
                           PRE long_range_mem3/
                           PRE long_range_mem4/
                           PRE medium_range_mem1/
                           PRE medium_range_mem2/
                           PRE medium_range_mem3/
                           PRE medium_range_mem4/
                           PRE medium_range_mem5/
                           PRE medium_range_mem6/
                           PRE medium_range_mem7/
                           PRE medium_range_no_da/
                           PRE short_range/
                           PRE short_range_hawaii/
                           PRE short_range_hawaii_no_da/
                           PRE short_range_puertorico/
                           PRE short_range_puertorico_no_da/
                           PRE usgs_timeslices/

rajadain commented 2 years ago

Good idea on making it a function. I'll also try to read the source from S3 directly, as discussed here: https://gis.stackexchange.com/questions/429000/error-trying-to-open-netcdf-file-with-xarray-from-s3-bucket
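For illustration, a function that reads one forecast cycle straight from S3 could be sketched as follows. This is not the code added in the PR. The helper `short_range_channel_rt_urls`, the file name pattern, and the 18 hourly lead times are assumptions that should be checked against the bucket listing. The reader also assumes `fsspec`, `s3fs`, `h5netcdf`, and `xarray` are installed, and it imports them lazily so the URL helper runs without them.

```python
from datetime import datetime

BUCKET = "noaa-nwm-pds"  # bucket name as it appears in this thread


def short_range_channel_rt_urls(day, cycle=0, lead_hours=range(1, 19)):
    """Return S3 URLs for one short-range channel routing forecast cycle.

    `day` is a 'YYYYMMDD' string. The file name pattern below is an
    assumption, not something confirmed in this thread.
    """
    datetime.strptime(day, "%Y%m%d")  # raises ValueError on a malformed date
    prefix = f"s3://{BUCKET}/nwm.{day}/short_range"
    return [
        f"{prefix}/nwm.t{cycle:02d}z.short_range.channel_rt.f{hour:03d}.conus.nc"
        for hour in lead_hours
    ]


def get_short_range_forecast_data(day, cycle=0):
    """Open one forecast cycle from S3 as a single xarray.Dataset."""
    # Imported here so the URL helper above works without these packages.
    import fsspec
    import xarray as xr

    urls = short_range_channel_rt_urls(day, cycle)
    # anon=True asks s3fs for unsigned requests, which public buckets accept.
    files = [fsspec.open(url, mode="rb", anon=True).open() for url in urls]
    # Each file is assumed to hold one time step, so concatenate along "time".
    return xr.open_mfdataset(
        files, engine="h5netcdf", combine="nested", concat_dim="time"
    )
```

Building the list of URLs explicitly avoids relying on a glob pattern, at the cost of hard-coding the naming scheme.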

vlulla commented 2 years ago

https://github.com/awslabs/open-data-docs/tree/main/docs seems like a good page to bookmark!

rajadain commented 2 years ago

Added examples for how to read directly from S3 in ff5ba82.

rajadain commented 2 years ago

Added timings for reading from and writing to S3, along with a comparison of reading NetCDF from S3 versus reading Zarr, in 2ade85f.
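As a rough illustration of how such timings can be collected outside a notebook, here is a minimal sketch using only the standard library. The `timed` helper and the placeholder workload are hypothetical and are not how the notebook in this PR measures anything. Since xarray opens datasets lazily, a fair comparison would time a step that actually reads values, such as `ds.load()`, rather than the open call alone.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("placeholder workload", timings):
    # Stand-in for a real read, e.g. loading a NetCDF or Zarr dataset.
    sum(range(1_000_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.3f} s")
```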

rajadain commented 2 years ago

Thanks for reviewing!