azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Append NWM Forecast subset to Parquet and Zarr #12

Closed echeipesh closed 2 years ago

echeipesh commented 2 years ago

streamflow

Wanted: a script that extracts one or more HUCs from NWM short-range prediction output in NetCDF format and appends them to an existing .parquet dataset.

I think the hard part of this issue is figuring out whether and how it is possible to append to a Parquet file from Python, and what schema for the streams file would be friendly to appending.

Assumptions:

Questions:

Notes:
It is not clear that appending to Parquet files is easy. Lots of SO examples talk about re-reading and re-writing the whole file. That's not an option because we expect to read
It appears possible based on this Java implementation: https://github.com/apache/parquet-mr/pull/278
It does not appear possible to do this using PyArrow.

lewfish commented 2 years ago

We can append to Parquet and Zarr. See linked PRs.