azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Add notebooks to benchmark Zarr vs. Parquet #2

Closed: lewfish closed this 2 years ago

lewfish commented 2 years ago

This PR adds notebooks to benchmark Zarr vs. Parquet. See README.md for details.

At this point, reading from Zarr takes twice as long as reading from Parquet, so more optimization work is needed. Also, I was only able to save a very small sample of NWM (about 10 MB, vs. 4 TB for the original dataset) without my notebook crashing. Even saving a 100 MB sample crashes with no useful error messages. Benchmarking on a larger sample, closer to the size of the original dataset, seems like a better way to evaluate the two formats.
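A comparison like "Zarr takes twice as long as Parquet" is easiest to reproduce if each format is timed the same way. What follows is a minimal, stdlib-only sketch of such a harness. It assumes each format is wrapped in a zero-argument reader function and reports median wall-clock times plus each reader's slowdown relative to a baseline. The helper names (`time_reader`, `compare_readers`, `slowdown`) are hypothetical and are not taken from the PR's notebooks.

```python
import statistics
import time
from typing import Callable, Dict


def time_reader(read: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time, in seconds, of calling `read()`."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        read()  # hypothetical reader: loads the sample from one format
        timings.append(time.perf_counter() - start)
    # The median is less sensitive than the mean to one-off cold-cache runs.
    return statistics.median(timings)


def compare_readers(
    readers: Dict[str, Callable[[], object]], repeats: int = 5
) -> Dict[str, float]:
    """Time each named reader and return {name: median seconds}."""
    return {name: time_reader(read, repeats) for name, read in readers.items()}


def slowdown(results: Dict[str, float], baseline: str) -> Dict[str, float]:
    """Express each reader's median time as a multiple of the baseline's."""
    base = results[baseline]
    return {name: t / base for name, t in results.items()}
```

In a notebook the readers might be something like `lambda: xr.open_zarr(zarr_path).load()` and `lambda: pd.read_parquet(parquet_path)`, where the paths are placeholders. `xr.open_zarr` is lazy, so the explicit `.load()` is what forces the data to actually be read; without it the Zarr timing would mostly measure metadata access.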