This PR adds notebooks to benchmark Zarr vs. Parquet. See README.md for details.
At this point, reading from Zarr takes roughly twice as long as reading from Parquet, so further optimization work is needed. I was also only able to save a very small sample of NWM (about 10 MB, versus roughly 4 TB for the full dataset) without the notebook crashing; even saving 100 MB crashes with no useful error message. Evaluating against a larger sample, closer to the original size of the dataset, would give more representative results.
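For reference, the read-time comparison can be sketched with a simple timing harness like the one below. The `read_zarr`/`read_parquet` callables here are hypothetical stand-ins for the notebook's actual loads (e.g. `xarray.open_zarr(...)` and `pandas.read_parquet(...)`); this only illustrates the measurement approach, not the benchmark code in this PR.

```python
import statistics
import time


def time_read(read_fn, n_trials=5):
    """Time a dataset-read callable over several trials; return the median in seconds."""
    timings = []
    for _ in range(n_trials):
        start = time.perf_counter()
        read_fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical readers standing in for the notebook's Zarr and Parquet loads.
def read_zarr():
    time.sleep(0.002)  # placeholder for the real Zarr read


def read_parquet():
    time.sleep(0.001)  # placeholder for the real Parquet read


ratio = time_read(read_zarr) / time_read(read_parquet)
print(f"Zarr/Parquet read-time ratio: {ratio:.1f}x")
```

Using the median over several trials reduces noise from caching and other one-off effects, which matters when the gap being measured is only about 2x.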