azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Improve Parquet results from ESIP #94

Closed lewfish closed 1 year ago

lewfish commented 2 years ago

As part of this, we should re-run the benchmarks in https://github.com/azavea/noaa-hydro-data/blob/master/src/esip-2022-presentation/benchmark_queries.ipynb against the wide-format dataset Vijay created at s3://azavea-noaa-hydro-data/esip-experiments/datasets/reanalysis-chrtout/parquet/vl/wide-parquets-all-feature_ids/

We should also try other ways of formatting it if we can't get the numbers close to the Zarr ones.
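
To make the formatting question concrete, here is a minimal sketch of the two layouts and a small timing helper. It is not the code from the benchmark notebook. It assumes "wide" means one column per feature_id (which the dataset path suggests), and the toy values, the `streamflow` column name, and the helper names `to_wide`, `mean_flow_long`, `mean_flow_wide`, and `time_query` are all made up for illustration. It uses plain Python structures rather than Parquet so it runs anywhere.

```python
import statistics
import time

# Toy "long" layout: one row per (time, feature_id) pair.
# Values and column names are hypothetical, not taken from the real dataset.
long_rows = [
    {"time": t, "feature_id": f, "streamflow": float(t * 10 + f)}
    for t in range(4)
    for f in (101, 102, 103)
]


def to_wide(rows):
    """Pivot long rows into one row per time step, one column per feature_id."""
    wide = {}
    for row in rows:
        wide.setdefault(row["time"], {})[row["feature_id"]] = row["streamflow"]
    return wide


def mean_flow_long(rows, feature_id):
    """Query the long layout: scan every row and filter by feature_id."""
    return statistics.mean(
        r["streamflow"] for r in rows if r["feature_id"] == feature_id
    )


def mean_flow_wide(wide, feature_id):
    """Query the wide layout: read a single column across all time steps."""
    return statistics.mean(cols[feature_id] for cols in wide.values())


def time_query(fn, *args, repeats=5):
    """Run fn(*args) several times; return (median seconds, last result)."""
    timings = []
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings), result


wide_rows = to_wide(long_rows)
long_seconds, long_result = time_query(mean_flow_long, long_rows, 102)
wide_seconds, wide_result = time_query(mean_flow_wide, wide_rows, 102)
```

Both queries return the same answer; only the access pattern differs. On a toy in-memory dataset the timings say nothing about S3-backed Parquet or Zarr, so the helper is only meant to show the shape of a like-for-like comparison.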

vlulla commented 2 years ago

I ran the benchmark against the wide Parquet dataset in a separate notebook (s3://noaa-notebooks/vlulla/benchmark-using-wide-parquet.ipynb) and found that the wide format did not really improve the numbers. The notebook's inputs and outputs are here:

rajadain commented 1 year ago

Not pursuing this at this time.