lewfish closed this issue 1 year ago.
I ran the benchmark for the wide Parquet dataset using a separate notebook (s3://noaa-notebooks/vlulla/benchmark-using-wide-parquet.ipynb) and found that the wide Parquet layout did not really improve the numbers. The input and output of that notebook are here:
s3://azavea-noaa-hydro-data/esip-experiments/datasets/reanalysis-chrtout/parquet/vl/wide-parquets-all-feature_ids/streamflow-1990-1999-consolidated-wide.parquet
s3://azavea-noaa-hydro-data/esip-experiments/benchmarks/vl/08-22-2022-with-wide-parquet.csv
s3://azavea-noaa-hydro-data/esip-experiments/plots/parquet/vl/08-22-2022-wide-parquets/parquet.png
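For context, a "wide" Parquet layout here means one row per timestamp and one column per `feature_id`, rather than one row per (time, feature_id) observation. The sketch below shows that reshape with pandas on a tiny made-up sample; the column names `time`, `feature_id`, and `streamflow` and the helper `to_wide` are assumptions for illustration, not taken from the benchmark notebook.

```python
import pandas as pd

# Hypothetical long-format sample: one row per (time, feature_id) observation.
long_df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            ["1990-01-01 00:00", "1990-01-01 00:00",
             "1990-01-01 01:00", "1990-01-01 01:00"]
        ),
        "feature_id": [101, 202, 101, 202],
        "streamflow": [1.5, 3.0, 1.7, 2.9],
    }
)


def to_wide(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: pivot to one row per time, one column per feature_id."""
    wide = df.pivot(index="time", columns="feature_id", values="streamflow")
    # pandas refuses to write non-string column names to Parquet, so stringify ids.
    wide.columns = [str(c) for c in wide.columns]
    return wide


wide_df = to_wide(long_df)

# In the wide layout, one reach's time series is a single column, so a columnar
# reader could fetch only that column, e.g. pd.read_parquet(path, columns=["101"]).
print(wide_df["101"].tolist())  # [1.5, 1.7]
```

The appeal of this layout is that a single-reach time-series query touches one column instead of filtering rows; with many `feature_id`s, though, the file ends up with a very large number of columns, which can offset that gain.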
Not pursuing this at this time.
As part of this, we should re-run the benchmarks in https://github.com/azavea/noaa-hydro-data/blob/master/src/esip-2022-presentation/benchmark_queries.ipynb with the wide-format dataset Vijay created at
s3://azavea-noaa-hydro-data/esip-experiments/datasets/reanalysis-chrtout/parquet/vl/wide-parquets-all-feature_ids/
We should also try other ways of formatting the dataset if we can't get the numbers close to the Zarr numbers.
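When comparing layouts (long Parquet, wide Parquet, Zarr), the same query can be timed against each one with a small harness like the one below. This is a minimal sketch, assuming each layout's query is wrapped in a zero-argument function; the `benchmark` helper and the placeholder queries are hypothetical and are not how `benchmark_queries.ipynb` measures its numbers.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(queries: Dict[str, Callable[[], object]], repeats: int = 3) -> Dict[str, float]:
    """Hypothetical helper: median wall-clock seconds for each named query."""
    results: Dict[str, float] = {}
    for name, query in queries.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            query()  # run the query; its result is discarded
            timings.append(time.perf_counter() - start)
        # The median is less sensitive than the mean to one slow (e.g. cold-cache) run.
        results[name] = statistics.median(timings)
    return results


# Placeholder queries standing in for e.g. a wide-Parquet read and a Zarr read.
data = list(range(100_000))
timings = benchmark({"layout_a": lambda: sum(data), "layout_b": lambda: max(data)})
print(timings)
```

Using the median over a few repeats keeps a single cold-cache S3 read from dominating the comparison; reporting the first run separately would capture cold-start cost if that matters.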