azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing
Task 1-2: Benchmark HUC Query of NWM Gridded Zarr Output #27

Open lewfish opened 2 years ago

lewfish commented 2 years ago

The most likely query pattern for gridded output data is by HUC, so we need to benchmark how the Zarr library performs as the size of the query grows. To that end, we will run benchmarks for several HUC 8, HUC 10, and HUC 12 regions and record them as a sample documentation notebook in the project repository.
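As a rough illustration of what "performance relative to the size of the query" could look like in practice, here is a minimal timing sketch. It does not use Zarr: `read_window` is a hypothetical stand-in for slicing a Zarr array, the grid is synthetic, and the window sizes per HUC level are made-up placeholders rather than real HUC extents. Only the shape of the harness (repeat the query, take the median, tabulate by query size) is the point.

```python
import statistics
import time


def time_query(query, repeats=5):
    """Run `query` several times and return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        query()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def read_window(grid, row_start, row_stop, col_start, col_stop):
    """Hypothetical stand-in for a Zarr slice: copy a rectangular window."""
    return [row[col_start:col_stop] for row in grid[row_start:row_stop]]


# Synthetic 400 x 400 grid standing in for one time step of gridded output.
grid = [[float(r * 400 + c) for c in range(400)] for r in range(400)]

# Assumed window edge lengths (in cells) per HUC level; real HUC extents differ.
window_sizes = {"HUC12": 10, "HUC10": 50, "HUC8": 200}

results = {}
for level, size in window_sizes.items():
    # Bind `size` as a default argument so each lambda keeps its own value.
    seconds = time_query(lambda s=size: read_window(grid, 0, s, 0, s))
    results[level] = {"cells": size * size, "median_seconds": seconds}

for level, row in results.items():
    print(f"{level}: {row['cells']} cells, {row['median_seconds']:.6f} s")
```

In a real benchmark the `read_window` call would be replaced by the actual Zarr read for a HUC's bounding box or mask, and the timings would depend heavily on chunk layout and storage backend.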

lewfish commented 2 years ago

For sample benchmark #1, it's not clear whether these aggregations should be run for each stream individually or across all streams. It's also not clear whether "daily averages" means a single average over all days in the dataset or a separate average for each individual day. I would also like to know typical values for the number of HUC8s and the length of the date/time range.
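To make the ambiguity concrete, here is a small sketch of the four readings on toy data. The records, stream IDs, and the `group_mean` helper are all hypothetical and exist only for illustration; nothing here reflects the actual dataset schema.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical sub-daily records: (stream_id, timestamp, streamflow).
records = [
    ("s1", datetime(2020, 1, 1, 0), 1.0),
    ("s1", datetime(2020, 1, 1, 12), 3.0),
    ("s1", datetime(2020, 1, 2, 0), 5.0),
    ("s2", datetime(2020, 1, 1, 0), 10.0),
    ("s2", datetime(2020, 1, 2, 0), 20.0),
    ("s2", datetime(2020, 1, 2, 12), 40.0),
]


def group_mean(rows, key):
    """Average the streamflow values of `rows`, grouped by `key(row)`."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    return {k: mean(v) for k, v in groups.items()}


# Reading A: one average per stream per day.
per_stream_per_day = group_mean(records, lambda r: (r[0], r[1].date()))

# Reading B: one average per day, pooled across all streams.
per_day_all_streams = group_mean(records, lambda r: r[1].date())

# Reading C: one average per stream over the whole date range.
per_stream_overall = group_mean(records, lambda r: r[0])

# Reading D: a single average over all streams and all days.
overall = mean(r[2] for r in records)

print(per_stream_per_day)
print(per_day_all_streams)
print(per_stream_overall)
print(overall)
```

The four readings produce outputs of very different sizes (streams × days, days, streams, and a single number), so which one is intended affects both the query and the benchmark cost.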