azavea / noaa-hydro-data

NOAA Phase 2 Hydrological Data Processing

Develop benchmark queries #53

Closed lewfish closed 2 years ago

lewfish commented 2 years ago

The following is information sent to us by Fernando about typical queries that NOAA runs on the National Water Model (NWM). We would like to develop a set of benchmark queries based on this information.

Overall, the team does three combinations of queries that are probably very familiar to your team that deals with spatial-temporal data:

Obviously, most queries to NWM data combine these three categories in unique ways.

More concretely, here are some typical queries:

lewfish commented 2 years ago

We will initially focus on the following query using the reanalysis dataset, since it is already stored in a cloud-friendly format and we can focus on re-formatting it in different ways.

"All stream flows within a given set of HUC8's and a date/time range. Samples are then aggregated to daily averages/max/min."

It's not clear whether these aggregations should be run for each stream individually or across all streams. It's also not clear whether "daily averages" means a single average over all days in the dataset or a separate average for each individual day. I would also like to know typical values for the number of HUC8s and the length of the date/time range.
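
To make the open questions concrete, here is a minimal pandas sketch of one possible reading of the query: filter to a set of HUC8s and a half-open date/time range, then compute mean/max/min per stream and per calendar day. The table layout (columns `feature_id`, `huc8`, `time`, `streamflow`), the function name `daily_stats`, and the synthetic data are all assumptions made for illustration; this is not how the reanalysis dataset is actually stored or queried.

```python
import numpy as np
import pandas as pd


def daily_stats(df, huc8s, start, end):
    """Per-stream daily mean/max/min of streamflow.

    Assumes a long-format table with hypothetical columns
    feature_id, huc8, time, streamflow. The time range is [start, end).
    """
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    mask = df["huc8"].isin(huc8s) & (df["time"] >= start) & (df["time"] < end)
    subset = df.loc[mask]
    # Truncate each timestamp to its calendar day and group by (stream, day).
    day = subset["time"].dt.floor("D").rename("day")
    return subset.groupby(["feature_id", day])["streamflow"].agg(["mean", "max", "min"])


# Synthetic hourly data: three streams in two made-up HUC8s, 48 hours each.
times = pd.Timestamp("2000-01-01") + pd.to_timedelta(np.arange(48), unit="h")
frames = [
    pd.DataFrame(
        {
            "feature_id": fid,
            "huc8": huc,
            "time": times,
            "streamflow": np.arange(48, dtype=float) + fid,
        }
    )
    for fid, huc in [(101, "02040202"), (102, "02040202"), (201, "02040203")]
]
df = pd.concat(frames, ignore_index=True)

result = daily_stats(df, ["02040202"], "2000-01-01", "2000-01-03")
print(result)
```

Under the other reading (one value per day across all streams in the selected HUC8s), the only change would be to group by `day` alone instead of by `feature_id` and `day`.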

lewfish commented 2 years ago

Some clarification from Fernando:

"In terms of date/time ranges, I can't give you a specific one. Someone interested in large time-scales may want to do the entire 40 year history. Others may just want a time domain pertaining to a given flood event and those can vary from a few hours to weeks.

Yes, temporal aggregations target different resolutions. A user may want a time domain for a year but then get daily max's or weekly min's (if working with drought). These are common but yes the target resolution could be the same as the time domain (user pulls a year of data and just wants to find the max for the year).

Lastly, I would say the most common aggregations are across a time domain for the given Feature IDs. Occasionally, you will find a need to aggregate for a spatial area across Feature ID's."
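
Based on this clarification, a benchmark could parameterize both the target resolution and the aggregation function. The sketch below is one way to express that in pandas, covering the common case (aggregate over time for each Feature ID) and the occasional case (aggregate across Feature IDs). The column names, the function names `aggregate_per_feature` and `aggregate_across_features`, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def aggregate_per_feature(df, period, how):
    """Aggregate streamflow over time for each feature ID.

    period is a pandas period alias such as "D", "W", "M" or "Y";
    how is an aggregation name such as "mean", "max" or "min".
    """
    key = df["time"].dt.to_period(period).rename("period")
    return df.groupby(["feature_id", key])["streamflow"].agg(how)


def aggregate_across_features(df, period, how):
    """Aggregate streamflow over time across all feature IDs in df."""
    key = df["time"].dt.to_period(period).rename("period")
    return df.groupby(key)["streamflow"].agg(how)


# Synthetic daily data: two streams, 14 days starting on Monday 2001-01-01.
times = pd.Timestamp("2001-01-01") + pd.to_timedelta(np.arange(14), unit="D")
df = pd.concat(
    [
        pd.DataFrame({"feature_id": 1, "time": times, "streamflow": np.arange(1.0, 15.0)}),
        pd.DataFrame({"feature_id": 2, "time": times, "streamflow": 10 * np.arange(1.0, 15.0)}),
    ],
    ignore_index=True,
)

weekly_min = aggregate_per_feature(df, "W", "min")       # e.g. drought analysis
yearly_max = aggregate_per_feature(df, "Y", "max")       # resolution equals the time domain
area_weekly_max = aggregate_across_features(df, "W", "max")  # spatial aggregation
print(weekly_min, yearly_max, area_weekly_max, sep="\n\n")
```

Note that the "W" alias in pandas defaults to weeks ending on Sunday; a real benchmark would need to agree on the week convention, which is not specified in the thread.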