NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I want faster retrieval from nwis for short-in-time evaluations #72

Open epag opened 2 months ago

epag commented 2 months ago

Author Name: James (James)
Original Redmine Issue: 101014, https://vlab.noaa.gov/redmine/issues/101014
Original Date: 2022-02-02


Given a short-in-time evaluation that acquires data from nwis
When the data is chunked in time for requests
Then the chunks should achieve a better balance of retrieval time versus data re-use


Redmine related issue(s): 108361


epag commented 2 months ago

Original Redmine Comment
Author Name: James (James)
Original Date: 2022-02-02T13:50:50Z


This is a particular problem for large-in-space evaluations. While waiting a very long time for data to be acquired from nwis in #100844, for an evaluation that spans 10 days, I noticed messages like this:

2022-02-02T13:12:53.362+0000 [WebSource Ingest -> #82] WARN wres.io.reading.waterml.WaterMLSource - Skipping site 01052500 because multiple timeseries for variable 00060 from USGS NWIS URI https://nwis.waterservices.usgs.gov/nwis/iv?endDT=2022-01-01T00%3A00%3A00Z&format=json&parameterCd=00060&sites=01052500&startDT=2021-01-01T00%3A00%3A01Z

I want to revisit that chunking.
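To make the mismatch concrete, the requested window can be read back from the `startDT` and `endDT` parameters of the logged URI. The following is a minimal sketch only: the class and method names (`NwisRequestWindow`, `queryParameters`, `requestedWindow`) are hypothetical and are not part of WRES. It assumes Java 10 or later and a query string that contains both parameters.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for inspecting a logged NWIS request; not WRES code. */
final class NwisRequestWindow
{
    private NwisRequestWindow()
    {
    }

    /** Splits the raw query string into decoded name/value pairs. */
    static Map<String, String> queryParameters( URI uri )
    {
        Map<String, String> parameters = new HashMap<>();

        for ( String pair : uri.getRawQuery().split( "&" ) )
        {
            String[] parts = pair.split( "=", 2 );
            String value = parts.length > 1
                           ? URLDecoder.decode( parts[1], StandardCharsets.UTF_8 )
                           : "";
            parameters.put( parts[0], value );
        }

        return parameters;
    }

    /** Returns the time between startDT and endDT in the request. */
    static Duration requestedWindow( URI uri )
    {
        Map<String, String> parameters = queryParameters( uri );
        Instant start = Instant.parse( parameters.get( "startDT" ) );
        Instant end = Instant.parse( parameters.get( "endDT" ) );
        return Duration.between( start, end );
    }
}
```

Applied to the URI in the log message above, `requestedWindow` returns one second short of 365 days, so the request appears to cover a full year of data for an evaluation that spans 10 days.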

One alternative might be to use year ranges for evaluations that span one year or more and, otherwise, either the exact range required or some smaller, fixed period to promote re-use (e.g., 3 months for evaluations > 6 months, else 1 month).
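One way to read that proposal is sketched below. This is an illustration under stated assumptions, not the WRES implementation: the class `NwisChunkPolicy` and its methods are hypothetical, it takes the "smaller, fixed period" option rather than the exact range, it works on whole dates with an exclusive end date, and it aligns chunks to calendar boundaries (1 January, quarter starts, month starts) on the assumption that aligned chunks are what allow different evaluations to re-use the same request.

```java
import java.time.LocalDate;
import java.time.Period;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical chunking policy; a sketch of the proposal, not WRES code. */
final class NwisChunkPolicy
{
    private NwisChunkPolicy()
    {
    }

    /** Chooses a chunk length from the span of the evaluation, [start, end). */
    static Period chunkLength( LocalDate start, LocalDate end )
    {
        // One year or more: keep year-long chunks
        if ( !start.plusYears( 1 ).isAfter( end ) )
        {
            return Period.ofYears( 1 );
        }

        // More than six months, but less than a year: three-month chunks
        if ( start.plusMonths( 6 ).isBefore( end ) )
        {
            return Period.ofMonths( 3 );
        }

        // Anything shorter: one-month chunks
        return Period.ofMonths( 1 );
    }

    /** Returns calendar-aligned chunks, each as {inclusive start, exclusive end}. */
    static List<LocalDate[]> chunks( LocalDate start, LocalDate end )
    {
        Period length = chunkLength( start, end );
        int monthsPerChunk = (int) length.toTotalMonths();

        // Snap the first chunk back to the nearest boundary at or before the start
        int monthIndex = start.getMonthValue() - 1;
        int alignedMonthIndex = monthIndex - ( monthIndex % monthsPerChunk );
        LocalDate cursor = LocalDate.of( start.getYear(), alignedMonthIndex + 1, 1 );

        List<LocalDate[]> result = new ArrayList<>();

        while ( cursor.isBefore( end ) )
        {
            LocalDate next = cursor.plus( length );
            result.add( new LocalDate[] { cursor, next } );
            cursor = next;
        }

        return result;
    }
}
```

Under these assumptions, a 10-day evaluation from 2021-12-20 to 2021-12-30 produces a single request for December 2021 rather than a request for the whole year, while an evaluation that spans more than a year still gets year-long chunks.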

Obviously, this is a trade-off and it adds complexity. There's nothing inherently wrong with favoring re-use for long-in-time evaluations, but the current chunking is a little too painful for short-in-time, large-in-space evaluations.

epag commented 2 months ago

Original Redmine Comment
Author Name: James (James)
Original Date: 2022-02-02T13:55:35Z


I see a related commit:wres|cd1a3db7b6f9d971d1d3b7ce5205cef57ed307f2, which references #80554 and #86887.

I also see the main event in commit:wres|811476a207f808831555ff7ce6859529d5c428a5, which references #80554.

I guess I will read #80554 as time allows, but the purpose of this ticket is to revisit that explicit trade-off because I think it went too far.

epag commented 2 months ago

Original Redmine Comment
Author Name: James (James)
Original Date: 2022-02-02T13:57:38Z


In other words, the goal would be to reframe the trade-off: not to destroy performance for multi-year evaluations, but to do better for shorter evaluations. This will add some code complexity and may reduce data re-use overall, but it's necessary.