NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I need to be able to potentially select from multiple USGS gauges to pair with a NWM forecast output segment. #191

Open epag opened 1 month ago

epag commented 1 month ago

Author Name: Hank (Hank) Original Redmine Issue: 54450, https://vlab.noaa.gov/redmine/issues/54450 Original Date: 2018-08-28


Snippet of an email from Ross pertaining to planned upgrades to his location table:

There remain some significant issues in this, especially with respect to the National Water Model. One is that the National Water Model outputs data at an NHD+ segment. What do you do when there are two or more USGS gages in roughly the same location? For example, above and below a dam? The database I was working with did not allow for multiple pairings with the same model output.

In addition, my recollection is that there are something like 200 locations where the WFO has a different USGS gage paired with the same LID used by the RFC for another USGS gage; I believe a bunch of those are above and below dams.

Thus, in some cases, there may be multiple gauges that could be paired to an NWM location; we would need to provide a default selection routine and likely the ability for a user to override that default selection.

If anyone has any questions for Ross, let me know, and I'll add him as a member to the WRES project so he can interact with this ticket.

Adding watchers and putting in the backlog.

Hank


Redmine related issue(s): 39721, 72747


epag commented 1 month ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2018-08-28T12:21:44Z


Would a cross join of the locations be helpful? For instance, NHD+ gives the location with the dam in the middle of it (stop me if that's not a probable scenario, it's kinda early) and USGS has 4 locations for that section, 3 above the dam, 1 below. Would we want a measure of which section is a closer match? If we specify a station and there are 10 gages associated, would we want to determine which have the highest correlation?
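One way to read the "highest correlation" question is as a ranking step: score each candidate gauge's observations against the model output for the segment, then sort. The following is a minimal Python sketch under the assumption that the series are time-aligned and of equal length. The function names and gauge identifiers are hypothetical and not part of WRES.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, time-aligned series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must be aligned and have at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rank_gauges(model: Sequence[float],
                observations: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (gauge_id, correlation) pairs, best match first."""
    scores = [(gauge_id, pearson(model, series))
              for gauge_id, series in observations.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: one NWM segment and two candidate gauges around a dam.
model_flow = [10.0, 12.0, 15.0, 11.0, 9.0]
candidates = {
    "gauge_above_dam": [20.0, 24.0, 30.0, 22.0, 18.0],  # tracks the model
    "gauge_below_dam": [14.0, 14.0, 15.0, 15.0, 14.0],  # regulated release
}
ranking = rank_gauges(model_flow, candidates)
```

A high correlation only says the two series move together; it does not by itself show that a gauge is the right pairing, so a ranking like this is probably best treated as a hint rather than a decision.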

Is this something we should try to collaborate with GID on? I believe they are doing a lot of the same work.

Regardless, we need to add a LOT more locations to the database. I have far more locations that need to be added to our CONUS records that have NHD+ <-> USGS pairings, but not tied to stations. We also don't have records added for Hawaii. I think I've isolated close to 48 and Brad has his own list, while Alex has a whole other list with more stations. Don't we also need to add locations for Puerto Rico and Alaska?

epag commented 1 month ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-08-28T12:28:02Z


Chris,

This could involve collaboration with GID; specifically, I think this should be handled by the WRDS folks. I've included discussion of this as part of next week's WRES/WRDS meeting. Specifically, Alex is extending his list of locations, you are extending the WRES default list, Ross is apparently working on his list, other teams have their own lists. This is screaming for a single, central solution. I just have no idea how soon they'll be able to provide something.

Yes, there are plans to add locations outside of CONUS (AK, HI, and Puerto Rico, I believe).

Hank

epag commented 1 month ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-08-28T12:30:10Z


By the way, that comment in particular was referring to the location table.

I think the problem of deciding which gauge to use in pairing falls to WRES to resolve, since the WRDS folks told us long ago that they would not be providing pairing capabilities, just the data (James: I think this is what they said during a meeting at some point a long time ago... let me know if I'm wrong).

Hank

epag commented 1 month ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-08-28T12:30:42Z


Verifying in the presence of river regulations is fundamentally difficult. I don't really see this being tackled in an automated way in most cases. In other words, I would expect a user to manually prescribe which observed variables get paired with forecasts/simulations, and those observations may have undergone some pre-processing/transformation work first. For example, out West, they will often want to verify against natural flows.

I can imagine a broad set of locations getting verified automatically, but I cannot see much attention being paid to automated results (based on matching of location identifiers) from heavily regulated locations (e.g. immediately downstream of dams), because the raw observations will often need some pre-processing first.

In short, I just think we need to be able to provide a user with the option to manually prescribe how forecasts and observations are paired for specific locations. For example, it may be useful to have a subset where locations are matched to USGS gauges and another subset where the matching observations are prescribed manually (not taken from USGS). Probably in two separate project declarations though (with different source declarations).

Verifying regulated locations is tough and often requires a lot of thought/manual work.

epag commented 1 month ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-08-28T12:32:08Z


Hank wrote:

since the WRDS folks told us long ago that they would not be providing pairing capabilities, just the data (James: I think this is what they said during a meeting at some point a long time ago... let me know if I'm wrong).

Right, at least not as an early priority (who knows in the long-term, because model calibration and other activities rely on pairing too).

epag commented 1 month ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-08-28T12:35:53Z


James, Chris:

Would identifier mapping, which I think we discussed about a year ago, provide a possible solution? If so, maybe I can dig around for that old ticket and relate it.

Hank

epag commented 1 month ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-08-28T12:39:04Z


Hank wrote:

James, Chris:

Would identifier mapping, which I think we discussed about a year ago, provide a possible solution? If so, maybe I can dig around for that old ticket and relate it.

Hank

Yes, I think that's the ballpark of the solution, i.e. the flexibility we allow surrounding identifier mapping. However, we may not need to do anything in the first instance. If this problem is mainly about regulated locations and users are mainly doing manual work to prepare observations for those locations, they will have some inherent flexibility, i.e. there's no direct data access from a web-service on which to match location identifiers with forecast sources.

In short, I'd probably give this a wide berth, other than your point about coordination, until we know what the precise requirement might be.

epag commented 1 month ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2018-08-28T12:41:53Z


I like the idea of having user-specified locations; I believe that's how the original project config worked way, way back when, prior to wres-config. The most difficult aspect of that, however, would be large-scale evaluation (which is why it was removed, I believe). We have several tests that evaluate >6,000 different locations and several that evaluate >350. That's an extremely useful capability. It would be completely unusable if we forced users to specify the matching for each location. We definitely don't want to throw the baby out with the bathwater.

Would we want to set up some form of fallback mapping, but offer the ability for users to override?
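A fallback mapping with user overrides could look roughly like the Python sketch below. It assumes a default table that maps each model location to one gauge and a much smaller set of user overrides; the function name and all identifiers are hypothetical and do not reflect how WRES stores pairings.

```python
from typing import Dict, List, Mapping


def resolve_pairings(defaults: Mapping[str, str],
                     overrides: Mapping[str, str],
                     locations: List[str]) -> Dict[str, str]:
    """Pair each location with a gauge, preferring a user override."""
    pairings = {}
    for location in locations:
        if location in overrides:
            pairings[location] = overrides[location]
        elif location in defaults:
            pairings[location] = defaults[location]
        # Locations with neither a default nor an override stay unpaired.
    return pairings


# Hypothetical identifiers, for illustration only.
default_map = {
    "segment_1": "gauge_A",
    "segment_2": "gauge_B",
    "segment_3": "gauge_C",
}
user_overrides = {"segment_2": "gauge_B_below_dam"}

pairings = resolve_pairings(
    default_map,
    user_overrides,
    ["segment_1", "segment_2", "segment_3", "segment_4"],
)
```

With this shape, an evaluation of thousands of locations needs no per-location input, and a user only writes entries for the handful of locations where the default is wrong.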

On a separate note, do we want to add another column to our output to show left location vs right location?

I also agree with James' point; this is tightrope walking at this point; more raw information is needed.

epag commented 1 month ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2018-08-28T12:47:29Z


Related.

Agreed on needing more information. I'll leave it in the backlog with normal priority, for now, and see where discussions head with WRDS and Ross (who is also invited to the meeting).

Thanks!

Hank

epag commented 1 month ago

Original Redmine Comment Author Name: James (James) Original Date: 2018-08-28T12:49:45Z


Christopher.Tubbs wrote:

I like the idea of having user-specified locations; I believe that's how the original project config worked way, way back when, prior to wres-config. The most difficult aspect of that, however, would be large-scale evaluation (which is why it was removed, I believe). We have several tests that evaluate >6,000 different locations and several that evaluate >350. That's an extremely useful capability. It would be completely unusable if we forced users to specify the matching for each location. We definitely don't want to throw the baby out with the bathwater.

My guess, and I'm only guessing at this point, is that they'd want to decompose this problem into two location sets, one containing the bulk of locations (perhaps all), which uses our current approach. Then a separate set, much smaller, that has some kind of user intervention in the location mapping process. The latter would be part of a separate configuration, so they don't get mixed up. But, like I say, I don't really know at this point.

Christopher.Tubbs wrote:

Would we want to set up some form of fallback mapping, but offer the ability for users to override?

Personally, I would prefer to have them configure anything substantially different in separate projects (as above). We may need to allow some manual intervention in location mapping for locations that require some rule for mapping. But, in general, we don't want manual intervention, and we don't want to throw out what we have.

Christopher.Tubbs wrote:

On a separate note, do we want to add another column to our output to show left location vs right location?

Right, I think we want to propagate that information if and when it matters (to whatever outputs it matters for).

Christopher.Tubbs wrote:

I also agree with James' point; this is tightrope walking at this point; more raw information is needed.

Yeah, I'd put this one on hold for now, pending further clarity.

epag commented 1 month ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-09-26T16:14:03Z


The location service should respond with the list of locations that potentially match. It sounds like 99 times in 100 there will be one USGS gauge location returned when you ask for an LID location and we're talking about the 100th time. I think the service should simply return both. Then it is up to WRES how to resolve the ambiguity. How we resolve that, I don't know.

Option 1. Omit the LID from evaluation because it's ambiguous.
Option 2. Include both locations because they're both valid.
Option 3. Use "nearest" by calculating the distance between the LID and the gauge locations (when actual coordinates on an identified ellipsoid are provided for the LID and all of the gauges in question).
Option 4. Guess which is "downstream".

Another approach is to use a step-by-step combination of the above, such as "try option 4, if that fails, go to option 3, if that fails, go to option 1."

All of this is pretty complicated, so I like "drop", option 1, if it occurs.
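The step-by-step combination can be sketched as a chain of strategies, where each strategy either picks a gauge or declines, and the LID is dropped if every strategy declines. The Python below is only an illustration: `Site`, `pick_nearest`, and `resolve` are hypothetical names, the identifiers are made up, and the distance uses a spherical great-circle approximation rather than a true ellipsoidal calculation. An option 4 ("downstream") strategy would be one more function placed earlier in the list.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Site:
    site_id: str
    lat: Optional[float] = None  # decimal degrees, if known
    lon: Optional[float] = None


def haversine_km(a: Site, b: Site) -> float:
    """Great-circle distance on a sphere of radius 6371 km (an approximation)."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def pick_nearest(lid: Site, gauges: Sequence[Site]) -> Optional[Site]:
    """Option 3: applies only when the LID and every gauge have coordinates."""
    if any(s.lat is None or s.lon is None for s in (lid, *gauges)):
        return None
    ranked = sorted(gauges, key=lambda g: haversine_km(lid, g))
    if len(ranked) > 1 and haversine_km(lid, ranked[0]) == haversine_km(lid, ranked[1]):
        return None  # an exact tie is still ambiguous
    return ranked[0]


Strategy = Callable[[Site, Sequence[Site]], Optional[Site]]


def resolve(lid: Site, gauges: Sequence[Site],
            strategies: Sequence[Strategy]) -> Tuple[Optional[Site], str]:
    """Try each strategy in order; fall back to option 1 (drop) if all decline."""
    if not gauges:
        return None, "dropped: no candidates"
    if len(gauges) == 1:
        return gauges[0], "unambiguous"
    for strategy in strategies:
        choice = strategy(lid, gauges)
        if choice is not None:
            return choice, strategy.__name__
    return None, "dropped: ambiguous"


# Hypothetical sites, for illustration only.
lid = Site("LID_1", 40.00, -105.00)
near = Site("gauge_near", 40.01, -105.00)
far = Site("gauge_far", 40.50, -105.00)
no_coords = Site("gauge_unknown")

chosen, reason = resolve(lid, [far, near], [pick_nearest])
dropped, drop_reason = resolve(lid, [near, no_coords], [pick_nearest])
```

Returning the reason alongside the choice keeps a record of how each pairing was made, or why a location was dropped, which could then be reported to the user.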

epag commented 1 month ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2018-09-26T16:16:17Z


I should add that we should warn the user as well. Drop-and-notify.

Perhaps we need an explicit interface for caveats. Or caveats should be baked into the metadata somehow.

epag commented 1 month ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2020-06-03T22:43:02Z


Bump, related to location/feature aliasing/specification/modeling.

epag commented 1 month ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2020-08-03T21:13:36Z


Found while looking for Puerto Rico, but this relates to what James found earlier today in the geographic feature stuff for 5.0.

epag commented 1 month ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2020-10-26T17:03:59Z


Not fully resolved in 5.0 but major headway in #72747.

Need to revisit the `FeatureFinder` to allow for it (and nerf some capabilities of automatically finding feature correlations in `FeatureFinder` that were based on the 1-1-1-relationship-within-an-evaluation assumption).

epag commented 1 month ago

Original Redmine Comment Author Name: James (James) Original Date: 2022-12-21T13:05:36Z


As of 6.9, I believe this should be solved.

Feature correlations can be entered manually or retrieved via the WRDS feature service.

Feature correlations can be many-to-one as of 6.9.

What more is left?

Perhaps overriding correlations that are supplied by a feature service through manual declaration? However, it's unclear from the OP whether that is requested.

When using WRDS to resolve feature correlations, the likely worst-case scenario is that you get more statistics than you'd hoped for, because the service returns the superset of feature correlations. Edit: When declaring correlations manually, you can ask for whatever you want, including many-to-one relationships as of 6.9.
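To illustrate what a many-to-one correlation means in terms of output volume, here is a small Python sketch. It is not WRES declaration syntax; it assumes each correlation is an (observed feature, predicted feature) pair and that each pair is evaluated separately. All identifiers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each correlation is an (observed feature, predicted feature) pair.
correlations: List[Tuple[str, str]] = [
    ("gauge_above_dam", "nwm_segment_1"),
    ("gauge_below_dam", "nwm_segment_1"),  # second gauge, same segment
    ("gauge_C", "nwm_segment_2"),
]


def group_by_predicted(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the observed features that share each predicted feature."""
    grouped = defaultdict(list)
    for observed, predicted in pairs:
        grouped[predicted].append(observed)
    return dict(grouped)


grouped = group_by_predicted(correlations)

# Under the one-evaluation-per-pair assumption, the number of statistic sets
# follows the number of pairs (3), not the number of model segments (2).
statistic_sets = len(correlations)
model_segments = len(grouped)
```

Under that assumption, a superset of correlations from a service adds statistic sets rather than replacing any, which matches the "more statistics than you'd hoped for" outcome described above.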