NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I want the WRES to fail if the data on each side of the pairing are verifiably registered to different datums #41

Open HankHerr-NOAA opened 1 month ago

HankHerr-NOAA commented 1 month ago

See WRES User Support ticket #132398. In general, if the WRES is able to confirm different datums for the two sides of the evaluation, the WRES should throw an exception. More specifically, if the USGS NWIS stages are registered to a different datum than the WRDS RFC forecast stages, then an exception should be generated. So this ticket will require implementing code to look for a datum on each side and then throw an exception if the datums differ. WRDS datums are available via the location service (I've asked the WRDS team to embed the information in the RFC forecast service response). It's unclear where USGS NWIS datums are defined. It's also unclear whether datums are available for other sources, such as PI-timeseries XML.
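A minimal sketch of the fail-fast rule described above, assuming each side has already been reduced to an optional datum label (None when no datum could be found). The names `check_datums` and `DatumMismatchError` are hypothetical and not part of the WRES codebase. The key point is that only a *verifiable* mismatch fails; a missing datum does not.

```python
from typing import Optional


class DatumMismatchError(Exception):
    """Hypothetical error raised when both sides declare different datums."""


def check_datums(left: Optional[str], right: Optional[str]) -> None:
    """Fail only when both datums are known and verifiably different.

    A missing datum (None) cannot be verified, so it does not trigger a failure.
    """
    if left is not None and right is not None and left != right:
        raise DatumMismatchError(
            f"Left datum {left!r} differs from right datum {right!r}"
        )
```

For example, `check_datums("NAVD 88", "NGVD 29")` raises, while `check_datums(None, "NAVD 88")` passes silently. Comparing raw labels like this is naive; the later comments explain why identifiers need resolving before they can be compared.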

A declaration to reproduce the issue for BGDA2 is in Vlab Redmine #132398-18.

Hank

james-d-brown commented 1 month ago

I think the ticket title is correct, but the implementation will need to be conditional upon evaluating a variable that is actually registered to the datum. For example, if WRDS returns a streamflow response (not registered to a datum) with the datum included, an unconditional check is not going to work and this becomes more painful to achieve, i.e., we need to check conditionally on variable name or something (yuck!).

HankHerr-NOAA commented 1 month ago

Would it be accurate to say, if WRDS returns streamflow data with a datum, then that is a bug in WRDS? From your comment, I would guess perhaps yes, but they might just be returning whatever their source says to return, in which case the bug would be upstream of WRDS.

Thanks for that information,

Hank

james-d-brown commented 1 month ago

I wouldn't go as far as to say a bug if the WRDS is merely translating generic location metadata that applies to all variables, even if only a subset of the metadata is relevant to some of the variables. It is arguable. But, certainly, it would be preferable if they only returned a datum when a datum was actually in use.

james-d-brown commented 4 weeks ago

Should probably make some progress on this one and on #36, as they are important aspects of usability for evaluating height or stage measurements against a vertical datum and these limitations were reported by an RFC user.

Progress on this ticket may not mean an initial "fix" for some time, as it depends on two things: information being available to WRES from sources that are not fully within our control (USGS NWIS, WRDS), and a placeholder inside WRES for that information (notably, in the canonical/protobuf format and in the database schema), which is fully within our control. Once we have the information coming in and a place to store it, we can use it for validation and for triggering any associated errors, as requested in the OP.

james-d-brown commented 4 weeks ago

Georeferencing is a complicated and confusing topic.

As I understand it, a coordinate reference system could be geographic, with three-dimensional position measured relative to an ellipsoid (aka spheroid, i.e., an ideal surface) or relative to a geoid (a more complex gravity surface, essentially corresponding to mean sea level), or it could be projected/two-dimensional with a separate vertical coordinate reference system. In other words, the horizontal and vertical coordinates may originate from the same (3D) model or a different underlying model. Yuck.

To illustrate the complexity...

Typically, a coordinate reference system is referred to with a well-known identifier or spatial reference identifier (SRID), which is usually an EPSG code (European Petroleum Survey Group):

https://epsg.io/

For example, the WGS 84 geographic coordinate reference system (based on the WGS 84 ellipsoid) is EPSG:4326. Most of the SRIDs we see in WRES are 4326, as you will note from a snapshot of our database at any given moment; if the SRID exists, it is probably 4326 right now, largely because NWIS uses it.

However, in the US, horizontal and vertical coordinates are often based on NAD83 (EPSG:4269) and NAVD88 (EPSG:5703), respectively, which use the same spheroid model. Both of these are due to be replaced soon, I believe. Anyway, there are two separate identifiers, one for the horizontal component and one for the vertical component, even though they use the same ellipsoid.

At the same time, there is also an EPSG code for the combination of these two things:

https://epsg.io/5498

So, is a single SRID adequate or not? Maybe, maybe not. Sometimes there is an EPSG code that combines the two, other times not. But the upshot, I think, is that we need to support a vertical reference system too.

This makes life quite difficult, because there may be one or two coordinate reference systems describing the 3D position, and equivalent information could be split across the two, as illustrated above: EPSG:4269 + EPSG:5703 = EPSG:5498.

That is going to make it rather hard to detect different vertical coordinate systems across two datasets. I suppose the most we can do is to fail when two vertical coordinate IDs are explicitly different and otherwise warn when some or all of the information is missing, but that would not guarantee that all problems are caught, because an SRID could encapsulate the vertical component.
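One way to picture the "error when explicitly different, otherwise warn" idea is to first resolve each EPSG code to its vertical component (so a compound code such as EPSG:5498 is unpacked) and only then compare. This is a sketch under stated assumptions: the lookup tables are hypothetical, tiny stand-ins for the EPSG registry, and `vertical_component`, `validate_vertical` and `Outcome` are illustrative names, not WRES APIs.

```python
from enum import Enum
from typing import Optional

# Hypothetical, deliberately tiny lookup tables standing in for the EPSG
# registry: a compound CRS maps to its (horizontal, vertical) components.
COMPOUND_CRS = {5498: (4269, 5703)}  # NAD83 + NAVD88 height
VERTICAL_CRS = {5702, 5703}          # NGVD29 height (ftUS), NAVD88 height


class Outcome(Enum):
    OK = "ok"
    WARN = "warn"
    ERROR = "error"


def vertical_component(epsg: Optional[int]) -> Optional[int]:
    """Return the vertical CRS carried by an EPSG code, or None if unknown."""
    if epsg is None:
        return None
    if epsg in COMPOUND_CRS:
        return COMPOUND_CRS[epsg][1]
    if epsg in VERTICAL_CRS:
        return epsg
    # Horizontal-only (e.g. 4269, 4326) or unrecognized codes say nothing
    # about the vertical datum.
    return None


def validate_vertical(left_epsg: Optional[int], right_epsg: Optional[int]) -> Outcome:
    """ERROR only for explicitly different vertical CRSs; WARN when either is unknown."""
    left = vertical_component(left_epsg)
    right = vertical_component(right_epsg)
    if left is None or right is None:
        return Outcome.WARN
    return Outcome.OK if left == right else Outcome.ERROR
```

With these tables, EPSG:5498 against EPSG:5703 resolves to `Outcome.OK` even though the identifiers differ, while EPSG:4326 against EPSG:5703 can only produce a warning. That illustrates why a horizontal-only SRID leaves the check inconclusive.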

The other problem is knowing when the vertical coordinate system matters. If we are evaluating streamflow, it really doesn't matter because stage/elevation has been eliminated via the rating curve. But the software is completely agnostic about the variables it is evaluating and makes no attempt to understand them in some deeper sense, notably whether they are elevations.

We can probably do some conditional warnings/validation, such as understanding that a reference to variable "00065" is an elevation in the context of NWIS, but this will be impossible to do in any general way and, even for NWIS, there are several parameter codes that measure height or elevation.
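The conditional gating described above could be as simple as a per-source allow-list of variables known to be heights or elevations. In this sketch, `ELEVATION_VARIABLES` and `datum_matters` are hypothetical names. Only the NWIS parameter code "00065" comes from the discussion, and the registry is intentionally incomplete. Any variable not listed is treated as datum-agnostic, which is exactly the weakness noted above: it cannot be made general.

```python
from typing import Dict, Set

# Hypothetical registry of variables known to be heights or elevations, keyed
# by data source. Only NWIS "00065" comes from the discussion; a real list
# would need curating per source and would still be incomplete.
ELEVATION_VARIABLES: Dict[str, Set[str]] = {
    "NWIS": {"00065"},
}


def datum_matters(source: str, variable: str) -> bool:
    """Return True only when the variable is known to be a height/elevation."""
    return variable in ELEVATION_VARIABLES.get(source, set())
```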

Ugh.

james-d-brown commented 4 weeks ago

In short, I think this ticket is far, far harder than #36. To achieve a validation error as described in the OP, we would need all three of these things to be met:

  1. The ability to register a vertical coordinate system ID within the WRES (canonical format and database schema) for each geographic feature encountered;
  2. For each data source (observed, predicted and baseline) to clarify the vertical coordinate system in use with an EPSG code, or else for the WRES to translate from some descriptive identifier, such as "NAVD 88", to an EPSG code; and
  3. For the WRES to understand when a variable being evaluated represents an elevation or height for which the vertical coordinate system matters.
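The translation in the second condition might start as a normalized alias table. This is a sketch only. The table is hypothetical and holds just the "NAVD 88" to EPSG:5703 mapping mentioned above, and `to_vertical_epsg` is an illustrative name. It also glosses over the fact that EPSG distinguishes units: EPSG:5703 is metre-based, so a height reported in feet may need a different code.

```python
import re
from typing import Optional

# Hypothetical alias table; only "NAVD 88" -> EPSG:5703 is taken from the
# discussion. Unit-specific variants are deliberately not handled here.
VERTICAL_ALIASES = {
    "NAVD88": 5703,
}


def to_vertical_epsg(label: str) -> Optional[int]:
    """Translate a descriptive vertical datum label to an EPSG code, if known."""
    # Normalize case and drop whitespace, hyphens and underscores: "navd 88" -> "NAVD88".
    key = re.sub(r"[\s_\-]", "", label).upper()
    return VERTICAL_ALIASES.get(key)
```

Returning None for unrecognized labels, rather than guessing, keeps the downstream validation in its "warn, cannot verify" state.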

The first condition is merely work, but the second and third conditions are either hard or not fully within our control.

Conversely, it is pretty straightforward to allow a user to declare a datum offset to be subtracted from one or all sides of data, because this does not require any technical information about the coordinate reference systems and the user is doing all of the hard work for us (deciding when the offset is needed and identifying what it is) - that is #36.
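To show why #36 is the easier problem, here is a minimal sketch of applying user-declared offsets. The side names ("left", "right", "baseline") and the function `apply_declared_offsets` are hypothetical. The sign convention (offset is subtracted) follows the wording above. The code needs no CRS knowledge at all, because the user has already decided where an offset applies and what it is.

```python
from typing import Dict, List, Sequence


def apply_declared_offsets(
    sides: Dict[str, Sequence[float]], offsets: Dict[str, float]
) -> Dict[str, List[float]]:
    """Subtract each declared offset from its side; undeclared sides are unchanged."""
    return {
        name: [value - offsets.get(name, 0.0) for value in values]
        for name, values in sides.items()
    }
```

For example, `apply_declared_offsets({"left": [10.0, 12.5], "right": [3.0]}, {"left": 2.0})` yields `{"left": [8.0, 10.5], "right": [3.0]}`.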

There is a separate conversation about doing this automatically, though, because that also has the complexity of this ticket. For example, if USGS and WRDS both provided a datum offset from zero in the same vertical CRS, it would still require WRES to understand that the vertical CRS was the same and that the offset needs to be applied because the variable is an elevation. We don't currently have a ticket for that work, but it will be roughly as difficult as this ticket.
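To make the automatic case concrete: if each source reported its gage datum elevation, a forecast stage could be re-expressed relative to the observed gage datum as stage_obs_ref = stage_fcst + (datum_fcst - datum_obs). That holds only when both datum elevations are in the same, known vertical CRS and the variable is a height. The sketch below is hypothetical (the function name and parameters are illustrative). It declines to act, returning None, whenever any of those preconditions is not met, which mirrors the conditions listed earlier.

```python
from typing import List, Optional, Sequence


def align_to_observed_datum(
    forecast_stages: Sequence[float],
    forecast_datum: float,
    observed_datum: float,
    forecast_vcrs: Optional[int],
    observed_vcrs: Optional[int],
    is_elevation: bool,
) -> Optional[List[float]]:
    """Re-express forecast stages relative to the observed gage datum.

    Water surface elevation = stage + datum, so the forecast stage measured
    from the observed datum is stage + (forecast_datum - observed_datum).
    Returns None when the shift cannot be applied safely.
    """
    if not is_elevation:
        return None
    if forecast_vcrs is None or forecast_vcrs != observed_vcrs:
        return None
    shift = forecast_datum - observed_datum
    return [stage + shift for stage in forecast_stages]
```

With both datums in EPSG:5703, forecast stages `[5.0, 6.0]` at a datum of 100.5 become `[5.5, 6.5]` relative to an observed datum of 100.0. Returning None rather than raising leaves the caller free to choose between a warning and a hard failure.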