NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As an RFC user, I want to view WRES outputs in FEWS somehow #203

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: Hank (Hank) Original Redmine Issue: 61657, https://vlab.noaa.gov/redmine/issues/61657 Original Date: 2019-03-22


FEWS generally wants to see time series data. That means assigning data a T0 and having valid times (or lead times) for recorded numbers.

So, to view outputs directly in FEWS through FEWS capabilities (time series plots and spatial displays), the WRES output would need to be masked as a time series. I'm going to ignore metrics computed against rolling windows, which can be readily turned into time series; that's not the problem. For single-number metrics, either computed against lead time or for pooled lead times, it is possible to mask them as time series by assigning them a logical T0 and using the lead times to dictate the valid times. However, that won't work for other metrics, such as diagrams, and frankly is sort of silly.

What is the best solution for displaying WRES output in FEWS? Specifically, let's focus on the needs for the RFCs, since they'll be harder to meet, and assume we have LDAD/LDM access to the WRES HTTP API. Any solution that works for the RFCs should work for other users using CHPS.

One possibility is for WRES to provide the images to view, directly, but note that shipping images around can be costly in terms of bandwidth, something that may bother the RFCs. (FYI, it appears shipping around the numbers won't be too costly based on an experiment and conversation I had with the NOAT.)

Another solution would be something like what NWRFC did: use a tool external to FEWS to create static webpages and then view those webpages in FEWS.

Another solution would be the masking I've mentioned above, which would allow for displaying in FEWS using built-in capabilities. That would only work for some metrics, but would have the advantage of allowing for CHPS to display the outputs spatially as well as in time series plots.

Another solution is to create a plug-in to FEWS to support visualization. In fact, that plug-in may even be able to call WRES, itself (eventually... not soon).

Other solutions?

I'm looking for a short-term option to support Juzer so that he can be weaned off of Alex's scripts relatively soon (that may require masking given how quickly it would need to happen). I'm also looking for a preferred approach. I may later present this to the RFCs and WPOD as a solution option to support evaluations during operations.

Watchers added, including. Let's see how long this takes to spin out of control,

Hank


Redmine related issue(s): 61557, 86102


epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-03-25T14:36:53Z


In #61557, James said:

I'm making two separate points here (or asking two different questions):

1. The ESRI Shapefile format is not a good format for us to target, so I want to understand why it is being targeted. What, specifically, about this format makes it easier to import into FEWS vs. other geospatial formats?

2. The statistical outputs from the WRES are not structured as time series. The FEWS data model is inherently built on time series, as far as I know (based on conversations with Deltares on the same topic). How do you intend to reconcile this difference? If I have a verification score for a forecast lead duration of 6 hours, derived from a statistical sample that corresponds to a period of record of 1985-01-01T00:00:00Z through 2015-01-01T00:00:00Z, what is the datetime we intend to attach to that verification score in order for FEWS to understand that as one element of a time-series? There are some statistics that we can massage into a time-series format, but mostly we cannot.

to which I responded,

2 is my bigger concern as it relates to the ROE solutions analysis. The initial, short-term solution will almost certainly be something based in CHPS and hopefully be backed by the WRES web-service (if we can overcome some difficulties). For that to work for all possible metrics, I think we may end up depending on NWRFC's Python display solution. Still, the RFCs are going to want to review some metrics in CHPS, perhaps using the spatial displays Alison demonstrated. How do we get that data into a form that can be displayed in CHPS?

We could mimic how John and Alison pull it off. They use CHPS modules to do it; for example:

https://vlab.noaa.gov/redmine/projects/nwc-nwmoa/repository/revisions/master/entry/cbrfc/nwm_config_sa/Config/ModuleConfigFiles/Verification/NWM_Verifi_MediumDaily_Template.xml

I believe what this is doing is creating time series data from the current CHPS system time (when the module/workflow is executed) and then forcing the lead time metrics into a time series given that T0. Then they run it once a day, obtaining a new set of time series each day. There are various ways we can accomplish the same, but I wonder if we should wait to talk with the RFCs first.

As for Juzer's needs, he already has the ability to structure outputs into NetCDF relying on Alex's old R-scripts. We could ask him to focus on using CSV output from WRES as the basis for that restructuring of the outputs so at least what he develops may be generally useful (a tool to convert from WRES CSV to NetCDF ingestible by FEWS). I think what Alex's stuff does is let WRES handle the pairing while computing the metrics itself and those metrics are turned into NetCDF CHPS can ingest.

Thoughts?

If you have a reply to my comment, please make it here.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-03-25T14:40:35Z


Description edited to mention the needs for spatial displays in CHPS.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-03-25T15:21:15Z


Hank wrote:

If you have a reply to my comment, please make it here.

Hank

My thought is: yes, we can probably dress an apple to make it look more like a banana.

Still, if I want a banana, I might not be too happy when I discover that it's an apple wearing a mustache.

FWIW, I am independently working with Deltares on this because they want to put the EVS outputs into their OpenArchive format and the same problem arises in that context: how do you make statistical outputs that are not time-series compatible with a data model that is built on time-series?

I see two possible routes. The first is to make the FEWS data model more flexible, so that it can ingest things that are not time-series. That seems preferred to me, but also pretty unlikely.

The second is to choose a best approach to forming time-series from things that are not inherently time-series. We're probably in that category, which is what you're alluding to, Hank. All of our statistical outputs are associated with a "time window". We need to take our representation of a "time window" and map that into a FEWS understanding of time, starting with the verification scores.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2019-03-25T15:29:13Z


James wrote:

All of our statistical outputs are associated with a "time window". We need to take our representation of a "time window" and map that into a FEWS understanding of time, starting with the verification scores.

As long as no one has changed it, this is how the netcdf files are currently set up.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-03-25T15:33:52Z


Chris: Right. The NetCDF includes some kind of artificial T0 to force it to output something that can be considered a time series. However, the format still cannot be handled by FEWS. Why? I'm going to look at a current NetCDF output.

James: Right and agreed on the preference. I'm surprised, frankly, that FEWS is only having to deal with this now. Aren't there other data sets/types that aren't time series but which users would want to see in FEWS somehow? I guess not.

There are certain "obvious" ways we might be able to pull off the masking of metric output as FEWS-ready time series. The most challenging aspect of that for the verification scores is identifying the T0 (reference time, forecast date, whatever) to use in such a way that it would be intuitive to a user. Given that T0, the metrics can be recorded against the lead time to which they correspond (or the end of the lead time pooling window if pooled), or, for single-number metrics, just record the value at T0 itself. However, you need the T0 in order for it to be considered a time series.

For that T0, you could use the last time for which you have data, the end of the issued time pooling window (perhaps we can require that be specified when outputting to the special format we are going to work with), or user entry of the T0 to use as an option on the output format we create.
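A minimal sketch of the masking described above, assuming Python and hypothetical names throughout (`choose_t0`, `mask_as_time_series`, and the order of precedence among the three T0 candidates are illustrative assumptions, not anything WRES or FEWS defines). It picks a T0 and records each score at the valid time T0 plus its lead duration:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


def choose_t0(
    user_t0: Optional[datetime] = None,
    issued_window_end: Optional[datetime] = None,
    last_data_time: Optional[datetime] = None,
) -> datetime:
    """Return the first T0 candidate that was supplied.

    The precedence used here (user entry, then the end of the issued time
    pooling window, then the last time with data) is an assumption.
    """
    for candidate in (user_t0, issued_window_end, last_data_time):
        if candidate is not None:
            return candidate
    raise ValueError("No T0 candidate was supplied")


def mask_as_time_series(
    t0: datetime, scores_by_lead_hours: Dict[int, float]
) -> List[Tuple[datetime, float]]:
    """Record each score at the valid time T0 + lead duration."""
    return [
        (t0 + timedelta(hours=lead), score)
        for lead, score in sorted(scores_by_lead_hours.items())
    ]


# Illustrative values only: scores at 6, 12 and 18 hour lead durations.
t0 = choose_t0(last_data_time=datetime(2015, 1, 1, tzinfo=timezone.utc))
series = mask_as_time_series(t0, {12: 0.71, 6: 0.82, 18: 0.64})
```

A single-number metric with no lead duration could be passed as `{0: value}`, which records it at T0 itself.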

Again, we can choose to have a tool sitting outside WRES that can convert our CSV to a time series. This might be the preferred approach to avoid polluting the WRES code base (if this would be considered polluting). That converter could be a script or an adapter in FEWS which could just take CSV files to which it is directed and read/convert for ingest to FEWS. It could also be something developed outside of the WRES team, perhaps by a NOAT RFCs or Juzer.
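As a rough illustration of what the first stage of such an external converter might look like, the sketch below groups CSV rows into per-feature, per-metric sequences ordered by lead duration. The column names (`feature`, `metric`, `lead_hours`, `value`) and the sample rows are hypothetical; the real WRES CSV header may well differ.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical columns and illustrative values, not real WRES output.
SAMPLE_CSV = """feature,metric,lead_hours,value
FEATURE_A,MEAN ERROR,12,1.9
FEATURE_A,MEAN ERROR,6,1.2
FEATURE_B,MEAN ERROR,6,0.4
"""


def read_scores(text: str) -> Dict[Tuple[str, str], List[Tuple[int, float]]]:
    """Group scores by (feature, metric), sorted by lead duration in hours."""
    grouped = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        key = (row["feature"], row["metric"])
        grouped[key].append((int(row["lead_hours"]), float(row["value"])))
    return {key: sorted(values) for key, values in grouped.items()}


scores = read_scores(SAMPLE_CSV)
```

Each grouped sequence is then ready to have a T0 applied by whatever tool sits downstream, which keeps that decision out of the WRES code base.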

Is there a best approach?

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-03-25T15:37:55Z


Chris wrote:

All of our statistical outputs are associated with a "time window". We need to take our representation of a "time window" and map that into a FEWS understanding of time, starting with the verification scores.

As long as no one has changed it, this is how the netcdf files are currently set up.

Indeed. I think the harder part is mapping that into FEWS world.

So I think we may be on the same page now? In other words, I don't think the problem is the format, but getting our data into the FEWS data model. In other words, I don't really care what someone wrote about our netCDF format being "unworkable" with FEWS unless it is true and also purely related to format (and I don't believe it is). There may be some tweaks necessary to our netCDF format to make it CF compliant and legible to FEWS, but I suspect it is more about the ability of FEWS to see our "time windows" and do something with them, which will apply to any format.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-03-27T14:52:03Z


Another route to getting WRES output visible in FEWS could be via GraphGen. If you leave the data as CSVs, I could code a reader in GraphGen to read from those flat files. There would still be the problem of masking evaluation output as time series, since GraphGen also works with time series under the hood, but that masking would then occur outside of WRES and in a tool that is readily compatible with FEWS. As for reliability diagrams, ROC diagrams, and other non-score metrics, something else would need to be built.

An advantage of this approach is that GraphGen comes with tools capable of doing cross-feature aggregation. For example, you could tell GraphGen to combine the outputs for hundreds/thousands of features into a single ensemble and let it compute an ensemble mean, compute other moments, compute the spread (cone of uncertainty), plot the empirical distribution, and so on. Further, if you had both AHPS and NWM outputs to consider, it could plot them both in the same graphic allowing for direct comparison.
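The cross-feature aggregation idea can be sketched independently of GraphGen. The Python below is not GraphGen code; it is a hypothetical stand-in that treats the scores from many features as one ensemble per lead duration and computes a mean and a spread, using the population standard deviation as one possible (assumed) measure of spread:

```python
from statistics import mean, pstdev
from typing import Dict, List


def aggregate_across_features(
    scores_by_feature: Dict[str, Dict[int, float]]
) -> Dict[int, Dict[str, float]]:
    """Summarize scores across features for each lead duration (hours)."""
    by_lead: Dict[int, List[float]] = {}
    for lead_scores in scores_by_feature.values():
        for lead, value in lead_scores.items():
            by_lead.setdefault(lead, []).append(value)
    return {
        lead: {
            "mean": mean(values),
            "spread": pstdev(values),  # population standard deviation
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for lead, values in sorted(by_lead.items())
    }


# Illustrative values only; feature names are placeholders.
summary = aggregate_across_features(
    {
        "FEATURE_A": {6: 1.0, 12: 2.0},
        "FEATURE_B": {6: 2.0, 12: 4.0},
        "FEATURE_C": {6: 3.0},
    }
)
```

Features that lack a score at a given lead duration are simply left out of that lead's summary, which is why `count` is reported alongside the statistics.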

This technique could also be readily used to display forecast progression plots by adding a pairs.csv reader that sets up the time series involved based upon the pairs. You would not get the USGS data in between the paired points, but at least it would allow for a time series view of the pairs.

Coding readers would actually not be much effort; perhaps a day or two given the framework GraphGen uses (I just have to code a "plugin"... that really isn't a plug in in the truest meaning of the word). Frankly, it would take more time to finalize the products to make them look purty.

Anyway, just a thought... feel free to skewer,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-03-27T15:51:39Z


Indeed, that might be a route.

I think I see the problem as requesting something that should happen ("view WRES outputs in FEWS somehow"), not how it should happen.

In terms of how, I don't see our solution as being "change our concept of verification data to make it look like something that FEWS can understand".

If we can achieve the desired result in both respects (the data is visible to FEWS and we don't need to artificially make our outputs look like "FEWS time series"), then I am all for using some glue that achieves this result and does not impact WRES. Sure, it makes the overall pipeline slightly more brittle - perhaps - at least, there are more moving parts - but I don't want to shift the brittleness to core WRES by, for example, allowing some artificial FEWS concept of verification outputs as time-series.

As to whether this is really our problem is another question, but I think it is our problem, because no one else is going to solve it, and we do want some users.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2019-03-28T11:03:52Z


James,

I'm investigating the GraphGen route now. Coding a crude, featureless InputSeriesProvider and associated configuration JPanel, as it's called in GraphGen, took only about 3 hours. Obviously, I'd want to code in some error-checking logic before delivering, but the point here is just a proof-of-concept anyway. The next problem will be displaying a lead duration axis based on time series data, which the reader turns the CSV into. This is, in fact, a problem of FEWS as well; all plots are assumed to be time series plots.

This code may also make it simple for me to create a PI-timeseries from a CSV converter, which could be another route to support FEWS ingest.

However, since this is a spare time, on-the-side activity, it may be a while before I have something ready to show anyone. Frankly, I'm not even sure this will be useful long-term, but at least it allows me to code.

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2020-12-14T23:17:45Z


Seems like this came up again recently.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2020-12-15T11:44:48Z


Related to recent conversations, yes. After the new year, I hope to write an application that listens to the WRES via the messaging API and can create PI-XML that can be ingested readily into CHPS. The only trick is assigning the evaluation results a T0, or reference date/time, so that CHPS can store them.

This ticket goes beyond that, however, and discusses visualization in CHPS. It's unclear how best to handle that, though the RFCs have already made a great start, so it might be tackled by them.
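For the PI-XML output mentioned above, a heavily simplified sketch of the writing step might look like the following. It assumes scores keyed by lead duration in hours and a chosen T0, which is written as the `forecastDate`. The function name, the location and parameter identifiers, and the score values are hypothetical, and the element layout is a reduced approximation that has not been validated against the published Delft-FEWS PI schema. It also does not touch the messaging API at all.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from typing import Dict

PI_NS = "http://www.wldelft.nl/fews/PI"


def to_pi_xml(
    location_id: str,
    parameter_id: str,
    t0: datetime,
    scores_by_lead_hours: Dict[int, float],
) -> str:
    """Serialize scores as one PI-style series with events at T0 + lead."""
    if not scores_by_lead_hours:
        raise ValueError("At least one score is required")
    ET.register_namespace("", PI_NS)

    def tag(name: str) -> str:
        return f"{{{PI_NS}}}{name}"

    def date_time(moment: datetime) -> Dict[str, str]:
        return {
            "date": moment.strftime("%Y-%m-%d"),
            "time": moment.strftime("%H:%M:%S"),
        }

    events = [
        (t0 + timedelta(hours=lead), value)
        for lead, value in sorted(scores_by_lead_hours.items())
    ]
    root = ET.Element(tag("TimeSeries"), {"version": "1.2"})
    ET.SubElement(root, tag("timeZone")).text = "0.0"
    series = ET.SubElement(root, tag("series"))
    header = ET.SubElement(series, tag("header"))
    ET.SubElement(header, tag("type")).text = "instantaneous"
    ET.SubElement(header, tag("locationId")).text = location_id
    ET.SubElement(header, tag("parameterId")).text = parameter_id
    ET.SubElement(header, tag("timeStep"), {"unit": "nonequidistant"})
    ET.SubElement(header, tag("startDate"), date_time(events[0][0]))
    ET.SubElement(header, tag("endDate"), date_time(events[-1][0]))
    ET.SubElement(header, tag("forecastDate"), date_time(t0))  # the assigned T0
    ET.SubElement(header, tag("missVal")).text = "-999.0"
    for valid_time, value in events:
        attributes = {**date_time(valid_time), "value": str(value), "flag": "0"}
        ET.SubElement(series, tag("event"), attributes)
    return ET.tostring(root, encoding="unicode")


# Illustrative identifiers and values only.
document = to_pi_xml(
    "FEATURE_A",
    "MEAN_ERROR",
    datetime(2015, 1, 1, tzinfo=timezone.utc),
    {6: 1.2, 12: 1.9},
)
```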

Thanks,

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Juzer (Juzer) Original Date: 2020-12-15T12:57:55Z


Hank, does WRES have an API (kind of similar to WRDS)? Is this what you are indicating?

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-12-15T13:09:28Z


Juzer wrote:

Hank, does WRES have an API (kind of similar to WRDS)? Is this what you are indicating?

Hi Juzer,

The api that you are familiar with is the rest api for the web service of which the cowres is one instance. You are already using the cowres within your scripts. The wres-gui also uses this.

The api that Hank is talking about is an api for wres developers, completely separate from the above. In other words, if someone wants to implement a new format writer for the wres, like pi-xml, they would use this api. This api provides access to all of the evaluation information, including the evaluation description, the statistics and the evaluation status, broken down into atomic pieces. This api uses the amqp protocol over tcp and the message payload is in the protocol buffers format. Collectively, we refer to this as the "messaging api".

We have an implementation of the messaging api in java. Thus, if you want to write a new java format writer or you are working in an environment that can easily interface with the java implementation (e.g., some polyglot vm, like graal), then it should be quite easy to add a new format writer to the wres. You could add this format writer as a standalone application. But, again, this is really intended for professional developers who want to add a new format writer or otherwise receive wres evaluations more directly (rather than requesting and receiving evaluations via the rest api).

epag commented 3 weeks ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2020-12-15T13:37:03Z


But, again, this is really intended for professional developers ...

Well, I guess that disallows me from working on this. :)

Hank

epag commented 3 weeks ago

Original Redmine Comment Author Name: Juzer (Juzer) Original Date: 2020-12-15T14:43:12Z


This sounds promising ... James. I guess I should schedule a meeting for getting the details on this.

We have an implementation of the messaging api in java. Thus, if you want to write a new java format writer or you are working in an environment that can easily interface with the java implementation (e.g., some polyglot vm, like graal), then it should be quite easy to add a new format writer to the wres. You could add this format writer as a standalone application. But, again, this is really intended for professional developers who want to add a new format writer or otherwise receive wres evaluations more directly (rather than requesting and receiving evaluations via the rest api).

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-12-15T16:53:23Z


Sounds good, Juzer.

I think Hank is going to attempt to use the java implementation in the near future, so we will probably get some good feedback from that process and make improvements, including some wiki documentation (as of right now, I am the only person to have touched this code or used it). As a minimum, it needs better wiki documentation, but the first priority was making it work. At that point, it is probably more ready for prime time w/ other developers and we can meet to discuss how you may want to use it for your own work.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2020-12-15T17:09:26Z


I think the API for raw/canonical stats would only be exposed on a case by case basis, but in general would only be visible from inside COWRES or the WRES GUI, not outside.