NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a developer, I want to identify and implement a solution for ingesting and evaluating categorical data #124

Open epag opened 2 months ago

epag commented 2 months ago

Author Name: Hank (Hank) Original Redmine Issue: 119117, https://vlab.noaa.gov/redmine/issues/119117 Original Date: 2023-08-04


See #118364 for the requirement being addressed and #115608 for the use case with slide examples. The requirement is for ingesting categorical data and using it in an evaluation.

This ticket can be resolved once we have designed a solution and implemented it. Leaving as normal priority and in the backlog pending prioritization of the NWM team requirements for use case #115608.

Hank

epag commented 2 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-08-04T15:53:57Z


Just a ballpark estimate. Seems like this might be a pretty big change, so I'll go with a high number: 256 hours.

Hank

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2023-08-22T12:22:29Z


I think this number is a bit on the high side for a first cut.

I don't think we'd need to fundamentally change the data model. We already have internal representations of categorical data in the @wres.datamodel@, and I think the existing ingesters and readers can be used, since categorical data can be represented on a continuous numeric/probabilistic scale (i.e., within the unit interval [0,1], which reduces to 0 or 1 for the special case of single-valued data), provided we also identify the categorical event to which that probability refers.
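To make the representation described above concrete, here is a minimal Java sketch. It assumes nothing about the real @wres.datamodel@: the class @CategoricalSketch@, the record @EventProbability@ and both factory methods are hypothetical names invented for illustration. Deriving a probability from the fraction of ensemble members in a category is likewise an assumption, not something stated in this ticket.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch only: none of these types exist in wres.datamodel.
final class CategoricalSketch {

    /** A probability in [0,1] that the named categorical event occurred. */
    record EventProbability(String event, double probability) {
        EventProbability {
            Objects.requireNonNull(event, "event");
            // Written this way so that NaN is rejected as well.
            if (!(probability >= 0.0 && probability <= 1.0)) {
                throw new IllegalArgumentException("Probability outside [0,1]: " + probability);
            }
        }
    }

    /** Single-valued case: the probability is 1 if the category matches the event, else 0. */
    static EventProbability fromObservedCategory(String event, String observedCategory) {
        return new EventProbability(event, event.equals(observedCategory) ? 1.0 : 0.0);
    }

    /** Assumed probabilistic case: fraction of ensemble members whose category matches the event. */
    static EventProbability fromEnsemble(String event, List<String> memberCategories) {
        if (memberCategories.isEmpty()) {
            throw new IllegalArgumentException("At least one ensemble member is required.");
        }
        long hits = memberCategories.stream().filter(event::equals).count();
        return new EventProbability(event, (double) hits / memberCategories.size());
    }
}
```

The point of the sketch is that the numeric value alone is ambiguous. It only becomes categorical data once it travels with the event label, which is why the event has to be identified alongside the probability.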

The internal datamodel and database schema would need to be expanded to identify the event associated with a time-series of probabilities, probably as a new @event_id@ on @wres.timeseries@ (with a new @wres.event@ relation composing an @event_id@ and varchar) and a new event within @wres.datamodel.time.TimeSeriesMetadata@. Ultimately, these time-series events would get mapped to two-category (boolean) or probabilistic representations prior to metric calculation, using a pathway slightly different from the one based on thresholds.
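A rough Java sketch of the mapping step described above, under stated assumptions: @EventPairingSketch@, @EventSeriesMetadata@, @ProbabilityPair@ and @BooleanPair@ are hypothetical names and do not reflect the actual @TimeSeriesMetadata@ or pairing classes. The Brier score is used only as a familiar example of a metric that consumes the probabilistic representation directly.

```java
import java.util.List;

// Hypothetical sketch only: these types are not part of the WRES code base.
final class EventPairingSketch {

    /** Metadata that names the event, standing in for a new event field on the time-series metadata. */
    record EventSeriesMetadata(String variableName, String event) {}

    /** Observed and predicted probabilities of the same event at one valid time. */
    record ProbabilityPair(double observed, double predicted) {}

    /** Two-category (boolean) form of a pair, for dichotomous metrics. */
    record BooleanPair(boolean observed, boolean predicted) {}

    /** Single-valued data holds only 0 or 1, so no threshold is needed to get a boolean. */
    static BooleanPair toBoolean(ProbabilityPair pair) {
        return new BooleanPair(pair.observed() == 1.0, pair.predicted() == 1.0);
    }

    /** Probabilistic pathway: mean squared difference between predicted and observed probability. */
    static double brierScore(List<ProbabilityPair> pairs) {
        if (pairs.isEmpty()) {
            throw new IllegalArgumentException("At least one pair is required.");
        }
        double sum = 0.0;
        for (ProbabilityPair pair : pairs) {
            double error = pair.predicted() - pair.observed();
            sum += error * error;
        }
        return sum / pairs.size();
    }
}
```

In this sketch the event comes from the metadata rather than from a threshold applied to a continuous variable, which is the sense in which the pathway differs from the threshold-based one.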

I would say "128", but it could be less than this for a first cut.

epag commented 2 months ago

Original Redmine Comment Author Name: Hank (Hank) Original Date: 2023-08-22T14:23:56Z


Adjusting per James's recommendation.

Hank