NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service
Other
2 stars 1 forks source link

As a user, when I conduct an evaluation with unrecognized measurement units, then the WRES should accept those units #222

Closed epag closed 1 week ago

epag commented 3 weeks ago

Author Name: James (James) Original Redmine Issue: 73517, https://vlab.noaa.gov/redmine/issues/73517 Original Date: 2020-01-17


Expected behavior:

Given an evaluation of one or more sources that contain a measurement unit that is absent from @wres.MeasurementUnit@ When I execute that evaluation Then it should succeed, providing it would succeed otherwise and a unit conversion is not required

( When a unit conversion is not required, the absence of a unit in @wres.MeasurementUnit@ should not prevent an evaluation. )

In that sense, the expected behavior should be similar to a source that contains a feature that is absent from @wres.Feature@.

Actual behavior:

This is a showstopper for any evaluation of one or more sources that contains unrecognized (to @wres.MeasurementUnit@) units.

Present example is for an evaluation of simulations in support of #73499. Gzipped copy attached. Any declaration that incorporates this source encounters an exception like this in relation to the SRSO variable, which is the reservoir volume in cubic meters, denoted M3 in the source.

<type>instantaneous</type>
<locationId>MMRW1</locationId>
<parameterId>SRSO</parameterId>
<qualifierId>rtcTools</qualifierId>
<timeStep multiplier="21600" unit="second" />
<startDate date="2009-10-01" time="12:00:00" />
<endDate date="2019-10-09" time="12:00:00" />
<missVal>-999</missVal>
<stationName>MMRW1</stationName>
<creationDate>2020-01-13</creationDate>
<creationTime>11:14:13</creationTime>
<units>M3</units>
</code>
ERROR Control Could not complete project execution due to:
wres.io.reading.IngestException: An ingest task could not be completed.
    at wres.io.Operations.doIngestWork(Operations.java:428)
    at wres.io.Operations.ingest(Operations.java:385)
    at wres.control.ProcessorHelper.processProjectConfig(ProcessorHelper.java:105)
    at wres.control.Control.accept(Control.java:246)
    at wres.control.Control.apply(Control.java:133)
    at wres.control.Control.apply(Control.java:1)
    at wres.MainFunctions.call(MainFunctions.java:96)
    at wres.Main.main(Main.java:91)
    at wres.MainLocal.main(MainLocal.java:179)
Caused by: java.util.concurrent.ExecutionException: wres.io.concurrency.WRESRunnableException: Callable task failed
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at wres.io.Operations.doIngestWork(Operations.java:415)
    ... 8 common frames omitted
Caused by: wres.io.concurrency.WRESRunnableException: Callable task failed
    at wres.io.concurrency.WRESCallable.call(WRESCallable.java:32)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: wres.io.reading.IngestException: While ingesting timeseries at line 29329 and column 12, encountered an issue.
    at wres.io.reading.fews.PIXMLReader.parseElement(PIXMLReader.java:157)
    at wres.system.xml.XMLReader.parse(XMLReader.java:161)
    at wres.io.reading.fews.FEWSSource.save(FEWSSource.java:61)
    at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:248)
    at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:1)
    at wres.io.concurrency.WRESCallable.call(WRESCallable.java:18)
    ... 4 common frames omitted
Caused by: java.sql.SQLException: No value can be loaded with the key of: m3
    at wres.io.data.details.CachedDetail.save(CachedDetail.java:86)
    at wres.io.data.caching.Cache.addElement(Cache.java:117)
    at wres.io.data.caching.Cache.getID(Cache.java:100)
    at wres.io.data.caching.MeasurementUnits.getMeasurementUnitID(MeasurementUnits.java:96)
    at wres.io.reading.fews.PIXMLReader.parseHeader(PIXMLReader.java:464)
    at wres.io.reading.fews.PIXMLReader.parseSeries(PIXMLReader.java:260)
    at wres.io.reading.fews.PIXMLReader.parseElement(PIXMLReader.java:146)
    ... 9 common frames omitted
ERROR Main Operation 'execute' completed unsuccessfully
wres.control.InternalWresException: Could not complete project execution
    at wres.control.Control.accept(Control.java:259)
    at wres.control.Control.apply(Control.java:133)
    at wres.control.Control.apply(Control.java:1)
    at wres.MainFunctions.call(MainFunctions.java:96)
    at wres.Main.main(Main.java:91)
    at wres.MainLocal.main(MainLocal.java:179)
Caused by: wres.io.reading.IngestException: An ingest task could not be completed.
    at wres.io.Operations.doIngestWork(Operations.java:428)
    at wres.io.Operations.ingest(Operations.java:385)
    at wres.control.ProcessorHelper.processProjectConfig(ProcessorHelper.java:105)
    at wres.control.Control.accept(Control.java:246)
    ... 5 common frames omitted
Caused by: java.util.concurrent.ExecutionException: wres.io.concurrency.WRESRunnableException: Callable task failed
    at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
    at wres.io.Operations.doIngestWork(Operations.java:415)
    ... 8 common frames omitted
Caused by: wres.io.concurrency.WRESRunnableException: Callable task failed
    at wres.io.concurrency.WRESCallable.call(WRESCallable.java:32)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: wres.io.reading.IngestException: While ingesting timeseries at line 29329 and column 12, encountered an issue.
    at wres.io.reading.fews.PIXMLReader.parseElement(PIXMLReader.java:157)
    at wres.system.xml.XMLReader.parse(XMLReader.java:161)
    at wres.io.reading.fews.FEWSSource.save(FEWSSource.java:61)
    at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:248)
    at wres.io.concurrency.IngestSaver.execute(IngestSaver.java:1)
    at wres.io.concurrency.WRESCallable.call(WRESCallable.java:18)
    ... 4 common frames omitted
Caused by: java.sql.SQLException: No value can be loaded with the key of: m3
    at wres.io.data.details.CachedDetail.save(CachedDetail.java:86)
    at wres.io.data.caching.Cache.addElement(Cache.java:117)
    at wres.io.data.caching.Cache.getID(Cache.java:100)
    at wres.io.data.caching.MeasurementUnits.getMeasurementUnitID(MeasurementUnits.java:96)
    at wres.io.reading.fews.PIXMLReader.parseHeader(PIXMLReader.java:464)
    at wres.io.reading.fews.PIXMLReader.parseSeries(PIXMLReader.java:260)
    at wres.io.reading.fews.PIXMLReader.parseElement(PIXMLReader.java:146)
    ... 9 common frames omitted

I think we identify this as @CM@, which is a recognized unit as of commit:wres|ae5cf8f707b7687ceea2553d1d45a42cb0efaee9.

select * from wres.MeasurementUnit where unit_name LIKE '%M%'
</code>

-->

|_.measurementunitid |.unit_name | | 14 | CM |

If so, it would make sense to have a concept of aliasing/cross-walking and to cross-walk these two unit aliases, CM and M3, because cross-walking buys us unit conversions, but that is not the main request of this ticket.


Redmine related issue(s): 73529


epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T17:24:18Z


Linking #39724, but this conflicts with the objective there.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T17:25:42Z


Worth noting that the only "workaround" is to modify the source, which is not really a workaround.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2020-01-17T17:40:08Z


We had this functionality a while back and it was ripped out. The way it used to work was that it would record a new measurement unit and store an identity unit conversion for it.

@wres.UnitConversion@ serves as a sort of crosswalk with @wres.MeasurementUnit@ being the aliases. We've encountered a few new units; we should probably add those in addition to adding the handling of unrecognized units.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T17:40:28Z


Oh, now I'm thinking CM is "centimeters" not cubic meters. Anyway, not important for the goal of this ticket.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T17:46:24Z


Chris wrote:

We had this functionality a while back and it was ripped out. The way it used to work was that it would record a new measurement unit and store an identity unit conversion for it.

@wres.UnitConversion@ serves as a sort of crosswalk with @wres.MeasurementUnit@ being the aliases. We've encountered a few new units; we should probably add those in addition to adding the handling of unrecognized units.

Any hint about why?

I think of unit naming and unit identification and unit conversion as separate things. I think we should be neutral about the naming of units and accept unit names at face value.

I think we cannot be neutral about unit conversions because they require -unit identification and unit identification requires- a -cross-walk- functional mapping between named -when multiple names can refer to the same- units. For example, we have @m3/s@ and @m3/sec@ in @wres.MeasurementUnit@. These are aliases for cubic meters per second of time.

I don't really see @wres.UnitConversion@ as being any sort of crosswalk. I see it as being a functional mapping between one named unit and another named unit. I suppose I can see your point when I think about the identity function, but I don't think @wres.UnitConversion@ should play any role in resolving aliases among unit names.

edit: sorry, had to make several edits above.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T17:56:57Z


Oh, I think I see what you mean now.

If I look in @wres.unitconversion_v1.xml@, I see two things there. One is a resolution of aliases. Another is unit conversions (for all possible aliases).

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T18:00:14Z


I think we probably need to separate out this concept of identified units with multiple names (aliases) and unit conversion.

For example, when I see this in @wres.unitconversion_v1.xml@:

            INSERT INTO wres.UnitConversion(from_unit, to_unit, factor)
            SELECT F.measurementunit_id,
                T.measurementunit_id,
                1/35.3146662127
            FROM wres.MeasurementUnit F
            CROSS JOIN wres.MeasurementUnit T
            WHERE F.unit_name = ANY('{CFS, ft3/s}'::text[])
                AND T.unit_name = ANY('{CMS, m3 s-1, m3/s}'::text[])
                AND NOT EXISTS (
                    SELECT 1
                    FROM wres.UnitConversion UC
                    WHERE UC.from_unit = F.measurementunit_id
                        AND UC.to_unit = T.measurementunit_id
                );
</code>

Then I predict that conversions between @CFS@ and @m3/sec@ will not be possible. Right? It seems like @m3/sec@ is one of our units in @wres.measurementunit_v1.xml@.

That needs a different ticket though.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2020-01-17T18:03:55Z


Yeah, that's why I said 'sort of'; not a far stretch, but still a stretch. It definitely functions as one, though; it manages to map units like @m3/s@ to @m3/sec@ and @cfs@ to @ft3/s@ and state their relationship.

Some sort of alias link table could work, though. It would definitely make liquibase scripts easier to write, though it might shift some of that complexity into the application.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2020-01-17T18:05:42Z


| Then I predict that conversions between CFS and m3/sec will not be possible. Right? It seems like m3/sec is one of our units in wres.measurementunit_v1.xml.

A bunch of different units and conversions have been added since v1; m3/sec wasn't initially included for some reason, so that got added later. If you look in any of the other @wres.measurementunit_*.xml@ files, you'll probably see the inserts for the unit conversions too.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T18:10:36Z


OK, thanks.

So, I think I see three things here.

  1. Being neutral/accepting about unit names when conversions aren't needed (this ticket)
  2. Adding some more units to @wres.MeasurementUnit@ and associated cross-walk/mappings in @wres.UnitConversion@ (needs a ticket)
  3. Think about how best to handle aliases for unit names, which at least conceptually overlaps with aliases in other contexts, such as @wres.Feature@ (needs a ticket)
epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2020-01-17T18:31:46Z


I'm not sure about point number 1. For better or worse, I'm always on the side of making things easier on the user, and being accepting of new units would be just that. That might open us to a situation where someone add @cubic feet per second@ and it never gets linked to anything other than @cubic feet per second@, though, so that's pretty debatable.

While we're on the subject, Jason mentioned adding in custom transformations/conversions in the configuration a little while back. If we could think about how to do that, it'd be a cool feature to have in at some point. The only way I can think to do it is like MathML, though, and that's pretty gross.

UPDATE: Ran into JEP and mXparser. JEP looks pretty straight forward (define your formula in a string, then define your variables), while mXparser looks really verbose but pretty cool.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T18:50:21Z


Right, agree that a user should be able to add units and unit conversions and we need to think about how best to facilitate it.

Ideally, it would be in-band to an evaluation, at least for units, because units come from sources or declaration.

The difficulty with being strict on unit names is that it restricts our audience to our little domain of hydrology/met, and even a subset of that (the things we encounter routinely). We're only ever going to add units that we encounter for hydro and met users that we encounter, but our software is useful across many domains and we should acknowledge that. So having a "hard fail" for an unrecognized unit name seems pretty unhelpful in that context.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2020-01-17T18:56:02Z


The difficulty with being strict on unit names is that it restricts our audience to our little domain of hydrology/met, and even a subset of that (the things we encounter routinely). We're only ever going to add units that we encounter for hydro and met users that we encounter, but our software is useful across many domains and we should acknowledge that. So having a "hard fail" for an unrecognized unit name seems pretty unhelpful in that context.

That's music to my ears; I've been hoping that this could extend beyond hydro since the beginning (hence goofy measurement units like Celsius and Kelvin). It's nice to hear someone else say it.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2020-01-17T19:01:51Z


Chris wrote:

The difficulty with being strict on unit names is that it restricts our audience to our little domain of hydrology/met, and even a subset of that (the things we encounter routinely). We're only ever going to add units that we encounter for hydro and met users that we encounter, but our software is useful across many domains and we should acknowledge that. So having a "hard fail" for an unrecognized unit name seems pretty unhelpful in that context.

That's music to my ears; I've been hoping that this could extend beyond hydro since the beginning (hence goofy measurement units like Celsius and Kelvin). It's nice to hear someone else say it.

Yeah. Examples of other domains where verification is a big deal and from which many verification metrics originate: bio/medical sciences, other environmental and geosciences (ecology, oceanography etc.), econometrics, psychology, electrical engineering, signal processing, machine learning.

Relative to some of these, hydrology is actually a late-comer to statistical verification.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2021-10-22T12:31:41Z


Need to revisit in light of progress in #44819, probably resolved.

james-d-brown commented 1 week ago

Certain this is OBE, but will double-check before closing.

james-d-brown commented 1 week ago

Create a file predictions.csv with the following content:

value_date,variable_name,location,measurement_unit,value
1985-06-01T13:00:00Z,monkey_food,myLocation,bananas,21
1985-06-01T14:00:00Z,monkey_food,myLocation,bananas,22

Create a file observations.csv with the following content:

value_date,variable_name,location,measurement_unit,value
1985-06-01T13:00:00Z,monkey_food,myLocation,bananas,23
1985-06-01T14:00:00Z,monkey_food,myLocation,bananas,25

Create a file evaluation.yml with the following content:

observed: observations.csv
predicted: predictions.csv

Witness the evaluation succeed and produce statistics like this:

LEFT VARIABLE NAME,RIGHT VARIABLE NAME,BASELINE VARIABLE NAME,COVARIATE FILTERS,POOL NUMBER,EVALUATION SUBJECT,FEATURE GROUP NAME,LEFT FEATURE NAME,LEFT FEATURE WKT,LEFT FEATURE SRID,LEFT FEATURE DESCRIPTION,RIGHT FEATURE NAME,RIGHT FEATURE WKT,RIGHT FEATURE SRID,RIGHT FEATURE DESCRIPTION,BASELINE FEATURE NAME,BASELINE FEATURE WKT,BASELINE FEATURE SRID,BASELINE FEATURE DESCRIPTION,EARLIEST ISSUED TIME EXCLUSIVE,LATEST ISSUED TIME INCLUSIVE,EARLIEST VALID TIME EXCLUSIVE,LATEST VALID TIME INCLUSIVE,EARLIEST LEAD DURATION EXCLUSIVE,LATEST LEAD DURATION INCLUSIVE,TIME SCALE DURATION,TIME SCALE FUNCTION,TIME SCALE START MONTH-DAY INCLUSIVE,TIME SCALE END MONTH-DAY INCLUSIVE,EVENT THRESHOLD NAME,EVENT THRESHOLD LOWER VALUE,EVENT THRESHOLD UPPER VALUE,EVENT THRESHOLD UNITS,EVENT THRESHOLD LOWER PROBABILITY,EVENT THRESHOLD UPPER PROBABILITY,EVENT THRESHOLD SIDE,EVENT THRESHOLD OPERATOR,DECISION THRESHOLD NAME,DECISION THRESHOLD LOWER VALUE,DECISION THRESHOLD UPPER VALUE,DECISION THRESHOLD UNITS,DECISION THRESHOLD LOWER PROBABILITY,DECISION THRESHOLD UPPER PROBABILITY,DECISION THRESHOLD SIDE,DECISION THRESHOLD OPERATOR,METRIC NAME,METRIC COMPONENT NAME,METRIC COMPONENT QUALIFIER,METRIC COMPONENT UNITS,METRIC COMPONENT MINIMUM,METRIC COMPONENT MAXIMUM,METRIC COMPONENT OPTIMUM,STATISTIC GROUP NUMBER,SUMMARY STATISTIC NAME,SUMMARY STATISTIC COMPONENT NAME,SUMMARY STATISTIC UNITS,SUMMARY STATISTIC DIMENSION,SUMMARY STATISTIC QUANTILE,SAMPLE QUANTILE,STATISTIC
monkey_food,monkey_food,,,1,RIGHT,"myLocation-myLocation","myLocation",,,,"myLocation",,,,,,,,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,-1000000000-01-01T00:00:00Z,+1000000000-12-31T23:59:59.999999999Z,PT-2562047788015215H-30M-8S,PT2562047788015215H30M7.999999999S,,,,,,-Infinity,,,,,LEFT AND RIGHT,GREATER,,,,,,,,,MEAN ERROR,MAIN,,bananas,-Infinity,Infinity,0.0,7,,,,,,,-2.5