NOAA-OWP / wres

Code and scripts for the Water Resources Evaluation Service

As a user, I want to be able to dictate a proper feature name for output #285

Open epag opened 3 weeks ago

epag commented 3 weeks ago

Author Name: Chris (Chris) Original Redmine Issue: 98384, https://vlab.noaa.gov/redmine/issues/98384 Original Date: 2021-11-04


I just ran a job for a test and managed to generate the following graph:

!02343940_2310009_NWM_Short_Range_VOLUMETRIC_EFFICIENCY.png!

Just by looking at it, what does the graph represent? It shows no variable, and the only description of the location is @02343940-2310009@. From experience (and having run the project personally), I know that this means USGS Site 02343940 and NWM Feature ID 2310009, but neither is meaningful unless I go to the USGS website and search for the site, then find a shapefile for the NWM and look through it for 2310009.

Instead, I'd like to somehow display a name on there. For the above, I'd prefer that it read "SAWHATCHEE CREEK AT CEDAR SPRINGS, GA" or "CEDG1" instead (bonus points for title case on the former). I can't necessarily think of a clean way to do this given discrete location definitions, but, if the feature service were employed, I could theoretically state something like @usgs_data/name@ or @nws_lid@ and have the WRES glean that information when reading from the service.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2021-11-04T19:56:27Z


One thing we could do, which is a natural extension of the multi-feature grouping, is to allow a @name@ to be assigned to a singleton feature group, i.e., a single tuple.

Like this:

<feature left="ASEN6HUD" right="ASEN6HUD" baseline="ASEN6HUD" name="Look, a feature!"/>

Analogously:

<featureGroup name = "Look, a feature group!">
   <feature left="ASEN6HUD" right="ASEN6HUD" baseline="ASEN6HUD"/>
   <feature left="BKBN6HUD" right="BKBN6HUD" baseline="BKBN6HUD"/>
</featureGroup>

edit: under the hood, everything is a feature group anyway, so this would be a very minor change. Obviously, you could kind of get that behavior now by declaring a singleton group using the long-form @featureGroup@ declaration, but that is a bit kludgey if you only want a singleton.

Another possibility might be to allow the more friendly "station name" attribute to be used, which is sometimes available, but not always (and is sometimes more friendly, not always).

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2021-11-04T20:19:56Z


I really like your example as it reads pretty well and isn't confusing.

I put it as @low@ because it's definitely not something that is urgent and desperately needs to be addressed. I don't even know when that can be worked into the GUI.

It would be appreciated (by me at the very least) if it were an option in some form, just so that the output is easier to understand without context. What we have now works well and I wouldn't remove it; just keep it as a solid fallback for when the user hasn't provided anything more.
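A minimal sketch of the fallback behavior described above: use the declared @name@ when there is one, otherwise keep the current identifier-based name. The class @FeatureDisplayName@ and its @resolve@ method are hypothetical and are not taken from the WRES code base.

```java
import java.util.Optional;

/** Hypothetical helper for illustration only; not part of the WRES code base. */
final class FeatureDisplayName {

    private FeatureDisplayName() {
    }

    /**
     * Returns the user-declared name when it is present and non-blank,
     * otherwise falls back to the default identifier-based name.
     */
    static String resolve(String declaredName, String defaultName) {
        return Optional.ofNullable(declaredName)
                .map(String::trim)
                .filter(name -> !name.isEmpty())
                .orElse(defaultName);
    }
}
```

For example, @FeatureDisplayName.resolve(null, "02343940-2310009")@ keeps the existing label, while a declared name such as "CEDG1" would replace it.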

epag commented 3 weeks ago

Original Redmine Comment Author Name: Jesse (Jesse) Original Date: 2021-11-04T20:40:01Z


James addressed the geographic feature part. I think the feature grouping capability provides feature aliasing at no extra cost.

As for -units- variables and others, labeling on graphics is the purpose of the "label" attribute sprinkled throughout the declaration schema. So I just ran this through the COWRES:

<?xml version="1.0" encoding="UTF-8"?>
<project label="ExampleProject" name="web demo">

    <inputs>
        <left>
            <type>observations</type>
            <source>/mnt/wres_share/systests/data/DRRC2QINE.xml</source>
            <variable label="chicken">QINE</variable>
        </left>
        <right label="HEFS">
            <type>ensemble forecasts</type>
            <source>/mnt/wres_share/systests/data/drrc2ForecastsOneMonth/</source>
            <variable>SQIN</variable>
        </right>
    </inputs>

    <pair>
        <unit>m3/s</unit>
        <feature left="DRRC2HSF" right="DRRC2HSF" />
        <leadHours minimum="0" maximum="48" />
    </pair>

    <metrics>
        <thresholds>
            <type>probability</type>
            <commaSeparatedValues>0.002, 0.01, 0.1, 0.9, 0.99, 0.998</commaSeparatedValues>
            <operator>greater than or equal to</operator>
        </thresholds>

        <metric><name>sample size</name></metric>
        <metric><name>mean absolute error</name></metric>
        <metric><name>root mean square error</name></metric>
        <metric><name>box plot of errors by observed value</name></metric>
        <metric><name>box plot of errors by forecast value</name></metric>
        <metric><name>quantile quantile diagram</name></metric>
        <metric><name>rank histogram</name></metric>
        <metric><name>relative operating characteristic diagram</name></metric>
        <metric><name>reliability diagram</name></metric>
    </metrics>
    <outputs>
        <destination type="png" />
    </outputs>

</project>

And this was produced: !DRRC2HSF_DRRC2HSF_HEFS_RANK_HISTOGRAM_172800_SECONDS.png!

Edit: I guess for units there are unit aliases. And yeah it stinks that it's contextual.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2021-11-05T13:32:39Z


Now that I look at your example, I can't help but think that the names should be reduced if they are the same. If the features are @left="DRRC2HSF"@ and @right="DRRC2HSF"@, printing @DRRC2HSF@ rather than @DRRC2HSF-DRRC2HSF@ reads better. If you have @left="DRRC2HSF"@ and @right="DRRC2HSF"@ and @baseline="DRRC2"@, though, @DRRC2HSF-DRRC2HSF-DRRC2@ still makes sense. Even then, @RIGHT vs LEFT@ is a little easier to read and understand than @LEFT-RIGHT@ since it reads a little more like "@RIGHT@ compared to @LEFT@" as opposed to "@LEFT@ and @RIGHT@". They are the same in this context, so it's purely for readability's sake, not functionality's.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2021-11-05T13:42:09Z


I think we could do that. The default name chosen for a singleton feature group is a short string representation of the single tuple within it, which takes the form of l-r-b feature names. We could de-duplicate that in the situation where all two or all three names are identical. I guess the downside is consistency, i.e., two different string representations depending on the content, but the upside is parsimony.

edit: another downside is that it would break a lot of system test benchmarks (file names for legacy csv and content of pairs), depending on the scope of the change.

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2021-11-05T14:43:51Z


I would say deduplicate when all non-null names are equal. I think @DRRC2HSF-DRRC2@ would be confusing if it were actually @DRRC2HSF-DRRC2HSF-DRRC2@ since it won't be clear which one of the three was @DRRC2@. Or if two of them were.
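The rule above could be sketched roughly as follows: collapse the l-r-b name to a single name only when every non-null name is identical, and otherwise print all of them. The class @FeatureTupleNames@ and its @shortName@ method are hypothetical names for illustration, not the actual WRES implementation.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper for illustration only; not part of the WRES code base. */
final class FeatureTupleNames {

    private FeatureTupleNames() {
    }

    /**
     * Builds a short name from the left, right and (optional) baseline names.
     * A null argument means that side was not declared.
     */
    static String shortName(String left, String right, String baseline) {
        List<String> names = Stream.of(left, right, baseline)
                .filter(Objects::nonNull)
                .collect(Collectors.toList());

        // Collapse only when every declared name is identical.
        if (!names.isEmpty() && names.stream().distinct().count() == 1) {
            return names.get(0);
        }

        // Otherwise keep every declared name so none is ambiguous.
        return String.join("-", names);
    }
}
```

With this sketch, a left and right of @DRRC2HSF@ with no baseline gives @DRRC2HSF@, while adding a baseline of @DRRC2@ gives the full @DRRC2HSF-DRRC2HSF-DRRC2@.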

Also, while this does not pertain to features, bonus points for title case for metric and axis names. "Observed Relative Frequency" and "Rank Histogram" read better on graphical products compared to "OBSERVED RELATIVE FREQUENCY" and "RANK HISTOGRAM". It makes sense for pure data products like CSV or NetCDF where things need to be highlighted like that, but it adds a layer of polish when terminology on graphical products mimics natural language.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2021-11-05T14:54:06Z


Chris wrote:

I would say deduplicate when all non-null names are equal. I think @DRRC2HSF-DRRC2@ would be confusing if it were actually @DRRC2HSF-DRRC2HSF-DRRC2@ since it won't be clear which one of the three was @DRRC2@. Or if two of them were.

Yup, that's what I meant by all two or all three, agree.

Chris wrote:

Also, while this does not pertain to features, bonus points for title case for metric and axis names. "Observed Relative Frequency" and "Rank Histogram" read better on graphical products compared to "OBSERVED RELATIVE FREQUENCY" and "RANK HISTOGRAM". It makes sense for pure data products like CSV or NetCDF where things need to be highlighted like that, but it adds a layer of polish when terminology on graphical products mimics natural language.

We could do that. These are all enumerations (which is why they appear like that by default), so it just needs a @toString@ override. I think there's a helper for that in commons (edit: capitalization, that is).
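As a rough illustration of the @toString@ override, here is a dependency-free sketch. The enum @MetricName@ and its constants are hypothetical stand-ins, and the capitalization is hand-rolled here rather than using the commons helper mentioned above.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical enumeration for illustration only; not the WRES metric enum. */
enum MetricName {
    RANK_HISTOGRAM,
    OBSERVED_RELATIVE_FREQUENCY,
    VOLUMETRIC_EFFICIENCY;

    /** Returns the constant name in title case, with underscores as spaces. */
    @Override
    public String toString() {
        return Arrays.stream(name().split("_"))
                .map(MetricName::capitalize)
                .collect(Collectors.joining(" "));
    }

    /** Lower-cases a word and upper-cases its first character. */
    private static String capitalize(String word) {
        if (word.isEmpty()) {
            return word;
        }
        String lower = word.toLowerCase(Locale.ROOT);
        return Character.toUpperCase(lower.charAt(0)) + lower.substring(1);
    }
}
```

Because only @toString@ changes, @name()@ still returns the upper-case constant, so data products such as CSV could keep using that form.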

epag commented 3 weeks ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2021-11-05T15:02:52Z


Considering that the graphical outputs are probably most (if not all) that our gracious overlords see, that spit-shine might provide some good/easy office-political brownie points.

epag commented 3 weeks ago

Original Redmine Comment Author Name: James (James) Original Date: 2021-11-05T15:06:01Z


Yup. I think all these things are super easy, mostly one liners. Hardest thing is updating benchmarks, I guess. I wouldn't call any of them breaking changes, on first glance, unless downstream users' applications are overly sensitive.