As a user, I would like sample counts that are inline to the statistics

epag commented 2 months ago

Author Name: James (James) Original Redmine Issue: 65101, https://vlab.noaa.gov/redmine/issues/65101 Original Date: 2019-06-18

Expected behavior:

Given a statistical output from the WRES, when I look at that output, then I would like to see the sample size inline to each atomic statistic in that output.

"Atomic" means the most elementary statistic. For a score, it means the sample size associated with the score that summarizes a pool. For a diagram, it means the sample size associated with each statistic that summarizes a sub-pool of the pool. For the rank histogram, for example, it means the number of samples in each bin. For the reliability diagram, it means the sharpness diagram, which already exists.
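To make the rank histogram case concrete, here is a minimal Python sketch in which each bin carries its own count alongside its relative frequency, so the pool-level sample size is simply the sum of the atomic counts. The class and function names are hypothetical, not WRES types. It also assumes a simplified treatment of ties: the rank is the number of ensemble members strictly below the observation, whereas WRES may handle ties differently.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class HistogramBin:
    """One bin of a rank histogram, carrying its own sample size inline (hypothetical type)."""
    rank: int
    relative_frequency: float
    sample_size: int


@dataclass(frozen=True)
class RankHistogram:
    """Hypothetical rank histogram whose bins each report their own count."""
    bins: List[HistogramBin]

    @property
    def sample_size(self) -> int:
        # The pool-level sample size is recoverable by summing the atomic (per-bin) counts.
        return sum(b.sample_size for b in self.bins)


def rank_histogram(observations: Sequence[float],
                   ensembles: Sequence[Sequence[float]]) -> RankHistogram:
    """Compute a rank histogram with per-bin sample sizes.

    Assumption: ties are ignored, i.e., the rank is the number of members
    strictly below the observation.
    """
    if len(observations) != len(ensembles):
        raise ValueError("one ensemble per observation is required")
    if not ensembles:
        raise ValueError("at least one pair is required")
    member_count = len(ensembles[0])
    # An ensemble with n members yields n + 1 possible ranks (bins).
    counts = [0] * (member_count + 1)
    for obs, members in zip(observations, ensembles):
        if len(members) != member_count:
            raise ValueError("all ensembles must have the same size")
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    total = sum(counts)
    return RankHistogram(
        bins=[HistogramBin(rank=i, relative_frequency=c / total, sample_size=c)
              for i, c in enumerate(counts)]
    )
```

For example, observations `[1.0, 5.0, 2.5, 0.0]` against the two-member ensembles `[[0.5, 2.0], [1.0, 2.0], [2.0, 3.0], [1.0, 2.0]]` give bin counts of 1, 2 and 1, with a pool-level sample size of 4.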

Actual behavior:

Currently, the sample size is a separate metric. This is useful and should remain. However, it would also help both users and developers to have the sample sizes inline to the statistical outputs, at the level of the most atomic statistic within each output.

This will add a small amount of bloat to the output from the WRES (small relative to all the other metadata, such as time windows), but I think it is useful, both in terms of location (closer to where it is needed) and specificity (more atomic).

Redmine related issue(s): 85491, 88213, 91948, 97399

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-06-18T12:30:33Z

See, for example, #65049-41.

epag commented 2 months ago

Original Redmine Comment Author Name: Chris (Chris) Original Date: 2019-06-18T13:35:08Z

Just out of curiosity, didn't the application do this at one point? It HAD to have been a while back, but it sounds familiar. We already have it in the NetCDF output, though we might want to look at making it a perpetual NetCDF variable, since its current form assumes that the number is the same across all features for a given statistic.

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-06-18T13:54:21Z

No, but you're right in thinking that the most aggregate sample size was part of the metadata at one point (good memory!).

In terms of implementation, I think we'd make it a core component of the statistics, rather than metadata, because that would allow for the sample size to be stored inline to the statistics and at the most atomic level. TBD though.
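One way to read "a core component of the statistics, rather than metadata" is that the code computing a score returns the sample size it actually used together with the value. The Python sketch below assumes a mean error over paired values (called `left` and `right` here, with the error taken as right minus left) and skips any pair containing a missing (non-finite) value, so the inline sample size can be smaller than the pool size. All names are hypothetical and not the WRES API.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScoreStatistic:
    """Hypothetical score whose sample size is part of the statistic, not pool metadata."""
    name: str
    value: float
    sample_size: int


def mean_error(left: Sequence[float], right: Sequence[float]) -> ScoreStatistic:
    """Mean error (right minus left) over pairs where both sides are finite.

    The sample size counts only the pairs actually used, which may be
    smaller than the pool size when values are missing.
    """
    pairs = [(l, r) for l, r in zip(left, right)
             if math.isfinite(l) and math.isfinite(r)]
    if not pairs:
        # No usable pairs: report an undefined score with a zero sample size.
        return ScoreStatistic("MEAN ERROR", math.nan, 0)
    errors = [r - l for l, r in pairs]
    return ScoreStatistic("MEAN ERROR", sum(errors) / len(errors), len(errors))
```

With `left = [1.0, 2.0, nan, 4.0]` and `right = [2.0, 2.0, 3.0, 3.0]`, the third pair is skipped, and the result is a mean error of 0.0 with a sample size of 3, even though the pool holds four pairs.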

epag commented 2 months ago

Original Redmine Comment Author Name: James (James) Original Date: 2019-06-18T19:01:29Z

Note #65085-41. In light of that ticket and post, and contrary to the OP, which states (with respect to the SampleSize metric):

This is useful and should remain.

It probably should not remain. It is better to include the sample sizes inline to all of the metrics, rather than include the sample size as a separate metric.

epag commented 2 months ago

Original Redmine Comment Author Name: alexander.maestre (alexander.maestre) Original Date: 2021-03-06T06:18:23Z

James, thank you for pointing me to this ticket. I will continue exploring the threshold case here, using the csv2 file. I can test either way: as a separate metric or inline with the statistic.

Regards, Alex

NOAA-OWP / wres #211