NOAA-EMC / WW3-tools

20 stars 16 forks source link

Add number of data points for mval stats #60

Closed JessicaMeixner-NOAA closed 6 months ago

JessicaMeixner-NOAA commented 6 months ago

Pull Request Summary

Add "N" number of data points used in stat calculations in mvalstats

Description

To help with debugging and also just general knowledge, since mval can filter out values we want to make sure we know how many values are being used to calculate stats, so "N" number of obs is added as output.

Issue(s) addressed

Issue was not created.

Commit Message

Add number of data points for mval stats

Check list

Testing

import numpy as np
import mvalstats

a =[ 1, 2, 3, 4, 5] 
b =[np.nan,3,4,5,6] 

error_result = mvalstats.metrics(a, b)
print(error_result)

And the output was:

[-1.          1.         -0.22222222  0.21566555  0.          0.
  0.24253562  1.          4.        ]
ricampos commented 6 months ago

The new code works well, I tested mvalstats.metrics and it ran as expected. I have two comments, one recommendation, and one modification: 1) This is written as the number of observations but, in fact, this is the number of matchups used to calculate the metrics. If either one, the model or obs array, contains a NaN, then this pair is excluded. I'm mentioning this so people don't enter one array of observations expecting a certain number and end up getting a different number as the output; 2) With find . -name '*.py' | xargs grep -n "metrics" I could note that pvalstats.py uses mvalstats.metrics to generate some validation plots. And now you're passing an array with a larger size so it will break (more specifically the arrays monthlyval and errm). There are 2 solutions: (1) make this output optional (user input nobs='yes', and make the default to be 'no'); or modify pvalstats.py by adding [0:8] whenever it is received outputs from the new mvalstats.metrics

JessicaMeixner-NOAA commented 6 months ago

@ricampos both great comments! I didn't even think about how this was used, thanks for catching that. I'll have an update for re-review shortly!

JessicaMeixner-NOAA commented 6 months ago

@ricampos I think that by nerrm to add 'N' in the name array, this correctly adjusts all the pval stat arrays, but would be worth confirming with you.

ricampos commented 6 months ago

Yes, I agree. And it is a good practice because this will induce people to know the number of samples used for the statistics. Thanks

JessicaMeixner-NOAA commented 6 months ago

@ricampos thanks for reviewing and merging!