Open icastorm opened 2 months ago
Thanks @icastorm, great request. totalspread is currently missing.
Here is how I was thinking about it with rmse and bias (I was focusing only on these for a demo of the plots!):
Squared error and bias are single-observation calculations, so they are created as columns when we create the dataframe: https://github.com/NCAR/pyDARTdiags/blob/7d1b167fb3cbe0d6bb8c33f02cffdb4274889fe0/src/pydartdiags/obs_sequence/obs_sequence.py#L86-L89
sq_err = (mean-obs)**2
bias = mean-obs
rmse, by contrast, is computed over a group of observations, so you select the group of observations and get the rmse and bias for that group:
rmse = sqrt( sum((mean-obs)**2)/n )
bias = sum(mean-obs)/n
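A rough sketch of that grouping step with pandas, in case it helps the discussion. The `sq_err` and `bias` column names follow the comment above; the helper name `group_rmse_bias`, the `type` grouping column, and the toy data are hypothetical and not taken from pyDARTdiags.

```python
import numpy as np
import pandas as pd


def group_rmse_bias(df, group_col):
    """Per-group rmse and bias from per-observation sq_err and bias columns.

    Hypothetical helper for illustration, not part of pyDARTdiags.
    """
    grouped = df.groupby(group_col)
    return pd.DataFrame({
        # rmse = sqrt( sum(sq_err) / n ) within each group
        "rmse": np.sqrt(grouped["sq_err"].mean()),
        # bias = sum(mean - obs) / n within each group
        "bias": grouped["bias"].mean(),
    })


# Toy data; the "type" grouping column is an assumption for this example.
obs = pd.DataFrame({
    "type": ["T", "T", "U", "U"],
    "mean": [1.0, 3.0, 2.0, 2.0],
    "obs": [0.0, 0.0, 2.0, 4.0],
})
# Per-observation columns, as in the comment above.
obs["sq_err"] = (obs["mean"] - obs["obs"]) ** 2
obs["bias"] = obs["mean"] - obs["obs"]

stats = group_rmse_bias(obs, "type")
print(stats)
```

Keeping the per-observation columns in the dataframe means any selection (by type, time window, or vertical level) can reuse the same reduction.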
I think you're correct that we can treat totalspread in the same way. The function to calculate totalspread is the way to go.
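For discussion, here is a minimal sketch of what such a function might look like. It assumes total spread for a group is the square root of the group mean of (ensemble variance + observation error variance), with the ensemble variance taken as the square of a standard deviation column. That exact form is an assumption on my part and should be checked against the definition in obs_sequence.py. The function name `total_spread` and the `sd` column name are hypothetical; `obs_err_var` is the column mentioned in this thread.

```python
import numpy as np
import pandas as pd


def total_spread(df, sd_col="sd", obs_err_var_col="obs_err_var"):
    """Total spread over the observations in df.

    Hypothetical sketch, not the pyDARTdiags implementation.
    Assumes sd_col holds the ensemble standard deviation, so it is
    squared to get a variance before adding the obs error variance.
    """
    pooled_var = df[sd_col] ** 2 + df[obs_err_var_col]
    return float(np.sqrt(pooled_var.mean()))


# Toy data; the "sd" column name is an assumption for this example.
demo = pd.DataFrame({
    "sd": [1.0, 2.0],
    "obs_err_var": [3.0, 0.0],
})
ts = total_spread(demo)
print(ts)
```

Like rmse, this reduces over whatever group of observations is selected first, so it could be applied per group in the same way.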
obs_err_var is there as a column in the dataframe. There may be something funky going on if you are not seeing an 'obs_err_var' column
Longer term, I think we might want to split the diagnostic calculations into their own module - I'm guessing someone might want the calculations without necessarily making the plot.
Sorry for the radio silence, it's been a busy couple of weeks and I've been on a bit of a time crunch. I will hopefully be back to working on this next week though...
Issue
An important function of the DART diagnostic toolkit is to compare the RMSE to the "total spread", which is defined in src/pydartdiags/obs_sequence/obs_sequence.py as the sqrt(sum(sd+obs_err_var)). Given the usefulness of the total spread value in a variety of contexts (temporal evolution, comparisons between observation types, vertical distribution, etc.), it seems appropriate to add a function to the plots.py script that calculates the "total spread" of some set of observations. Total spread plots could be added later as well.
Solution(s)
Testing