Closed jtruesdal closed 2 years ago
Thanks for bringing this to our attention! You seem to be absolutely correct; we'll look into possible reasons for this and get back to you. Is the data set you want to run the model on statistically very different from the CAM5.1 data our model was trained on? You can try using the model with the wonky normalization for now and see if there are any performance issues.
Hi Andre. I work for NCAR and our group is actually collaborating with the ClimateNet team, although that started a few years ago while Prabhat was still there. We produced the CAM5.1 data that is being used for training. The CAM6 data we are using should have the same statistics as the data in your repository. Although these bad normalizing values do produce valid-looking masks, the normalizing data looks to be far enough off (at least for PSL) to affect the accuracy. I just wanted to make sure I wasn't missing something. We will recalculate the means/stds, fix the config file and retrain; I trust you'll do the same. I think the new interface looks great by the way, nice job!
Hi @andregraubner, I've been working with @jtruesdal and happened to recently calculate some of the mean and standard deviation values for the training dataset, so I thought I'd share that here in case it is helpful for cross-checking.
TMQ mean: 19.21849
TMQ std: 15.73182
U850 mean: 1.55302
U850 std: 8.27790
V850 mean: 0.25413
V850 std: 6.21594
PSL mean: 100814.07031
PSL std: 1454.36969
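For anyone who wants to cross-check, unweighted statistics like these can be obtained by pooling every time step and grid point of a variable. The following is only a minimal sketch, assuming the fields are already loaded as NumPy arrays of shape (time, lat, lon); the function name `channel_stats` and the synthetic data are made up for illustration and are not part of ClimateNet.

```python
import numpy as np

def channel_stats(fields):
    """Return {name: (mean, std)} pooled over all samples and grid points.

    `fields` maps a variable name to an array of shape (time, lat, lon).
    Casting to float64 limits precision loss for large-valued fields such as PSL.
    """
    stats = {}
    for name, data in fields.items():
        data = np.asarray(data, dtype=np.float64)
        stats[name] = (float(data.mean()), float(data.std()))
    return stats

# Synthetic stand-in data (hypothetical, not real CAM output)
rng = np.random.default_rng(0)
fields = {
    "PSL": rng.normal(100800.0, 1450.0, size=(4, 8, 16)),
    "TMQ": rng.normal(19.0, 15.0, size=(4, 8, 16)),
}
for name, (mean, std) in channel_stats(fields).items():
    print(f"{name} mean: {mean:.5f}")
    print(f"{name} std: {std:.5f}")
```

Note that `np.std` defaults to the population standard deviation (`ddof=0`); for datasets of this size the difference from the sample estimate should be negligible.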
Thank you very much. We will soon provide an additional pre-trained model using these values. I'll post an update here then and close the issue accordingly. Please reach out if anything else pops up!
Thank you again for bringing this to our attention. We have updated the pre-trained model accordingly and verified that the results reported in the paper still hold. Please don't hesitate to reach out if anything else comes up.
@andregraubner I'm revisiting this issue as I have two questions about best practices for calculating these means and standard deviations for the cgnet config file. I'd be interested to hear your and others' thoughts.
For reference, here are the mean values if you include weighting by cos(lat) over space. They do differ from the values above, but not very significantly:
TMQ mean: 24.92724
U850 mean: 1.03567
V850 mean: 0.20848
PSL mean: 101095.0352
I have a notebook here if you want to take a closer look at the calculations: https://github.com/katiedagon/ML-extremes/blob/main/notebooks/get_averages_and_standard_devs.ipynb
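The cos(lat) weighting mentioned above can be sketched as follows. This is not the code from the linked notebook, just a minimal NumPy illustration under the assumption of a regular latitude-longitude grid with data of shape (time, lat, lon); the function name `coslat_weighted_mean_std` is hypothetical.

```python
import numpy as np

def coslat_weighted_mean_std(data, lat_deg):
    """Area-weighted mean and std of `data` with shape (time, lat, lon).

    Each grid point is weighted by cos(latitude), so the small grid cells
    near the poles contribute less than those near the equator.
    """
    data = np.asarray(data, dtype=np.float64)
    weights = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=np.float64)))
    # Broadcast the 1-D latitude weights over the time and longitude axes
    weights = np.broadcast_to(weights[None, :, None], data.shape)
    mean = np.average(data, weights=weights)
    var = np.average((data - mean) ** 2, weights=weights)
    return float(mean), float(np.sqrt(var))

# Toy field that increases with latitude (illustrative values only)
lat = np.linspace(-90.0, 90.0, 7)
field = np.broadcast_to(lat[None, :, None], (2, 7, 4))
w_mean, w_std = coslat_weighted_mean_std(field, lat)
print(f"weighted mean: {w_mean:.5f}, weighted std: {w_std:.5f}")
print(f"unweighted std: {field.std():.5f}")
```

The weighted standard deviation here is computed around the weighted mean, which is one reasonable convention; whether the config file should hold weighted or unweighted values is exactly the kind of choice that should match between training and inference.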
@andregraubner The mean and std values in the configuration file look to be incorrect. I am assuming these are the actual means and stds calculated from the training data and used for normalization. For instance, the mean for PSL is 1619.3, which is odd as it's calculated from a field that is in Pa (94000-105000). As long as the data used for inference is also normalized using these wonky values, the model is able to pick out features correctly, but as soon as means calculated from the inference data are used, the trained model is unable to detect ARs and TCs.
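To give a feel for the size of this mismatch, here is a small arithmetic sketch. It assumes the usual z-score form (x - mean) / std. The PSL mean of 1619.3 is the config value quoted above; 100814.07031 and 1454.36969 are the recalculated PSL mean and std posted elsewhere in this thread. The std stored in the config is not quoted here, so the recalculated std is used in both cases purely for illustration, and the sample pressure is a made-up value.

```python
def zscore(x, mean, std):
    """Standard z-score normalization."""
    return (x - mean) / std

psl_sample = 101000.0        # illustrative sea level pressure in Pa (assumed)
config_mean = 1619.3         # PSL mean quoted from the config file
recalc_mean = 100814.07031   # recalculated PSL mean from this thread
recalc_std = 1454.36969      # recalculated PSL std from this thread

z_config = zscore(psl_sample, config_mean, recalc_std)
z_recalc = zscore(psl_sample, recalc_mean, recalc_std)
offset = z_config - z_recalc

print(f"normalized with config mean:       {z_config:.2f}")
print(f"normalized with recalculated mean: {z_recalc:.2f}")
print(f"offset in units of std:            {offset:.2f}")
```

Under these assumptions the two normalizations place the same pressure value tens of standard deviations apart, which would be consistent with a model trained on one set of statistics failing when fed inputs normalized with the other.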