Open veya2ztn opened 2 years ago
hey, are the training and test .h5 files, e.g. train/2015.h5, of a similar data shape (4D data)?
I am also wondering about that; did you find a solution so far? In their paper they write:
> we use a time-averaged climatology in this work, motivated by [Rasp et al., 2020]
which is defined just above A1 in https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020MS002405, so that seems to be the correct way 🤷🏼
Digging further into this, I found this description in the appendix:
> long-term-mean-subtracted value of predicted (/true) variable v at the location denoted by the grid co-ordinates (m, n) at the forecast time-step l. The long-term mean of a variable is simply the mean value of that variable over a large number of historical samples in the training dataset. The long-term-mean-subtracted variables $\tilde{X}_{\mathrm{pred/true}}$ represent the anomalies of those variables that are not captured by the long-term mean values
which reads as: we subtract from our variables their long-term mean. That is what we do during data loading, and the mean is indeed computed over a long term (in get_stats.py).
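To make the reading above concrete, here is a minimal sketch (with toy shapes, not the repo's actual loader) of computing a long-term time mean and the resulting anomalies; the $(T, C, H, W)$ layout is assumed from the 4D-data discussion above:

```python
import numpy as np

# Sketch: long-term mean and anomalies (toy stand-in for a train/*.h5 tensor).
# Assumed layout: (T, C, H, W) = time steps, variables, lat, lon.
rng = np.random.default_rng(0)
data = rng.standard_normal((8, 3, 4, 5)).astype(np.float32)

# Long-term mean: average over the time axis, kept as (1, C, H, W) for broadcasting.
time_mean = data.mean(axis=0, keepdims=True)

# Long-term-mean-subtracted fields, i.e. the anomalies from the appendix quote.
anomaly = data - time_mean

print(anomaly.shape)                                   # (8, 3, 4, 5)
print(np.allclose(anomaly.mean(axis=0), 0, atol=1e-5)) # True: anomalies average to zero
```

By construction the anomalies average to zero over the time axis, which is exactly the property the ACC formula in the appendix relies on.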
--
Edit: However, the variables are also scaled by their std_dev, so it's not only the mean that is removed.
Please see: https://github.com/NVlabs/FourCastNet/blob/master/data_process/get_stats.py
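The scaling mentioned in the edit amounts to a standard per-channel normalization. A minimal sketch (variable names are illustrative, not taken from get_stats.py):

```python
import numpy as np

# Sketch of normalization at data loading: subtract the global per-channel mean
# AND divide by the global per-channel std (so it's not only mean removal).
rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=(10, 3, 4, 4))  # toy (T, C, H, W) batch

# Per-channel statistics over time and space, shaped (1, C, 1, 1) to broadcast.
global_means = x.mean(axis=(0, 2, 3), keepdims=True)
global_stds = x.std(axis=(0, 2, 3), keepdims=True)

normalized = (x - global_means) / global_stds

print(np.allclose(normalized.mean(axis=(0, 2, 3)), 0, atol=1e-6))  # True
print(np.allclose(normalized.std(axis=(0, 2, 3)), 1, atol=1e-6))   # True
```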
Following that script, time_means is constant zero. What is the correct definition of this value?
BTW, may I know how you calculated the time_means_daily.h5 file? From its size (127 GB) I can only guess it is a $(1460, 21, 720, 1440)$ tensor.
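The size-based guess checks out for float32. A quick back-of-the-envelope calculation (assuming 1460 = 4x-daily samples over a 365-day year and 4 bytes per element):

```python
# Does a (1460, 21, 720, 1440) float32 tensor come out to roughly 127 GB?
shape = (1460, 21, 720, 1440)
n_elements = 1
for d in shape:
    n_elements *= d

size_bytes = n_elements * 4  # float32 = 4 bytes per element
print(size_bytes / 1e9)      # 127.153152, i.e. ~127 GB in decimal units
```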