CALIPSO-project / SPINacc

A spinup acceleration procedure for land surface models (LSM)

Increase the size of the dataset by including the time dimension #57

Open tztsai opened 2 weeks ago

tztsai commented 2 weeks ago

Currently, readvar.py reduces the input variables over the time dimension (spanning 10 years) to compute statistics such as the mean and standard deviation. This greatly shrinks the dataset. Although that lowers the computational load, the resulting dataset turns out to be too small to train an NN (to replace the current tree ensembles). Instead, we could use the monthly averaged data samples directly, which would make the dataset 120 times larger (12 months x 10 years).
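A minimal sketch of the difference between the two approaches, assuming a monthly input variable stored as a NumPy array of shape `(n_months, n_pixels)`. The function names and the array layout are hypothetical and are not taken from readvar.py:

```python
import numpy as np


def reduce_over_time(var):
    """Collapse the time axis into per-pixel mean and std: one row per pixel."""
    return np.stack([var.mean(axis=0), var.std(axis=0)], axis=-1)


def keep_monthly_samples(var):
    """Keep every (month, pixel) pair as its own sample.

    Columns: value, month of year (0-11), pixel index.
    """
    n_time, n_pixels = var.shape
    # reshape(-1) flattens time-major, so pixels vary fastest within a month
    month_of_year = np.repeat(np.arange(n_time) % 12, n_pixels)
    pixel_index = np.tile(np.arange(n_pixels), n_time)
    return np.column_stack([var.reshape(-1), month_of_year, pixel_index])


# Synthetic stand-in for 10 years of monthly data on 50 pixels (assumption)
rng = np.random.default_rng(0)
var = rng.normal(size=(120, 50))

reduced = reduce_over_time(var)      # shape (50, 2)
monthly = keep_monthly_samples(var)  # shape (6000, 3)
print(monthly.shape[0] // reduced.shape[0])  # 120
```

Keeping the month of year as a feature is one possible design choice; it lets a model tell seasonal samples apart once the time axis is flattened.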

dsgoll123 commented 1 week ago

We used monthly climate statistics (coldest month, etc.) instead of the 6-hourly meteo data, so I see no objection to the direction of change you propose.

The number of years will vary between 10 and 20, depending on the type of land surface model simulation. For the idealized small test case, I am thinking of reducing it to 1 year.

To be independent of the number of years, can you use the multi-year daily average (e.g. the average of Jan 1st over all years), which gives 365 times more samples? Or daily data (365 x years times more)?
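The multi-year daily average could be sketched as follows, assuming daily data of shape `(n_years * 365, n_pixels)` on a no-leap calendar. The function name, the array layout, and the calendar are assumptions made for illustration, not the project's actual data format:

```python
import numpy as np

DAYS_PER_YEAR = 365  # assumption: no-leap calendar


def multi_year_daily_mean(daily):
    """Average each calendar day over all years.

    The result always has shape (365, n_pixels), whatever the number of years.
    """
    n_time, n_pixels = daily.shape
    if n_time % DAYS_PER_YEAR != 0:
        raise ValueError("time axis must cover whole 365-day years")
    n_years = n_time // DAYS_PER_YEAR
    return daily.reshape(n_years, DAYS_PER_YEAR, n_pixels).mean(axis=0)


# Synthetic stand-ins for a 1-year test case and a 20-year simulation
rng = np.random.default_rng(0)
one_year = multi_year_daily_mean(rng.normal(size=(1 * 365, 4)))
twenty_years = multi_year_daily_mean(rng.normal(size=(20 * 365, 4)))
print(one_year.shape, twenty_years.shape)  # (365, 4) (365, 4)
```

With this reduction the number of samples per pixel stays at 365 for both the 1-year test case and the 10 to 20 year simulations.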

dsgoll123 commented 1 week ago

Mind that we run ORCHIDEE at different spatial resolutions, from 2 x 2 degrees to 0.5 x 0.5 degrees. That is a factor of 16 difference in the number of pixels.
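The factor of 16 follows from the grid spacing: going from 2 degrees to 0.5 degrees gives 4 times more cells along each axis. A quick check, assuming a regular global latitude-longitude grid (the helper name is hypothetical, and the count covers all cells, not only land pixels):

```python
def n_global_cells(res_deg):
    """Number of cells on a regular global lat-lon grid at the given resolution."""
    n_lon = round(360 / res_deg)
    n_lat = round(180 / res_deg)
    return n_lon * n_lat


coarse = n_global_cells(2.0)  # 180 x 90 cells
fine = n_global_cells(0.5)    # 720 x 360 cells
print(coarse, fine, fine // coarse)  # 16200 259200 16
```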