google-deepmind / graphcast

Apache License 2.0

Training duration of pre-training model #77

Closed zhongmengyi closed 2 months ago

zhongmengyi commented 4 months ago

Hello, I would like to ask how long it took to train the three pretrained models provided by GraphCast, and how much memory they occupy. Is there any specific data? Thanks!

alvarosg commented 2 months ago

Thanks for your question.

Training the main 0.25 deg ERA5 GraphCast model took about four weeks on 32 TPU v4 devices (each TPU with 32 GB of RAM): about two weeks for the initial 1-step phase, and another two weeks for the 2-12 step annealing.
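For illustration, the two-phase schedule described above can be sketched as a function from training step to autoregressive (AR) rollout length. The phase lengths, the linear annealing, and the function name `num_ar_steps` are assumptions for this sketch, not GraphCast's actual training configuration:

```python
# Hedged sketch of a two-phase AR curriculum: a fixed 1-step phase,
# then a linear anneal of the rollout length from 2 up to 12 steps.
# Phase lengths below are placeholders, not GraphCast's real values.

def num_ar_steps(step: int,
                 phase1_steps: int = 300_000,
                 anneal_steps: int = 300_000) -> int:
    """Return the AR rollout length used at a given training step."""
    if step < phase1_steps:
        return 1  # phase 1: single-step prediction only
    # Phase 2: linearly anneal the rollout length from 2 to 12.
    frac = min((step - phase1_steps) / anneal_steps, 1.0)
    return 2 + int(frac * 10)

print(num_ar_steps(0))        # 1  (phase 1)
print(num_ar_steps(300_000))  # 2  (start of annealing)
print(num_ar_steps(600_000))  # 12 (annealing complete)
```

Longer rollouts cost proportionally more memory and compute per step, which is why the annealing half of training dominates the memory footprint.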

However, for ease of training (see more details here) I would recommend using GPUs/TPUs with more than 32 GB of memory.

The operational model took about the same, except that it has an additional phase of 1AR fine-tuning between those two phases, which takes an extra day.

The 1 deg model takes about 1.5 days to train in total.