google-deepmind / graphcast

Apache License 2.0

Training duration of pre-training model #77

Closed zhongmengyi closed 2 months ago

zhongmengyi commented 4 months ago

Hello, I would like to ask how long it took to train the three pretrained models provided by GraphCast, and how much memory they occupy. Is there any specific data? Thanks!

alvarosg commented 2 months ago

Thanks for your question.

Training the main 0.25 deg ERA5 GraphCast model took about four weeks on 32 TPU v4 devices (each TPU with 32 GB of RAM): about two weeks for the initial 1-step phase, and another two weeks for the 2-12 step annealing.
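For illustration, the two-phase schedule described above can be sketched as a function from training step to autoregressive (AR) rollout length. The phase lengths, the linear annealing, and the function name `num_ar_steps` are assumptions for this sketch, not GraphCast's actual training configuration:

```python
# Hedged sketch of a two-phase AR curriculum: a fixed 1-step phase,
# then a linear anneal of the rollout length from 2 up to 12 steps.
# Phase lengths below are placeholders, not GraphCast's real values.

def num_ar_steps(step: int,
                 phase1_steps: int = 300_000,
                 anneal_steps: int = 300_000) -> int:
    """Return the AR rollout length used at a given training step."""
    if step < phase1_steps:
        return 1  # phase 1: single-step prediction only
    # Phase 2: linearly anneal the rollout length from 2 to 12.
    frac = min((step - phase1_steps) / anneal_steps, 1.0)
    return 2 + int(frac * 10)

print(num_ar_steps(0))        # 1  (phase 1)
print(num_ar_steps(300_000))  # 2  (start of annealing)
print(num_ar_steps(600_000))  # 12 (annealing complete)
```

Longer rollouts cost proportionally more memory and compute per step, which is why the annealing half of training dominates the memory footprint.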

However, for ease of training (see more details here) I would recommend using GPUs/TPUs with more than 32 GB of memory.

The operational model took about the same, except that it has an additional phase of 1AR fine-tuning between those two phases, which takes an extra day.

The 1 deg model takes about 1.5 days to train in total.