How to train a model by myself

google-deepmind / graphcast

Apache License 2.0

How to train a model by myself #69

Closed Trek-on closed 1 month ago

Trek-on commented 2 months ago

Hello, I have a few questions:

1. The open-source code only provides prediction and fine-tuning code, not training code, right? If I want to train my own model, is it best to write the training code myself?
2. The reason we can see all the technical details is that, at prediction time, the model reads the downloaded model parameters. Is this understanding correct?
3. About "Training low resolution random models": I mean the fine-tuning code in the last part of the official notebook. How is it implemented? It does not seem to download any training data, so is it just randomly adjusting the parameters in the parameter file?

Thank you very much!

zlminmin commented 1 month ago

Excuse me, have you implemented training a new graphcast model yourself?

alvarosg commented 1 month ago

Thanks for your message. The open-source code provides a "loss" function, which you can use to both train and fine-tune the model if you can fit it on your hardware. However, you would need to provide your own data iterators and implement batch parallelism (to train on multiple devices simultaneously and thereby reduce training time) for your specific platform.
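To make the three pieces concrete (a loss function, a data iterator, and batch parallelism), here is a minimal JAX sketch. It is not GraphCast's training code: `loss_fn` is a toy linear-regression loss standing in for the model's loss, `data_iterator` yields synthetic data, and the names `train_step` and `run_training` are hypothetical. It assumes plain SGD and uses `jax.pmap` with `jax.lax.pmean` to average gradients across whatever local devices are available (it also runs on a single CPU).

```python
import functools

import jax
import jax.numpy as jnp
import numpy as np


def loss_fn(params, inputs, targets):
    """Toy stand-in for a model loss: mean squared error of a linear map."""
    predictions = inputs @ params["w"] + params["b"]
    return jnp.mean((predictions - targets) ** 2)


def data_iterator(num_steps, num_devices, per_device_batch, num_features, seed=0):
    """Hypothetical iterator: synthetic batches with a leading device axis."""
    rng = np.random.default_rng(seed)
    true_w = np.arange(1, num_features + 1, dtype=np.float32).reshape(num_features, 1)
    for _ in range(num_steps):
        shape = (num_devices, per_device_batch, num_features)
        x = rng.normal(size=shape).astype(np.float32)
        yield x, x @ true_w  # noise-free targets, shape (devices, batch, 1)


def train_step(params, inputs, targets, learning_rate):
    """One SGD step on a per-device shard, with cross-device averaging."""
    loss, grads = jax.value_and_grad(loss_fn)(params, inputs, targets)
    # Batch parallelism: every device sees a different shard of the batch,
    # and the gradients are averaged so all replicas apply the same update.
    grads = jax.lax.pmean(grads, axis_name="devices")
    loss = jax.lax.pmean(loss, axis_name="devices")
    params = jax.tree_util.tree_map(lambda p, g: p - learning_rate * g, params, grads)
    return params, loss


# Map the step over the leading (device) axis of every argument.
p_train_step = jax.pmap(
    functools.partial(train_step, learning_rate=0.1), axis_name="devices"
)


def run_training(num_steps=200, per_device_batch=16, num_features=3):
    num_devices = jax.local_device_count()
    params = {"w": jnp.zeros((num_features, 1)), "b": jnp.zeros((1,))}
    # Replicate the parameters: one identical copy per device.
    replicated = jax.tree_util.tree_map(
        lambda x: jnp.stack([x] * num_devices), params
    )
    losses = []
    batches = data_iterator(num_steps, num_devices, per_device_batch, num_features)
    for inputs, targets in batches:
        replicated, loss = p_train_step(replicated, inputs, targets)
        losses.append(float(loss[0]))
    # All replicas are identical after pmean, so keep the first copy.
    final_params = jax.tree_util.tree_map(lambda x: x[0], replicated)
    return final_params, losses


if __name__ == "__main__":
    final_params, losses = run_training()
    print("first loss:", losses[0], "last loss:", losses[-1])
```

For a real model, the toy `loss_fn` would be replaced by the repository's loss, the iterator by one that reads your own dataset, and plain SGD by the optimizer and schedule of your choice; the replicate, shard, and average structure stays the same.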

zlminmin commented 1 month ago

Thank you very much for your answer. I'm a beginner, and it's hard for me to reproduce a model as complex as GraphCast. I want to learn from such a good model, but the training details are not mentioned in the paper, so I cannot complete the reproduction on my own. Could you please provide an example of training from scratch that I can use as a reference?

alvarosg commented 1 month ago

"but the training details are not mentioned in the paper"

To the best of our knowledge, all training details for minimizing the loss (optimizer, batch size, trajectory sampling, learning rate schedules, etc.) are provided in the supplementary materials of the paper (sections 4.4 and 4.5).

If you find that something is missing, please let us know and we will be more than happy to clarify!