Issues with GraphCast Training – Request for Assistance

Dear Professor,

I hope this message finds you well.

I am a postgraduate student and a beginner in deep learning. I have read your article on GraphCast and would like to retrain a GraphCast model based on your source code. I have provided a portion of my code below:

Setting model_config and task_config, along with train_steps and eval_steps.
Here is the main training loop, modified based on the source code provided in the article (I am unsure if it's correct), training for 10 epochs.
The loss and gradients for the 10 training iterations are as follows:

I have encountered several issues during training and would like to seek your guidance:

1. How should the time steps of the dataset be divided during training? As I understand single-step training, the first round takes in three time steps: two are used for training and the third for evaluation. For the second round of training, is the training data fed in again? In other words, should the entire dataset be split and loaded before training starts, or should it be read in batches during training?
2. I suspect that the training loop in my code is not actually effective. Even though the parameters are randomly initialized, the loss and gradients computed in the first round are already very low, and after 10 rounds of training neither has changed significantly. Could you advise what might be going wrong?
3. I understand that autoregressive.py is used for single-step prediction, but I am unclear on how to use it during training. A simple training example would be greatly appreciated.
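
To make these three questions concrete, here is a minimal JAX sketch of how I currently picture the process. It is a toy, not GraphCast: a linear model on a synthetic sine/cosine series stands in for the real model and dataset, and every name in it (make_windows, predict, train_step, rollout) is my own hypothetical helper rather than anything from the graphcast repository. It assumes two input time steps and one target time step per example, builds all examples once before training, applies the gradients to the parameters on every step, and then feeds predictions back in as inputs for a multi-step forecast.

```python
import jax
import jax.numpy as jnp

LEARNING_RATE = 0.1  # assumed value, chosen only for this toy problem


def make_windows(series, num_inputs=2):
    """Slice a (time, features) array into (inputs, target) examples.

    Each example uses `num_inputs` consecutive time steps as input and the
    next time step as the target, sliding forward one step at a time.
    """
    num_examples = series.shape[0] - num_inputs
    inputs = jnp.stack([series[i:i + num_inputs] for i in range(num_examples)])
    targets = series[num_inputs:]
    return inputs, targets


def predict(params, inputs):
    # Toy stand-in for a one-step model: linear map of the flattened inputs.
    flat = inputs.reshape(inputs.shape[0], -1)
    return flat @ params["w"] + params["b"]


def loss_fn(params, inputs, targets):
    return jnp.mean((predict(params, inputs) - targets) ** 2)


@jax.jit
def train_step(params, inputs, targets):
    loss, grads = jax.value_and_grad(loss_fn)(params, inputs, targets)
    # Plain gradient descent. Without writing the update back into `params`,
    # the loss would stay the same in every round.
    params = jax.tree_util.tree_map(
        lambda p, g: p - LEARNING_RATE * g, params, grads)
    return params, loss


def rollout(params, initial_inputs, num_steps):
    """Multi-step forecast: feed each prediction back in as the newest input."""
    window = initial_inputs  # shape (num_inputs, features)
    predictions = []
    for _ in range(num_steps):
        next_state = predict(params, window[None])[0]
        predictions.append(next_state)
        window = jnp.concatenate([window[1:], next_state[None]], axis=0)
    return jnp.stack(predictions)


# Synthetic data: 20 time steps, 2 features.
t = jnp.arange(20)
series = jnp.stack([jnp.sin(0.3 * t), jnp.cos(0.3 * t)], axis=1)

# The examples are built once and reused in every epoch.
inputs, targets = make_windows(series)

params = {
    "w": 0.1 * jax.random.normal(jax.random.PRNGKey(0), (4, 2)),
    "b": jnp.zeros((2,)),
}

losses = []
for epoch in range(10):
    params, loss = train_step(params, inputs, targets)
    losses.append(float(loss))

forecast = rollout(params, series[:2], num_steps=3)
```

In this toy setup the loss decreases from one epoch to the next because train_step returns the updated parameters and the loop keeps them. If the real GraphCast training loop should be structured differently, I would appreciate a pointer to where my understanding diverges.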

I look forward to your response. Thank you.

Best regards,

google-deepmind/graphcast, issue #78