Thanks for your excellent work and for sharing the code.
I have a question about training InternVL2:
In my experiment, I set --save_only_model to avoid saving the "global_step" checkpoint. However, the training loss did not converge after 1 epoch, and when I restored the checkpoint and resumed training, the loss increased (possibly because the AdamW optimizer states were not restored). Are there any training tips for this situation?
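
For reference, this is roughly my setup; a minimal sketch assuming the standard Hugging Face `TrainingArguments` API, with hypothetical paths and config names (not the actual InternVL2 launch script):

```python
from transformers import TrainingArguments

# Hypothetical output dir and DeepSpeed config, shown only to
# illustrate the flag in question.
args = TrainingArguments(
    output_dir="work_dirs/internvl2",
    num_train_epochs=2,
    save_only_model=True,  # saves weights only; the global_step directory
                           # with optimizer/scheduler states is not written
    deepspeed="zero_stage1_config.json",
)
```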
Can DeepSpeed resume training without the "global_step" checkpoint? In my understanding, it is required for resuming, because the optimizer states are stored in it.
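
As a quick sanity check, this is how I verified that the optimizer state is missing from my checkpoints; a minimal sketch with a hypothetical checkpoint path:

```python
import os

ckpt = "work_dirs/internvl2/checkpoint-1000"  # hypothetical checkpoint path

# A full DeepSpeed checkpoint contains a global_step*/ subdirectory holding
# the partitioned optimizer states; with --save_only_model it is absent,
# so a resume can restore the weights but not the AdamW moments.
has_optimizer_state = any(
    name.startswith("global_step") for name in os.listdir(ckpt)
)
print("optimizer state present:", has_optimizer_state)
```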