Open Wloner0809 opened 1 month ago
Thanks for your great work! I would like to ask why there are two sudden drops in this loss graph (copied from https://huggingface.co/O1-OPEN/OpenO1-LLama-8B-v0.1/blob/main/training_loss.png). Does this mean that, at the current stage of this work, the model is overfitting to the O1-style data format through SFT?
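That's most likely the result of training for 3 epochs with a high learning rate. On the second and third passes the model sees the same data again and is already familiar with it, so the loss drops sharply at the start of each new epoch.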
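One way to check this is to compare the steps where the loss drops against the steps where each new epoch begins. Below is a minimal sketch; the dataset size, effective batch size, and drop positions are made-up numbers for illustration, not the actual OpenO1 training configuration, and the helper names are hypothetical.

```python
import math

def epoch_boundary_steps(num_examples, effective_batch_size, num_epochs):
    """Return the optimizer steps at which epochs 2..num_epochs begin.

    effective_batch_size is assumed to already include gradient
    accumulation and the number of devices.
    """
    steps_per_epoch = math.ceil(num_examples / effective_batch_size)
    return [steps_per_epoch * e for e in range(1, num_epochs)]

def drops_match_epochs(drop_steps, boundaries, tolerance=5):
    """True if every observed drop lies within `tolerance` steps of a boundary."""
    return all(
        any(abs(d - b) <= tolerance for b in boundaries)
        for d in drop_steps
    )

# Illustrative values only (assumptions, not taken from the real run).
boundaries = epoch_boundary_steps(
    num_examples=10_000, effective_batch_size=64, num_epochs=3
)
print(boundaries)                                  # [157, 314]
print(drops_match_epochs([158, 313], boundaries))  # True
```

If the two drops in the plot line up with the computed boundaries, the staircase shape is explained by repeated exposure to the same examples. Whether that amounts to harmful overfitting is better judged from a held-out validation loss than from the training loss alone.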