MARIO-Math-Reasoning / Super_MARIO

MIT License

Is the model for each round initialized from the pre-trained model or from the model of the last iteration round? #13

Closed: tongyx361 closed this issue 2 months ago

tongyx361 commented 3 months ago

I could not find where this is implemented in the code.

Chen-GX commented 2 months ago

Thank you for your insightful question. In each iteration round, we initialize the model from the pre-trained model rather than from the model of the previous round. In our experiments, we found that the quality of the autonomously generated solutions improves with each successive round. However, continuing from the previous round's model may lead to convergence to a local minimum, because that model has already been shaped by suboptimal solutions. Restarting from the pre-trained model at the beginning of each round therefore keeps potential biases or errors from compounding across rounds and helps avoid getting trapped in local minima. Continuing supervised fine-tuning (SFT) without this reinitialization may be insufficient to escape these local optima and could thereby limit the model's overall improvement.
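A minimal sketch of the difference between the two options, assuming each round's training data is generated by the most recent model. `iterative_sft`, `toy_generate`, and `toy_sft` are hypothetical names for illustration, not functions from the Super_MARIO repository, and the "model" is a plain dict that only records which datasets it has been fine-tuned on.

```python
import copy
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for model weights; a real setup would hold a checkpoint.
Model = Dict[str, Any]


def iterative_sft(
    pretrained: Model,
    num_rounds: int,
    generate: Callable[[Model], List[str]],
    sft: Callable[[Model, List[str]], Model],
    reinit_from_pretrained: bool = True,
) -> List[Model]:
    """Return the model produced at the end of each round."""
    models: List[Model] = []
    current = pretrained
    for _ in range(num_rounds):
        # Assumption: this round's training data is generated by the latest model.
        data = generate(current)
        # Either restart from the pre-trained weights (the strategy in the reply)
        # or continue SFT from the previous round's model (the alternative).
        init = copy.deepcopy(pretrained) if reinit_from_pretrained else current
        current = sft(init, data)
        models.append(current)
    return models


# Toy stubs so the loop runs end to end; both are hypothetical.
def toy_generate(model: Model) -> List[str]:
    return [model["tag"]]


def toy_sft(init: Model, data: List[str]) -> Model:
    # Record every dataset this model has been fine-tuned on so far.
    return {"tag": f"ft({data[0]})", "history": init["history"] + [data]}


base = {"tag": "base", "history": []}
reinit_models = iterative_sft(base, 3, toy_generate, toy_sft)
continued_models = iterative_sft(
    base, 3, toy_generate, toy_sft, reinit_from_pretrained=False
)

print([len(m["history"]) for m in reinit_models])     # [1, 1, 1]
print([len(m["history"]) for m in continued_models])  # [1, 2, 3]
```

With reinitialization, every round's model has been fine-tuned exactly once, on data from the latest (improving) generator, so errors baked into earlier fine-tuning runs are not carried forward. With continued SFT, each model accumulates all earlier rounds' training, which is the compounding the reply describes.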

tongyx361 commented 2 months ago

Thanks for your clarification!