training code

MARIO-Math-Reasoning / Super_MARIO

MIT License

179 stars, 13 forks

Closed: jordane95 closed this issue 3 months ago

jordane95 commented 3 months ago

Will the model fine-tuning code also be released soon? Thanks.

lovecambi commented 3 months ago

Sorry, we cannot release the training code at the moment, but it is not difficult to implement. Only two modifications are required:

1. Add a value head to the last layer of the LLM.
2. Train with the auto-regressive loss (on positive examples only) + 0.1 * the MSE value loss (on both positive and negative examples).
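
To make the second point concrete, here is a minimal, framework-free sketch of how such a combined objective could be computed. It is not the project's training code. The example layout (`token_logprobs`, `values`, `target_value`, `is_positive`), the per-token averaging, and the per-example averaging are all assumptions made for illustration; only the 0.1 weight and the positive/negative split come from the reply above. In a real setup these would be tensor operations on the LM logits and the value head outputs.

```python
VALUE_LOSS_WEIGHT = 0.1  # coefficient quoted in the reply above


def lm_loss(token_logprobs):
    """Mean negative log-likelihood of the target tokens of one sequence."""
    return -sum(token_logprobs) / len(token_logprobs)


def value_loss(predicted_values, target_value):
    """Mean squared error between per-token value predictions and one target."""
    errors = [(v - target_value) ** 2 for v in predicted_values]
    return sum(errors) / len(errors)


def combined_loss(examples, value_weight=VALUE_LOSS_WEIGHT):
    """Auto-regressive loss on positives + value_weight * MSE on all examples.

    Each example is a dict with hypothetical keys:
      token_logprobs: log-probabilities the LM assigns to the target tokens
      values:         value-head outputs, one per token
      target_value:   scalar regression target for the value head (assumed)
      is_positive:    True if the example should contribute to the LM loss
    """
    positives = [ex for ex in examples if ex["is_positive"]]
    # Only positive examples are imitated by the language-model loss.
    lm = 0.0
    if positives:
        lm = sum(lm_loss(ex["token_logprobs"]) for ex in positives) / len(positives)
    # The value head is trained on every example, positive or negative.
    value = sum(value_loss(ex["values"], ex["target_value"]) for ex in examples) / len(examples)
    return lm + value_weight * value


examples = [
    {"token_logprobs": [-0.5, -1.5], "values": [0.5, 1.0], "target_value": 1.0, "is_positive": True},
    {"token_logprobs": [-2.0], "values": [0.0, -1.0], "target_value": -1.0, "is_positive": False},
]
total = combined_loss(examples)
```

The negative example's `token_logprobs` are ignored on purpose: it contributes to the value loss only, so the model learns to score bad solutions without learning to generate them.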

lovecambi commented 3 months ago

Another hint: we use trl to support value model training.