MARIO-Math-Reasoning / Super_MARIO

MIT License

training code #8

Closed. jordane95 closed this issue 3 months ago

jordane95 commented 3 months ago

Will the model fine-tuning code also be released soon? Thanks.

lovecambi commented 3 months ago

Sorry, we cannot release the training code at the moment, but it is not difficult to implement. Only two modifications are required:

  1. Add a value head on top of the last layer of the LLM.
  2. Train with the combined loss: auto-regressive loss (positive examples only) + 0.1 * MSE value loss (positive and negative examples).
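
For readers who want to see the arithmetic, here is a minimal, framework-free sketch of the two modifications. It is not the authors' training code. Plain Python lists stand in for tensors, and everything except the 0.1 weight is an assumption made for illustration: the function names, the dict keys, the tanh squashing in the value head, and the +1 / -1 value targets.

```python
import math

VALUE_LOSS_WEIGHT = 0.1  # the 0.1 weight stated in the reply


def value_head(hidden_state, weight, bias):
    """Linear value head on a last-layer hidden state.

    The tanh squashing is an assumption, not something the thread states.
    """
    score = sum(h * w for h, w in zip(hidden_state, weight)) + bias
    return math.tanh(score)


def autoregressive_loss(token_logits, target_ids):
    """Mean next-token cross-entropy over one sequence."""
    total = 0.0
    for logits, target in zip(token_logits, target_ids):
        max_logit = max(logits)  # subtract the max for numerical stability
        log_sum_exp = max_logit + math.log(
            sum(math.exp(x - max_logit) for x in logits)
        )
        total += log_sum_exp - logits[target]
    return total / len(target_ids)


def combined_loss(examples):
    """Auto-regressive loss on positives + 0.1 * MSE value loss on all examples.

    Each example is a dict with hypothetical keys:
    token_logits, target_ids, value_pred, value_target, is_positive.
    """
    positives = [ex for ex in examples if ex["is_positive"]]

    # Auto-regressive loss: positive examples only.
    lm_loss = 0.0
    if positives:
        lm_loss = sum(
            autoregressive_loss(ex["token_logits"], ex["target_ids"])
            for ex in positives
        ) / len(positives)

    # MSE value loss: positive and negative examples.
    value_loss = sum(
        (ex["value_pred"] - ex["value_target"]) ** 2 for ex in examples
    ) / len(examples)

    return lm_loss + VALUE_LOSS_WEIGHT * value_loss


# Toy batch: one positive and one negative example (targets +1 / -1 are assumed).
batch = [
    {"token_logits": [[0.0, 0.0]], "target_ids": [0],
     "value_pred": value_head([0.5, -1.0], [1.0, 0.5], 0.0),
     "value_target": 1.0, "is_positive": True},
    {"token_logits": [[0.0, 0.0]], "target_ids": [1],
     "value_pred": 0.0, "value_target": -1.0, "is_positive": False},
]
loss = combined_loss(batch)
```

In a real setup the same structure would presumably sit on top of the LLM's hidden states inside a deep-learning framework. The point here is only the masking: negative examples contribute to the value loss but not to the language-modeling loss.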

lovecambi commented 3 months ago

Another hint: we use trl to support value model training.