axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0
7.48k stars 808 forks

New Alignment Algorithm: SPPO #1736

Open kaykyr opened 1 month ago

kaykyr commented 1 month ago

⚠️ Please check that this feature request hasn't been suggested before.

🔖 Feature description

Hey guys! For anyone interested, I recently submitted a pull request that implements SPPO in the Axolotl trainer. You can follow the pull request here: https://github.com/axolotl-ai-cloud/axolotl/pull/1735

Original SPPO implementation fork: https://github.com/kaykyr/axolotl

See the examples/llama3/sppo-qlora-8b.yml config file for an example of how to train with SPPO.

✔️ Solution

See the pull request: https://github.com/axolotl-ai-cloud/axolotl/pull/1735

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements