axolotl-ai-cloud / axolotl

Go ahead and axolotl questions

https://axolotl-ai-cloud.github.io/axolotl/

Apache License 2.0

New Alignment Algorithm: SPPO #1736

Open kaykyr opened 1 month ago

kaykyr commented 1 month ago

⚠️ Please check that this feature request hasn't been suggested before.

[X] I searched previous Ideas in Discussions and didn't find any similar feature requests.
[X] I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Hey guys! For anyone interested, I recently submitted a pull request that implements SPPO in the Axolotl trainer. You can follow the pull request here: https://github.com/axolotl-ai-cloud/axolotl/pull/1735

Original SPPO implementation fork: https://github.com/kaykyr/axolotl

See the examples/llama3/sppo-qlora-8b.yml config file for an example of how to train with SPPO.
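
For readers new to SPPO (Self-Play Preference Optimization), here is a minimal sketch of the hard-label SPPO loss as it is commonly written for pairwise preference data. It is an illustration only, not the code from the pull request. The function name `sppo_hard_loss` and the parameter `eta` are assumptions made for this example, and the inputs are assumed to be summed sequence log-probabilities from the policy and a frozen reference model.

```python
def sppo_hard_loss(policy_chosen_logp, ref_chosen_logp,
                   policy_rejected_logp, ref_rejected_logp, eta=1.0):
    """Hard-label SPPO loss for a single preference pair (illustrative sketch).

    The chosen response is treated as winning with probability 1 and the
    rejected response with probability 0, so the log-ratio targets are
    +eta/2 and -eta/2 respectively.
    """
    # Log-ratio of the policy to the reference model for each response.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp

    # Squared error between each log-ratio and its target.
    return ((chosen_log_ratio - 0.5 * eta) ** 2
            + (rejected_log_ratio + 0.5 * eta) ** 2)


if __name__ == "__main__":
    # Hypothetical log-probabilities, for demonstration only.
    print(sppo_hard_loss(-1.0, -1.5, -2.0, -1.5, eta=1.0))
```

Unlike DPO, which only pushes the gap between the two log-ratios apart, this objective regresses each log-ratio toward a fixed target, so the loss is zero only when both responses sit exactly at their targets.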

✔️ Solution

Check pull request: https://github.com/axolotl-ai-cloud/axolotl/pull/1735

❓ Alternatives

No response

📝 Additional Context

No response

Acknowledgements

[X] My issue title is concise, descriptive, and in title casing.
[X] I have searched the existing issues to make sure this feature has not been requested yet.
[X] I have provided enough information for the maintainers to understand and evaluate this request.