argilla-io / notus

Notus is a collection of fine-tuned LLMs using SFT, DPO, SFT+DPO, and/or any other RLHF techniques, while always keeping a data-first approach
MIT License

Adapt SFT fine-tuning script and configuration #8

Closed alvarobartt closed 11 months ago

alvarobartt commented 11 months ago

Description

This PR adds an adapted `run_sft.py` from huggingface/alignment-handbook that works with our dataset instead of HuggingFace H4's, similarly to the script already adapted for DPO.

This PR also updates the required configuration files, and the SFT fine-tuning has already been triggered. In this case we keep the `chosen_response` as the target response for SFT, while the rest is discarded.
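As a rough illustration of the preprocessing described above, each preference example can be reduced to a single SFT-ready record that keeps only the chosen response. This is a minimal sketch in plain Python (no `datasets` dependency); the field names `prompt`, `chosen_response`, and `rejected_response` are assumptions, not necessarily the exact dataset schema.

```python
# Hypothetical preference example; field names are assumed for illustration.
raw = [
    {
        "prompt": "What is SFT?",
        "chosen_response": "Supervised fine-tuning on curated responses.",
        "rejected_response": "No idea.",
    },
]

def to_sft_example(example: dict) -> dict:
    # Keep the prompt plus the chosen response; the rejected response
    # (and any other field) is discarded for SFT.
    return {"text": f"{example['prompt']}\n{example['chosen_response']}"}

sft_records = [to_sft_example(ex) for ex in raw]
print(sft_records[0]["text"])
```

In the actual script this mapping would be applied over the whole dataset (e.g. with `datasets.Dataset.map`) before tokenization.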

More details coming soon!

alvarobartt commented 11 months ago

The run in Weights & Biases still seems to crash at the end, even though everything is properly uploaded to the Hub. I think it's related to `hub_strategy: every_save` in combination with `save_steps: 500`: each save starts an async process that pushes the model to the Hub, and then DeepSpeed / accelerate throws a timeout :/
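For reference, the suspected combination lives in the training recipe's YAML. This is a hedged sketch of the relevant fragment, not the repo's actual config; the commented-out alternative is one possible workaround (an assumption, not verified here).

```yaml
# Suspected culprit: with this combination, every checkpoint save at
# step multiples of 500 triggers an async push to the Hub, which can
# outlive the DeepSpeed / accelerate process group and hit a timeout.
hub_strategy: every_save
save_steps: 500

# Possible workaround (assumption): push only once when training ends,
# so no async upload races the distributed teardown.
# hub_strategy: end
```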