NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment
Apache License 2.0
452 stars 48 forks source link

Add the SFT tutorial #59

Open shengyangs opened 7 months ago

shengyangs commented 7 months ago

Is your feature request related to a problem? Please describe.

We should include a tutorial for the SFT. Although we have SteerLM, including a SFT tutorial is important because it is the simplest technique for a user to get started. It is also prerequisite of RLHF and DPO.

shengyangs commented 7 months ago

@gshennvm @odelalleau How do you think?

odelalleau commented 7 months ago

@gshennvm @odelalleau How do you think?

I agree we should have one.

gshennvm commented 7 months ago

there is a SFT section in the RLHF tutorial, we can pull that out -- will that do?

gshennvm commented 7 months ago

see: https://github.com/NVIDIA/NeMo-Aligner/blob/main/docs/user-guide/RLHF.rst?plain=1#L51

gshennvm commented 7 months ago

also @shengyangs do you happen to have a better dataset/script for the SFT tutorial? I wonder if we should update to something other than dolly

shengyangs commented 7 months ago

@gshennvm You are right. The SFT is here already. I think we probably want to pull it out in a separate section. I missed it in the first read. I was talking with some people, and they are interested in trying out SFT due to its simplicity.

shengyangs commented 7 months ago

The current dolly dataset is fine to me as a prompt-response example. Maybe we should add another example with a chat dataset, for it I have been playing with Ultrachat. I am not sure if there are simpler toy datasets.