LAION-AI / Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
https://open-assistant.io
Apache License 2.0

SFT of tiiuae/falcon-40b #3234

Open andreaskoepf opened 1 year ago

andreaskoepf commented 1 year ago

A new LLM with semi-permissive license was released: tiiuae/falcon-40b. It dethroned LLaMA on the HuggingFaceH4/open_llm_leaderboard.

Tasks:

andreaskoepf commented 1 year ago

Eval results by tju01 look promising: https://tju01.github.io/ilm-eval/#?benchmark=lm-evaluation-harness

flozi00 commented 1 year ago

It's on Apache license now 🎉

andreaskoepf commented 1 year ago

First SFT result, trained only on oasst-top1 data: https://huggingface.co/OpenAssistant/falcon-40b-sft-top1-560 (a LoRA version is also available, but still needs to be exported)
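
For anyone who wants to try the checkpoint, here is a minimal loading sketch. It assumes the OA prompt format `<|prompter|>...<|endoftext|><|assistant|>` (the special tokens added during SFT), and `trust_remote_code=True` is needed because Falcon ships custom modelling code:

```python
# Minimal sketch: load the SFT checkpoint and generate one reply.
# Assumes the OA prompt tokens <|prompter|> / <|assistant|> added during SFT.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OpenAssistant/falcon-40b-sft-top1-560"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",       # shard the 40B model across available GPUs
    trust_remote_code=True,  # Falcon uses custom modelling code
)

prompt = "<|prompter|>What is a falcon?<|endoftext|><|assistant|>"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```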

nivibilla commented 1 year ago

@andreaskoepf was this a full finetune or the QLoRA method?

nivibilla commented 1 year ago

Also I'm unable to view the wandb log for the finetune. I think it's private.

nivibilla commented 1 year ago

Just tested the model, looks good. But it seems you have inherited an issue from the base Falcon: when inferencing over multiple GPUs I get gibberish unless I pass use_cache=False to model.generate. Not sure why this happens.
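
To illustrate the workaround (continuing the loading sketch above; whether disabling the KV cache is a complete fix is untested here):

```python
# Sketch of the workaround: disable the KV cache so every decoding step
# recomputes attention from scratch. Generation is slower, but this
# reportedly avoids the garbled output when the model is sharded across
# GPUs with device_map="auto". Reuses `model` and `tokenizer` from the
# loading sketch above.
inputs = tokenizer(
    "<|prompter|>Explain KV caching briefly.<|endoftext|><|assistant|>",
    return_tensors="pt",
).to(model.device)

output = model.generate(
    **inputs,
    max_new_tokens=128,
    use_cache=False,  # the workaround; the default is use_cache=True
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```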

andreaskoepf commented 1 year ago

> @andreaskoepf was this a full finetune or the QLoRA method?

It was a full finetune; LoRA runs are in progress.

> Also I'm unable to view the wandb log for the finetune. I think it's private.

Training logs of the Falcon models should be public now, please check the model cards on HF.

> Just tested the model, looks good. But it seems you have inherited an issue from the base Falcon: when inferencing over multiple GPUs I get gibberish unless I pass use_cache=False to model.generate. Not sure why this happens.

Yes, we didn't change the model besides adding the OA tokens. If the original Falcon model has problems, we inherited them. Do you know how to fix it?

nivibilla commented 1 year ago

Thanks so much!

Well, so far normal inference with device_map=auto doesn't work for me. The answer I got from the original Falcon team is to use the Hugging Face text-generation-inference repo. I asked a question there and got my answer, but I haven't tested it yet.

https://github.com/huggingface/text-generation-inference/issues/417#event-9448751799

If that works, I plan to go through the code there to see what's going on and why it doesn't work as intended through device_map=auto.
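
For reference, a rough sketch of the text-generation-inference route. The docker flags and client call below are from the TGI docs as I remember them; the image tag, port, and shard count are placeholders to adapt, so treat this as an assumption rather than a verified recipe:

```python
# Sketch of querying a text-generation-inference (TGI) server instead of
# relying on device_map="auto". Assumes the server was started separately,
# e.g. (image tag, port, and shard count are placeholders):
#   docker run --gpus all -p 8080:80 \
#       ghcr.io/huggingface/text-generation-inference:latest \
#       --model-id OpenAssistant/falcon-40b-sft-top1-560 \
#       --num-shard 2 --trust-remote-code
from text_generation import Client  # pip install text-generation

client = Client("http://127.0.0.1:8080")
response = client.generate(
    "<|prompter|>What is a falcon?<|endoftext|><|assistant|>",
    max_new_tokens=256,
)
print(response.generated_text)
```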

nivibilla commented 1 year ago

If you test it out please let me know how it went.