Open · yhyu13 opened 7 months ago
This seems to be an issue with the custom flash attention implementation, since HF Transformers already supports Flash Attention 2 natively:
https://huggingface.co/docs/transformers/perf_infer_gpu_one
I will make a PR for this. I tested it, and simply using HF Transformers' flash attention actually works.
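For reference, a minimal sketch of that approach, assuming a recent transformers version (older releases used `use_flash_attention_2=True` instead of `attn_implementation`); the checkpoint name here is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch: rely on HF Transformers' built-in FlashAttention-2 instead of a
# custom implementation. Requires the flash-attn package and a supported GPU.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",        # placeholder checkpoint
    torch_dtype=torch.bfloat16,        # FA2 only supports fp16/bf16
    attn_implementation="flash_attention_2",
)
```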
Thanks for your great contribution! We will check the PR.
Hi,
I am interested in applying the ToolBench dataset to Yi-6B: https://huggingface.co/chargoddard/Yi-6B-Llama
The training script has been slightly modified:
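(The original snippet is not preserved; below is only a hypothetical sketch of the kind of change meant here, assuming the script loads the model through transformers and that the `chargoddard/Yi-6B-Llama` checkpoint is swapped in for the default Llama-2 one.)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical: point the training script at the Llama-converted Yi-6B
# checkpoint instead of the default Llama-2 model.
model_name_or_path = "chargoddard/Yi-6B-Llama"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
```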
But it turns out to throw an error:
Does the flash attention code only work with Llama-2 models and not Yi-6B?
Thanks!