liangyingshao opened this issue 3 months ago
Thanks for your interest! Could you please provide the transformers, accelerate, and deepspeed versions? And also, could you please provide the results you got?
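For anyone checking the same thing, one quick way to print the installed versions in the active environment (assuming a standard pip setup; adjust if you use conda) is:
# show the installed versions of the three packages in question
pip show transformers accelerate deepspeed | grep -E "^(Name|Version)"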
I use transformers==4.33.0, accelerate==0.33.0, and deepspeed==0.14.4. Here are my results:
It looks like the results you got are very close to the checkpoint we released under the same virtual env. I suspect the main issue could come from a version mismatch. Please try the following:
- Uninstall transformers, deepspeed, and accelerate, and reinstall them by:
pip install git+https://github.com/fe1ixxu/ALMA.git@alma-r-hf
pip3 install deepspeed==0.13.1
pip install accelerate==0.27.2
pip install peft==0.5.0
- Re-evaluate the checkpoint
Hope this is helpful.
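Putting those steps together, here is a minimal sketch of the whole sequence, assuming a plain pip/venv setup (the environment name alma-env is arbitrary, and the pinned versions are simply the ones suggested above):
# optional: start from a clean virtual environment so no stale versions linger
python -m venv alma-env && source alma-env/bin/activate
# remove any existing copies of the three packages (a no-op in a fresh env)
pip uninstall -y transformers deepspeed accelerate
# reinstall the pinned versions recommended above
pip install git+https://github.com/fe1ixxu/ALMA.git@alma-r-hf
pip3 install deepspeed==0.13.1
pip install accelerate==0.27.2
pip install peft==0.5.0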
Thank you for your suggestion! I will try it out and provide feedback in this issue.
I tried your suggestion, and it does lead to some performance improvement. However, the reproduced performance still falls short of what the paper reports. Could you advise on any other potential factors that might affect performance? Any further suggestions for improvement would be greatly appreciated.
By the way, could you please provide the versions of the datasets, tokenizers, and huggingface-hub packages that you are using?
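In the meantime, one simple way for both sides to compare the rest of the environment (the package names here are just the usual Hugging Face ones; this is a suggestion, not part of the repo's instructions):
# print the exact versions of the remaining Hugging Face packages
pip list --format=freeze | grep -E "^(datasets|tokenizers|huggingface-hub|huggingface_hub)=="
# or snapshot the whole environment for a side-by-side diff
pip freeze > my_env.txt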
Thank you for your excellent work.
While fine-tuning the ALMA-7b-Pretrain model and testing with the checkpoint you provided, I was unable to reproduce the performance of ALMA-7b-LoRA as described in the paper. I would appreciate any guidance or suggestions you could offer. I used the code, data, and scripts provided in this repository (including runs/parallel_ft_lora.sh and evals/alma_7b_lora.sh), with a training batch size of 256 and four V100 GPUs.
Please feel free to ask if you need more details about my experiments.
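As a side note on the setup described above: an effective batch size of 256 on four GPUs has to come from the per-device batch size times the number of GPUs times the gradient-accumulation steps. The figures below are hypothetical placeholders; the real values are whatever runs/parallel_ft_lora.sh sets.
# effective batch = per-device batch * number of GPUs * gradient accumulation steps
PER_DEVICE_BATCH=16      # hypothetical per-GPU micro-batch
NUM_GPUS=4               # four V100s, as described above
TARGET_BATCH=256
GRAD_ACCUM=$(( TARGET_BATCH / (PER_DEVICE_BATCH * NUM_GPUS) ))
echo "gradient accumulation steps needed: ${GRAD_ACCUM}"   # prints 4 with these numbers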