I haven't seen this issue on my side, so I am not sure if I can help here... I had some issues a while ago when I was using Llama1 and adding special tokens, as back then (April or May) HF transformers' Llama1 support was somewhat unstable. Switching to Llama2 or upgrading HF transformers might help, although I am not sure...
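One thing worth checking when special tokens are involved: the embedding matrix usually needs to be resized to the new vocabulary size before training, otherwise the saved embed_tokens can end up inconsistent. A minimal sketch using the standard HF transformers API (the model path is a placeholder):

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Placeholder path; substitute your local Llama checkpoint.
model_path = "path/to/llama-7b"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Register any new special tokens (e.g. a pad token, which Llama1 lacks).
num_added = tokenizer.add_special_tokens({"pad_token": "[PAD]"})

# Resize the input embeddings (embed_tokens) to match the new vocab size,
# so the added rows are saved and reloaded correctly after training.
if num_added > 0:
    model.resize_token_embeddings(len(tokenizer))
```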
Solved! It seems the given scripts save both the split safetensor shards (model-00001-of-00003.safetensors, ...) and a consolidated model.safetensors, which leads to the loading error. Deleting model.safetensors solves it.
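For anyone hitting the same thing, a small sketch of the cleanup (the output directory name is an assumption):

```python
import os

# Hypothetical output directory produced by the fine-tuning script.
output_dir = "output/finetuned-7b"

shards = [f for f in os.listdir(output_dir)
          if f.startswith("model-") and f.endswith(".safetensors")]
consolidated = os.path.join(output_dir, "model.safetensors")

# If sharded weights (model-00001-of-00003.safetensors, ...) exist alongside
# a stray consolidated model.safetensors, remove the latter so from_pretrained
# loads the shards via model.safetensors.index.json instead.
if shards and os.path.exists(consolidated):
    os.remove(consolidated)
```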
BTW, I cannot find the 13B training script. I tried to adapt the 7B script to 13B and hit a CUDA OOM even with batch size 1 on each 80GB device. Maybe I should reduce the input_length?
Yes, we reduced the input_length for 13B due to the OOM issue, as mentioned in the paper. Let me upload the final 13B script and push it.
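In case it helps while the script is on its way: reducing input_length typically means truncating at tokenization time. A minimal sketch (the path and the 1024 cap are illustrative, not the paper's exact settings):

```python
from transformers import AutoTokenizer

# Placeholder checkpoint path.
tokenizer = AutoTokenizer.from_pretrained("path/to/llama-13b")

# Illustrative value only: a shorter cap lowers activation memory;
# the actual 13B input_length used in the paper may differ.
max_input_length = 1024

batch = tokenizer(
    ["example training text ..."],
    truncation=True,
    max_length=max_input_length,
    return_tensors="pt",
)
```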
I uploaded the 13B training script: script_finetune_13b.sh
Hello, I tried to run this code with Llama1-7B, but I found that the saved embed_tokens is empty and fails to load after training. Have you encountered this problem?