AkariAsai / self-rag

This includes the original implementation of Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection by Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi.
https://selfrag.github.io/
MIT License

The saved embed_tokens is empty #21

Closed merlinarer closed 11 months ago

merlinarer commented 12 months ago

Hello, I tried to run this code with llama1-7B, but I found that the saved embed_tokens is empty and fails to load after training. Have you run into this problem?

(Pdb) param_name
'model.embed_tokens.weight'
(Pdb) param
tensor([], dtype=torch.bfloat16)
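
For anyone hitting the same symptom, one way to check whether the embedding weights actually made it to disk is to scan the saved safetensors shards directly. A minimal sketch (the output directory name is an assumption, not the repo's actual path):

```python
# Sketch: list every saved tensor whose name mentions embed_tokens and print its shape.
# An empty or missing entry here would reproduce the symptom above.
import glob
from safetensors import safe_open

ckpt_dir = "output/self_rag_7b"  # hypothetical output directory
for shard in sorted(glob.glob(f"{ckpt_dir}/*.safetensors")):
    with safe_open(shard, framework="pt") as f:
        for name in f.keys():
            if "embed_tokens" in name:
                tensor = f.get_tensor(name)
                print(shard, name, tuple(tensor.shape), tensor.dtype)
```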
AkariAsai commented 12 months ago

I haven't seen this issue on my side, so I'm not sure I can help here... I had some issues when I was using Llama 1 and adding special tokens a while ago, as back then (April or May) the HF transformers support for Llama 1 was somewhat unstable. Switching to Llama 2 or upgrading HF transformers might help, although I am not sure...

merlinarer commented 11 months ago

Solved! It seems the given scripts save both the sharded safetensors files (model-00001-of-00003.safetensors ...) and a consolidated model.safetensors, which leads to the loading error. Deleting model.safetensors solves it.
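
If it helps anyone else, the workaround can be scripted. A minimal sketch (the checkpoint directory is an assumption): when both the shard index and a stray consolidated file are present, drop the stray file so from_pretrained resolves the weights through the shard index instead.

```python
# Sketch: remove the stray consolidated model.safetensors when sharded weights
# (model-0000x-of-0000y.safetensors plus model.safetensors.index.json) exist,
# so AutoModelForCausalLM.from_pretrained(ckpt_dir) loads the shards via the index.
import os

ckpt_dir = "output/self_rag_7b"  # hypothetical output directory
index_file = os.path.join(ckpt_dir, "model.safetensors.index.json")
stray_file = os.path.join(ckpt_dir, "model.safetensors")
if os.path.exists(index_file) and os.path.exists(stray_file):
    os.remove(stray_file)
```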

merlinarer commented 11 months ago

BTW, I can't find the 13B training script. I tried modifying the 7B script for 13B and hit a CUDA OOM even with batch size 1 on each 80GB device. Maybe I should reduce the input_length?

AkariAsai commented 11 months ago

Yes, we reduced the input_length for 13B due to the OOM issue, as mentioned in the paper. Let me upload the final 13B script and push it.
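
In practice, "reduce the input_length" just means truncating training examples to a shorter maximum sequence length at tokenization time. A generic sketch (the base model id and max_length value are placeholders, not the values used in the released 13B script):

```python
# Sketch: cap the tokenized input length so 13B training fits into memory.
# The concrete max_length in the released 13B script may differ.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")  # example base model

def tokenize(example, max_length=1024):  # placeholder length
    return tokenizer(example["text"], max_length=max_length, truncation=True)
```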

AkariAsai commented 11 months ago

I uploaded the 13B training script: script_finetune_13b.sh