AnswerDotAI / fsdp_qlora

Training LLMs with QLoRA + FSDP
Apache License 2.0

Bugs when fine-tuning with FSDP multi-node #26

Open · batman-do opened this issue 7 months ago

batman-do commented 7 months ago

[screenshot of the error] How do I fix this?

KeremTurgutlu commented 7 months ago

Can you share the training command you used with its full arguments, and also provide the versions of the following libraries:

accelerate
bitsandbytes
datasets
hqq
hqq-aten
huggingface-hub
llama-recipes
peft
safetensors
tokenizers
torch
transformers

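(Not part of the original reply:) one quick way to gather exactly these versions is a small snippet using importlib.metadata; the names below follow the list above and are assumed to be the PyPI distribution names.

```python
# Minimal sketch for collecting the requested library versions.
from importlib.metadata import version, PackageNotFoundError

packages = [
    "accelerate", "bitsandbytes", "datasets", "hqq", "hqq-aten",
    "huggingface-hub", "llama-recipes", "peft", "safetensors",
    "tokenizers", "torch", "transformers",
]
for name in packages:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"{name}: not installed")
```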
You are likely using an older version of bitsandbytes; the quant_storage argument was introduced in https://github.com/TimDettmers/bitsandbytes/commit/dcfb6f81433e37a8546f7dab3f648eaf858b29ff.
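As a sanity check (again, not from the original reply), you can confirm whether your installed bitsandbytes already exposes quant_storage by inspecting Linear4bit's signature:

```python
import inspect
import bitsandbytes as bnb

print("bitsandbytes version:", bnb.__version__)

# Newer bitsandbytes releases add a `quant_storage` parameter to Linear4bit;
# if it is missing here, the install predates the commit linked above.
params = inspect.signature(bnb.nn.Linear4bit.__init__).parameters
print("has quant_storage:", "quant_storage" in params)
```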

Try pip install -U bitsandbytes and retry. Also, for multi-node training, make sure every node has the up-to-date bitsandbytes version, ideally by using the same environment on all nodes.
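For context, a minimal sketch of how quant_storage is typically passed once a recent bitsandbytes is installed; the layer sizes and dtypes here are illustrative assumptions, not taken from the repo's training script.

```python
import torch
import bitsandbytes as bnb

# `quant_storage` controls the dtype used to hold the packed 4-bit weights.
# When sharding quantized layers with FSDP it is usually set to match the
# compute / flat-parameter dtype (e.g. bfloat16).
layer = bnb.nn.Linear4bit(
    4096, 4096,                        # illustrative in/out features
    bias=False,
    compute_dtype=torch.bfloat16,
    quant_type="nf4",
    quant_storage=torch.bfloat16,      # requires a bitsandbytes version with this argument
)
```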