Open ferrazzipietro opened 6 months ago
Hi @ferrazzipietro, we didn't test 4/8-bit training. Which backbone do you use? If the backbone is not LLaMA, it is better to specify the `target_modules` explicitly.
BTW, you can also try to use https://github.com/WhereIsAI/BiLLM. This one supports the latest transformers.
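For reference, a minimal sketch of specifying `target_modules` explicitly via PEFT's `LoraConfig`. The checkpoint, `num_labels`, and module names below are assumptions for a Mistral/LLaMA-style backbone, not taken from this repo; run `print(model)` to confirm the names for your model.

```python
from transformers import AutoModelForTokenClassification
from peft import LoraConfig, get_peft_model

# Placeholder checkpoint and label count; substitute your own.
model = AutoModelForTokenClassification.from_pretrained(
    "mistralai/Mistral-7B-v0.1", num_labels=9
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="TOKEN_CLS",
    # Explicit attention-projection names used by LLaMA/Mistral-style
    # models, instead of relying on PEFT's per-architecture defaults.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```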
I have tried Llama and Mistral, both resulting in `nan` weights. I've seen the new repo as well, but the issue persists. I will let you know if I'll have the chance to dig into it!
Hi, thanks for sharing the code. I have tried to use your repo with `bitsandbytes` for model quantization. Unfortunately, the training process does not work: the classifier layers defined in `modelling_llama.py` do not get trained, and after finetuning they contain only `nan` values. I guess it is a data type conflict, as the hidden layers are loaded in 4/8 bits while the classifier is still kept in memory as float16... Any clue/plan on how to fix that?
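One possible workaround, sketched below under assumptions not confirmed against this repo: load the backbone quantized via `BitsAndBytesConfig`, then call PEFT's `prepare_model_for_kbit_training`, which upcasts the non-quantized parameters (layer norms, classification head) to float32 before training; listing the classifier in `modules_to_save` keeps it fully trainable in float32 rather than half precision. The checkpoint, `num_labels`, and the `score` head name are placeholders; use the actual attribute name of the classifier in `modelling_llama.py`.

```python
import torch
from transformers import AutoModelForTokenClassification, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization for the backbone; bfloat16 compute is typically
# less NaN-prone than float16 (assumption: the GPU supports bf16).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForTokenClassification.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder checkpoint
    num_labels=9,                # placeholder label count
    quantization_config=bnb_config,
)

# Casts the non-quantized parameters (layer norms, classification head)
# to float32 and prepares the model for k-bit training.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    task_type="TOKEN_CLS",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    modules_to_save=["score"],  # assumed name of the classifier attribute
)
model = get_peft_model(model, lora_config)
```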