4AI / LS-LLaMA

A Simple but Powerful SOTA NER Model | Official Code For Label Supervised LLaMA Finetuning
https://arxiv.org/abs/2310.01208
MIT License

Bitsandbytes quantization extension #19

Open ferrazzipietro opened 6 months ago

ferrazzipietro commented 6 months ago

Hi, thanks for sharing the code. I have tried to use your repo with bitsandbytes for model quantization. Unfortunately, training does not work: the layers defined in modelling_llama.py as

        self.dropout = nn.Dropout(classifier_dropout)
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

do not get trained, and after finetuning they contain only NaN values. I guess it is a data type conflict: the hidden layers are loaded in 4/8 bit, while the classifier is still kept in memory as float16... Any clue/plan on how to fix that?
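
For context, here is a minimal sketch of the kind of setup I have in mind (the checkpoint name, label count, and import are placeholders for my actual script, not something from this repo's examples); the idea would be to quantize only the backbone and keep the freshly initialised head in float32:

    import torch
    from transformers import BitsAndBytesConfig
    from peft import prepare_model_for_kbit_training

    from modelling_llama import UnmaskingLlamaForTokenClassification  # class from this repo

    # 4-bit quantization for the backbone
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

    model = UnmaskingLlamaForTokenClassification.from_pretrained(
        "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
        num_labels=9,                 # placeholder label count
        quantization_config=bnb_config,
    )

    # freezes the base weights, casts the remaining non-quantized parameters
    # (norms, embeddings, the new head) to fp32, enables gradient checkpointing
    model = prepare_model_for_kbit_training(model)

    # make sure the new head stays in fp32 and is trainable; with bf16 autocast
    # during training (e.g. bf16=True in TrainingArguments) the cast against the
    # bf16 hidden states is handled automatically
    model.classifier.to(torch.float32)
    for p in model.classifier.parameters():
        p.requires_grad = True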

SeanLee97 commented 6 months ago

Hi @ferrazzipietro, we didn't test 4/8-bit training. Which backbone are you using? If the backbone is not LLaMA, it is better to specify the target_modules explicitly.
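
Something along these lines, for example (the hyperparameters and module names are only illustrative; the projection names below fit LLaMA/Mistral-style attention, other backbones may differ, so check `model.named_modules()`):

    from peft import LoraConfig, TaskType, get_peft_model

    peft_config = LoraConfig(
        task_type=TaskType.TOKEN_CLS,
        inference_mode=False,
        r=12,
        lora_alpha=32,
        lora_dropout=0.1,
        # attention projections to adapt; these names match LLaMA/Mistral backbones
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        # keep a trainable, full-precision copy of the classification head
        modules_to_save=["classifier"],
    )

    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()  # the classifier should show up as trainable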

SeanLee97 commented 6 months ago

BTW, you can also try https://github.com/WhereIsAI/BiLLM, which supports the latest transformers.

ferrazzipietro commented 6 months ago

I have tried LLaMA and Mistral, both resulting in NaN weights. I've tried the new repo as well, but the issue persists. I'll let you know if I get the chance to dig into it!
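
In case it helps with reproducing, this is roughly how I check for the problem after finetuning (assuming `model` is the finetuned token-classification model, possibly wrapped by PEFT):

    import torch

    # look for NaNs in any parameter after training; with a PEFT-wrapped model
    # the parameter names carry a `base_model.model.` prefix
    for name, param in model.named_parameters():
        if torch.isnan(param).any():
            print(f"NaN values in: {name}")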