Couldn't run the model - CohereForAI/aya-expanse-8b

ArchchanaKugathasan commented 6 days ago

Hi,

I have tried running the CohereForAI/aya-expanse-8b model. I added the following code to your script:

---------------------------------CODE CHANGE 1-----------------------------------------------------

from transformer_heads.constants import model_type_map, loss_fct_map
import torch.nn as nn
from transformers import AutoModelForCausalLM

loss_fct_map["nll"] = nn.NLLLoss()
model_type_map["auto"] = ("model", AutoModelForCausalLM)

For CODE CHANGE 1 above, I got the following error:

------------------------------CODE CHANGE 1 ERROR --------------------

Traceback (most recent call last):
  File "/vol/research/Archchana/Experiments/regression_head_Mat/exp-4/train_multilingual-GEMBA.py", line 185, in <module>
    model = create_headed_qlora(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformer_heads/util/load_model.py", line 256, in create_headed_qlora
    model: HeadedModel = model.from_pretrained(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3832, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformer_heads/model/model.py", line 699, in __init__
    model_type_map[config.model_type][0],
KeyError: 'cohere'
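The last frame of this traceback indexes `model_type_map` with `config.model_type`, so the lookup key is the string the checkpoint's config reports (`'cohere'` here), not `"auto"`. The following is a minimal sketch of that behaviour using a toy dictionary; `registry`, `lookup`, and the stand-in values are illustrative names, not part of transformer-heads.

```python
# Toy registry standing in for transformer_heads' model_type_map.
# Keys and values here are illustrative assumptions, not the library's contents.
registry = {"llama": ("model", "LlamaStandIn")}

# CODE CHANGE 1 adds an entry under the key "auto".
registry["auto"] = ("model", "AutoStandIn")


def lookup(model_type):
    """Mimic the failing line: index the registry with config.model_type."""
    attr_name, model_cls = registry[model_type]
    return attr_name, model_cls


failed = False
missing_key = None
try:
    # The Cohere checkpoint's config reports model_type == "cohere".
    lookup("cohere")
except KeyError as err:
    failed = True
    missing_key = err.args[0]

print(failed, missing_key)
```

Because the key has to match `config.model_type` exactly, an entry registered under any other name is never consulted.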

So I changed the code to the following:

---------------------------------CODE CHANGE 2-----------------------------------------------------

def cohere_model_loader(config):
    return AutoModelForCausalLM.from_pretrained(config._name_or_path, trust_remote_code=True)

model_type_map["cohere"] = ("model", cohere_model_loader)

When I make this change, it shows the following error message:

---------------------------------CODE CHANGE 2 ERROR -----------------------------------------------------

Traceback (most recent call last):
  File "/vol/research/Archchana/Experiments/regression_head_Mat/exp-4/aya_train_multilingual-GEMBA.py", line 193, in <module>
    model = create_headed_qlora(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformer_heads/util/load_model.py", line 268, in create_headed_qlora
    model = prepare_model_for_kbit_training(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/peft/utils/other.py", line 116, in prepare_model_for_kbit_training
    model.enable_input_require_grads()
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1761, in enable_input_require_grads
    self._require_grads_hook = self.get_input_embeddings().register_forward_hook(make_inputs_require_grads)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py", line 994, in get_input_embeddings
    return self.model.embed_tokens
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'CohereForCausalLM' object has no attribute 'embed_tokens'

Could you please help me with this issue?

Thank you!

yannikkellerde commented 6 days ago

I won't have time to maintain strong support for all kinds of non-llama based models. But I'll tell you what I notice from your report:

Instead of code change 2, the canonical way to do this would be: model_type_map["cohere"] = ("model", CohereForCausalLM), importing CohereForCausalLM from transformers. Not sure if your solution should also work.
The error you show after code change 2 happens during prepare_model_for_kbit_training, which is a PEFT thing. Are you sure that CohereForCausalLM is compatible with quantization and QLoRA training? At first glance this looks like a PEFT quantization problem rather than a transformer-heads problem.
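The registration suggested in the first point can be sketched as below. This is an untested sketch: the `("model", CohereForCausalLM)` pairing is taken from the reply above, `register_cohere` is a hypothetical helper name, and whether this pairing is sufficient for Cohere models is not confirmed anywhere in this thread.

```python
def register_cohere(model_type_map=None):
    """Register the Cohere class under the key that config.model_type reports.

    Hypothetical helper: pass a dict to fill, or leave it as None to update
    transformer_heads' own model_type_map.
    """
    # Imported lazily so the sketch can be read without the libraries installed.
    from transformers import CohereForCausalLM

    if model_type_map is None:
        from transformer_heads.constants import model_type_map

    # A class is registered here, rather than a loader function as in code change 2.
    model_type_map["cohere"] = ("model", CohereForCausalLM)
    return model_type_map
```

Calling `register_cohere()` before `create_headed_qlora(...)` would replace both code change 1 and code change 2.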

ArchchanaKugathasan commented 5 days ago

Thank you very much for your prompt reply.

I have tried this model with 4-bit quantization and LoRA, without a regression head, just prompting it from another script, and it worked. Could the problem be in the transformer_heads library?
Also, is there any way to use your code without QLoRA? If so, what needs to be changed in the script?

yannikkellerde commented 4 days ago

Well, there is https://github.com/center-for-humans-and-machines/transformer-heads/blob/main/notebooks/gpt2/text_classification_full_finetune.ipynb. Haven't tested that in a while though, given that I am rarely in situations where I have enough GPU VRAM to do full finetuning.

ArchchanaKugathasan commented 2 days ago

Thank you :) , I will check this.

ArchchanaKugathasan commented 2 days ago

I have checked whether this model (CohereForAI/aya-expanse-8b) supports QLoRA, and the following tutorial confirms that it does: https://youtu.be/ChIxwXCI9aY?si=sq09vRJFkrsx-8El. I have also run a few tests which show that this model supports QLoRA.

So I am wondering what could possibly be causing this error:

Traceback (most recent call last):
  File "/vol/research/Archchana/Experiments/regression_head_Mat/exp-4/aya_train_multilingual-GEMBA.py", line 270, in <module>
    model = create_headed_qlora(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformer_heads/util/load_model.py", line 268, in create_headed_qlora
    model = prepare_model_for_kbit_training(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/peft/utils/other.py", line 116, in prepare_model_for_kbit_training
    model.enable_input_require_grads()
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1761, in enable_input_require_grads
    self._require_grads_hook = self.get_input_embeddings().register_forward_hook(make_inputs_require_grads)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformers/models/cohere/modeling_cohere.py", line 994, in get_input_embeddings
    return self.model.embed_tokens
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1709, in __getattr__
    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
AttributeError: 'CohereForCausalLM' object has no attribute 'embed_tokens'
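One detail the traceback itself gives away: the failing expression is `self.model.embed_tokens`, and the error names `CohereForCausalLM` as the type lacking `embed_tokens`. So the object stored under `.model` is itself a `CohereForCausalLM`, one wrapper level above where `embed_tokens` would normally be found. The sketch below shows a way to inspect such nesting. `describe_model_chain` is a hypothetical helper, and the three toy classes are stand-ins that only mimic the nesting, not the real transformers classes.

```python
def describe_model_chain(obj, attr="model", target="embed_tokens", max_depth=5):
    """Follow obj.model.model... and report, per level, the type name and
    whether the target attribute is directly available there."""
    chain = []
    current = obj
    for _ in range(max_depth):
        chain.append((type(current).__name__, hasattr(current, target)))
        nxt = getattr(current, attr, None)
        if nxt is None:
            break
        current = nxt
    return chain


# Toy stand-ins that reproduce the shape implied by the traceback.
class Backbone:
    def __init__(self):
        self.embed_tokens = object()


class CausalLM:
    def __init__(self):
        self.model = Backbone()


class Headed:
    def __init__(self):
        # A full causal-LM wrapper sits where the bare backbone is expected.
        self.model = CausalLM()


print(describe_model_chain(Headed()))
# [('Headed', False), ('CausalLM', False), ('Backbone', True)]
```

Running the helper on the real headed model, just before `prepare_model_for_kbit_training` is reached, would show which class ends up under `.model`. Whether registering a different class in `model_type_map` (for example the bare backbone rather than the causal-LM wrapper) resolves the error is an untested assumption.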

center-for-humans-and-machines / transformer-heads

Couldn't run the model - CohereForAI/aya-expanse-8b #14