center-for-humans-and-machines / transformer-heads

Toolkit for attaching, training, saving and loading of new heads for transformer models
https://transformer-heads.readthedocs.io/en/latest/
MIT License

Unable to run model meta-llama/Meta-Llama-3.1-8B-Instruct #8

Closed ArchchanaKugathasan closed 4 days ago

ArchchanaKugathasan commented 1 week ago

The code runs perfectly with all the other models, but there are issues when running meta-llama/Meta-Llama-3.1-8B-Instruct.

I tried adding the following line as instructed, but it still does not work.

```python
model_type_map["meta-llama3.1-8B"] = ("model", LlamaForCausalLM)
```

Initially, it showed the following error:

```
Original Traceback (most recent call last):
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in _worker
    output = module(*input, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/peft/peft_model.py", line 1129, in forward
    return self.base_model(
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 161, in forward
    return self.model.forward(*args, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformer_heads/model/model.py", line 148, in forward
    sequence_lengths = torch.eq(input_ids, pad_tk_id).int().argmax(-1) - 1
TypeError: eq() received an invalid combination of arguments - got (Tensor, list), but expected one of:
```

When I fixed the above error in model.py, another error occurred:

```
Traceback (most recent call last):
  File "/vol/research/Archchana/Experiments/regression_head_Mat/exp-4/train_multilingual_llama-3.1.py", line 204, in <module>
    model = create_headed_qlora(
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/transformer_heads/util/load_model.py", line 256, in create_headed_qlora
    model: HeadedModel = model.from_pretrained(
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3913, in from_pretrained
    tied_params = find_tied_parameters(model)
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 708, in find_tied_parameters
    all_named_parameters = {name: param for name, param in _get_named_parameters(model, remove_duplicate=False)}
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 708, in <dictcomp>
    all_named_parameters = {name: param for name, param in _get_named_parameters(model, remove_duplicate=False)}
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 667, in _get_named_parameters
    members = module._parameters.items()
AttributeError: 'NoneType' object has no attribute '_parameters'
```

yannikkellerde commented 1 week ago

Adding to model_type_map should not be necessary for any llama-type model. The model_type_map keys on the model_type attribute of the config, which is llama in the case of Llama 3.1 Instruct.
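You can verify this yourself; the snippet below is plain transformers, nothing transformer-heads specific:

```python
from transformers import AutoConfig

# model_type is what model_type_map keys on; for Llama 3.1 Instruct it is just "llama".
config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
print(config.model_type)  # -> "llama"
```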

To understand what is going on, I'd need some code, or at least the HeadConfig with which the error occurred. Did you try some of the example code with meta-llama/Meta-Llama-3.1-8B-Instruct? (e.g., modify the model_path in linear_probe.ipynb)

ArchchanaKugathasan commented 4 days ago

Thank you for your reply. Please find the head config code below; I use it for a regression task.

```python
head_configs = [
    HeadConfig(
        name="mean_regression",
        layer_hook=-5,
        in_size=hidden_size,
        output_activation="linear",
        is_causal_lm=False,
        pred_for_sequence=True,
        loss_fct="mse",
        num_outputs=1,
        is_regression=True,
        loss_weight=0.002,
    ),
]
```
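For context, this is roughly how I wire it into create_headed_qlora (simplified from my script; the keyword names mirror the example notebooks, so treat the exact signature as approximate):

```python
from peft import LoraConfig
from transformers import BitsAndBytesConfig, LlamaForCausalLM

# Simplified sketch; keyword names follow the repo's example notebooks,
# so the exact signature may differ slightly.
model = create_headed_qlora(
    base_model_class=LlamaForCausalLM,
    model_name="meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    lora_config=LoraConfig(task_type="CAUSAL_LM"),
    head_configs=head_configs,
)
```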

Also, I have tried one of your example notebooks, 'joint_multitask_learning', and there I get the same error:

```
Traceback (most recent call last):
  File "/vol/research/Archchana/Experiments/transformer_heads/joint_multitask_learning.py", line 124, in <module>
    model = create_headed_qlora(
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/transformer_heads/util/load_model.py", line 256, in create_headed_qlora
    model: HeadedModel = model.from_pretrained(
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3913, in from_pretrained
    tied_params = find_tied_parameters(model)
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 708, in find_tied_parameters
    all_named_parameters = {name: param for name, param in _get_named_parameters(model, remove_duplicate=False)}
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 708, in <dictcomp>
    all_named_parameters = {name: param for name, param in _get_named_parameters(model, remove_duplicate=False)}
  File "/vol/research/Archchana/Anaconda3/envs/transhead_llama3/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 667, in _get_named_parameters
    members = module._parameters.items()
AttributeError: 'NoneType' object has no attribute '_parameters'
```

Could you please help me with this issue?

yannikkellerde commented 4 days ago

Hi, I was able to reproduce the error you shared in the beginning:

File "/vol/research/Archchana/Anaconda3/envs/transhead2/lib/python3.10/site-packages/transformer_heads/model/model.py", line 148, in forward
sequence_lengths = torch.eq(input_ids, pad_tk_id).int().argmax(-1) - 1
TypeError: eq() received an invalid combination of arguments - got (Tensor, list), but expected one of:

The reason for this error is that my code did not expect eos_token_id to be a list (this was not the case for any model before Llama 3.1). As seen in this config, they changed this recently.
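Conceptually, the fix is to accept both the old scalar form and the new list form when locating the end of each sequence. A rough sketch of the idea (not the exact committed code):

```python
import torch

def sequence_end_positions(input_ids: torch.Tensor, pad_tk_id) -> torch.Tensor:
    # Llama 3.1 configs store eos_token_id as a list of ids; older models use a single int.
    if isinstance(pad_tk_id, (list, tuple)):
        pad_mask = torch.isin(input_ids, torch.tensor(pad_tk_id, device=input_ids.device))
    else:
        pad_mask = torch.eq(input_ids, pad_tk_id)
    # Same logic as before: position of the first pad/eos token, minus one.
    return pad_mask.int().argmax(-1) - 1
```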

I fixed that bug now in commit 1603029b65f53fd161e0f49e8ea8ba0750f96d21 and was able to successfully start training on the joint_multitask_learning notebook with meta-llama/Meta-Llama-3.1-8B-Instruct.

I never got the AttributeError: 'NoneType' object has no attribute '_parameters' error myself. Maybe it is related to your attempt at fixing model.py?

Anyway, try updating to the newest version and tell me if that fixes it.
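If you installed from PyPI and the fix is not in a release yet, installing straight from this repo should pick it up (assuming a pip-based environment):

```
pip install -U git+https://github.com/center-for-humans-and-machines/transformer-heads.git
```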

ArchchanaKugathasan commented 4 days ago

Thanks a lot for the quick response. This fix works fine! :) I am able to run my code.