Open zillion-zhao opened 2 weeks ago
If you are certain you are using https://github.com/ContextualAI/gritlm/blob/main/scripts/modeling_mistral_gritlm.py or https://huggingface.co/GritLM/GritLM-7B/blob/main/modeling_gritlm7b.py , then I am not sure what the problem is. Maybe try pip show transformers to locate your transformers installation
and replace its modeling_mistral.py file with one of the correct files above. Otherwise, this seems like a simple issue that can be solved by debugging with print statements.
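As a small sketch of that first step, locating where the installed transformers package lives (assuming a standard pip install; the Mistral modeling file sits under models/mistral/ inside that directory):

```python
import importlib.util

# Find the installed transformers package without importing it fully;
# spec.origin points at its __init__.py, and modeling_mistral.py lives
# under models/mistral/ next to it.
spec = importlib.util.find_spec("transformers")
if spec is not None:
    print(spec.origin)
else:
    print("transformers is not installed")
```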
Yes, maybe there are some small problems. I tried printing the type of the model in training/model.py:
def encode(self, features):
    print(type(self.model))
and it shows: <class 'peft.peft_model.PeftModel'>
Maybe LoRA influences the model type? I am not clear about it. Do you train the model with full fine-tuning?
When I remove --lora, it shows CUDA out of memory. Maybe it really is due to LoRA. Maybe I could use more GPUs, but why does LoRA influence the model type?
I see, yes it could be because of LoRA. I think the PEFT library wraps the transformer model, and this could change the kwargs that are passed through. You may need to change something in https://github.com/huggingface/peft/blob/main/src/peft/tuners/lora/model.py to pass them through.
We do full fine-tuning; I haven't really tried LoRA with GRIT.
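To illustrate the failure mode (this is a hypothetical sketch, not PEFT's actual code): a wrapper whose forward() does not accept extra kwargs will raise exactly this TypeError for a custom argument like is_causal, while a variant that forwards **kwargs passes it through:

```python
class BaseModel:
    """Stands in for the custom MistralForCausalLM that accepts is_causal."""
    def forward(self, input_ids, is_causal=False):
        return ("base", is_causal)

class DroppingWrapper:
    """Illustrative wrapper: its forward() signature has no **kwargs,
    so calling it with is_causal=... raises TypeError."""
    def __init__(self, model):
        self.model = model

    def forward(self, input_ids):
        return self.model.forward(input_ids)

class PassthroughWrapper(DroppingWrapper):
    """Fixed variant: forwards **kwargs so custom arguments reach the base model."""
    def forward(self, input_ids, **kwargs):
        return self.model.forward(input_ids, **kwargs)
```

Here DroppingWrapper(BaseModel()).forward([1], is_causal=True) raises TypeError, while PassthroughWrapper(BaseModel()).forward([1], is_causal=True) returns ("base", True).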
I see. Thank you for your reply!
Hello!
I met a problem when training the model in the unified mode.
First, I would like to share that when I evaluated several models in the artifacts (for example bbcc-mean, cccc-lasttoken, and cccc-wmean), I also got: TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'.
To tackle the problem, I reasoned that the is_causal argument is only meaningful when the model is loaded with the MistralForCausalLM class from modeling_gritlm7b.py. Otherwise, if I do not put modeling_gritlm7b.py in the model directory, the model is loaded as the MistralForCausalLM from the transformers library, which does not accept "is_causal". Besides, I think the model config file should also be modified by adding: "auto_map": { "AutoModel": "modeling_gritlm7b.MistralModel", "AutoModelForCausalLM": "modeling_gritlm7b.MistralForCausalLM", "AutoModelForSequenceClassification": "modeling_gritlm7b.MistralForSequenceClassification" },
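As a sketch of that config edit (the helper name and the example path are my own; the auto_map values are the ones above), the entries can be added to config.json programmatically:

```python
import json

def add_auto_map(cfg_path):
    """Add auto_map entries so the Auto* classes resolve to the custom
    modeling_gritlm7b.py placed in the same model directory."""
    with open(cfg_path) as f:
        cfg = json.load(f)
    cfg["auto_map"] = {
        "AutoModel": "modeling_gritlm7b.MistralModel",
        "AutoModelForCausalLM": "modeling_gritlm7b.MistralForCausalLM",
        "AutoModelForSequenceClassification": "modeling_gritlm7b.MistralForSequenceClassification",
    }
    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)

# e.g. add_auto_map("../models/Mistral-7B/config.json")
```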
I fixed this issue for evaluation by taking the steps above, and it succeeded. However, I hit the same error when training the model. I downloaded Mistral-7B, added modeling_gritlm7b.py, and modified the config file. However, it still shows TypeError: MistralForCausalLM.forward() got an unexpected keyword argument 'is_causal'.
I guessed that maybe the model was not loaded correctly, so I printed the type of the model in run.py after loading it:
The result is <class 'transformers_modules.Mistral-7B.modeling_gritlm7b.MistralForCausalLM'>, which is correct. So what is the problem? How can I modify the code to make it work?
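One way to narrow this down (a debugging sketch of my own, not part of the repo) is to check which object's forward() actually rejects is_causal, since the underlying class can be correct while a wrapper in between drops the argument:

```python
import inspect

def accepts_kwarg(fn, name):
    """True if `fn` accepts keyword argument `name` directly or via **kwargs."""
    params = inspect.signature(fn).parameters
    if name in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())

# e.g. accepts_kwarg(model.forward, "is_causal") — if this is False on the
# model object you actually call, the TypeError comes from a wrapper's
# forward(), even though the loaded class is the correct GritLM one.
```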
The training command:
torchrun --nproc_per_node 1 \
  -m training.run \
  --output_dir output_dir \
  --model_name_or_path ../models/Mistral-7B \
  --train_data ../data/unified_data \
  --learning_rate 1e-5 \
  --num_train_epochs 5 \
  --per_device_train_batch_size 5 \
  --per_device_generative_bs 1 \
  --dataloader_drop_last True \
  --normalized True \
  --temperature 0.02 \
  --query_max_len 32 \
  --passage_max_len 128 \
  --train_group_size 2 \
  --mode unified \
  --max_steps 1253 \
  --attn cccc \
  --overwrite_output_dir \
  --lora
Waiting for your kind reply! :)