Open simoneriggi opened 1 month ago
If you are using LoRA fine-tuning and the path string where you save the model does not include 'lora', you can try renaming your model save path to add '_lora'.
Dear @jiajunlong, thanks for the reply. Yes, I used LoRA fine-tuning for both the first- and second-epoch models. I have added '_lora' to the model path directory, but I get the same error.
My model path is now:
`/scratch/riggi/Analysis/MLProjects/TinyLLaVA/fine-tuning/radioimg-dataset/TinyLLaVA-Phi-2-SigLIP-3.1B/vision_freeze/nepochs2/_lora`
and contains the following files:
```
-rw-rw-r-- 1 riggi riggi       5111 Aug 7 21:37 README.md
-rw-rw-r-- 1 riggi riggi      38185 Aug 7 21:37 adapter_config.json
-rw-rw-r-- 1 riggi riggi 1145288440 Aug 7 21:37 adapter_model.safetensors
-rw-rw-r-- 1 riggi riggi       1080 Aug 7 21:37 added_tokens.json
-rw-rw-r-- 1 riggi riggi       2387 Aug 7 21:37 config.json
drwxrwxr-x 2 riggi riggi       4096 Aug 7 21:37 connector
drwxrwxr-x 2 riggi riggi       4096 Aug 7 21:37 language_model
-rw-rw-r-- 1 riggi riggi        599 Aug 7 18:22 log.txt
-rw-rw-r-- 1 riggi riggi     456318 Aug 7 21:37 merges.txt
drwxrwxr-x 3 riggi riggi       4096 Aug 7 18:22 runs
-rw-rw-r-- 1 riggi riggi        587 Aug 7 21:37 special_tokens_map.json
-rw-rw-r-- 1 riggi riggi       7447 Aug 7 21:37 tokenizer_config.json
-rw-rw-r-- 1 riggi riggi      67295 Aug 7 21:37 trainer_state.json
drwxrwxr-x 2 riggi riggi       4096 Aug 7 21:37 vision_tower
-rw-rw-r-- 1 riggi riggi     999186 Aug 7 21:37 vocab.json
```
Do you know what I am doing wrong?
Thanks a lot for your help.
Just wanted to add that the `adapter_config.json` files produced in the nepoch1 and nepoch2 paths are different. In the latter, there are layers with the suffix `.base_layer` or `lora_A.default`. I attach both.
When you were evaluating, was the `model_path` in `load_pretrained_model` set to `/scratch/riggi/Analysis/MLProjects/TinyLLaVA/fine-tuning/radioimg-dataset/TinyLLaVA-Phi-2-SigLIP-3.1B/vision_freeze/nepochs2/_lora`?
Could you please check the function that loads model parameters in the codebase? When loading model weights, it checks whether the model path contains the string "lora" and loads the weights accordingly. The error you encountered above suggests that a non-LoRA method was used to load the weights.
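In sketch form, the dispatch described above looks roughly like this (function name is hypothetical; the real logic lives in `load_pretrained_model`):

```python
# Hypothetical sketch of the path-based dispatch: the loader takes the
# LoRA branch only when the string 'lora' appears in the model path,
# otherwise it loads the checkpoint as plain full weights.
def pick_loading_branch(model_name_or_path):
    if model_name_or_path is not None and "lora" in model_name_or_path:
        return "lora"  # load base model, then attach the PEFT adapter
    return "full"      # load merged full weights directly

print(pick_loading_branch("/path/to/nepochs2/_lora"))  # -> lora
print(pick_loading_branch("/path/to/nepochs2"))        # -> full
```

This is why the renaming trick works: the branch choice depends only on the path string, not on the files inside the directory.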
Dear @jiajunlong,
yes, the model path passed to `load_pretrained_model` is `/scratch/riggi/Analysis/MLProjects/TinyLLaVA/fine-tuning/radioimg-dataset/TinyLLaVA-Phi-2-SigLIP-3.1B/vision_freeze/nepochs2/_lora`. From the logs (attached below) it seems the code is indeed executing the `elif model_name_or_path is not None and 'lora' in model_name_or_path:` branch in the codebase. The `Loading LoRA weights...` statement is printed only in that branch.
```
[2024-08-11 19:51:27,224] [INFO] [real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
2024-08-11 19:51:28 INFO - Get script args ...
2024-08-11 19:51:28 INFO - Load model /scratch/riggi/Analysis/MLProjects/TinyLLaVA/fine-tuning/radioimg-dataset/TinyLLaVA-Phi-2-SigLIP-3.1B/vision_freeze/nepochs2/_lora. ...
/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading LoRA weights...
Traceback (most recent call last):
  File "/home/riggi/Analysis/MLProjects/TinyLLaVA/scripts//run_tinyllava_inference.py", line 256, in <module>
    sys.exit(main())
  File "/home/riggi/Analysis/MLProjects/TinyLLaVA/scripts//run_tinyllava_inference.py", line 180, in main
    model, tokenizer, image_processor, context_len = load_pretrained_model(model_path)
  File "/home/riggi/Software/Sources/TinyLLaVA_Factory/tinyllava/model/load_model.py", line 55, in load_pretrained_model
    model = PeftModel.from_pretrained(model, model_name_or_path)
  File "/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/peft/peft_model.py", line 355, in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](model, config, adapter_name)
  File "/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/peft/peft_model.py", line 1094, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/peft/peft_model.py", line 129, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 136, in __init__
    super().__init__(model, config, adapter_name)
  File "/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 148, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/home/riggi/Software/venvs/tinyllava/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 328, in inject_adapter
    raise ValueError(...
```
In the meantime, I also tried a different approach (not sure it is correct, though). I took the model saved after the 1st epoch, loaded it with the LoRA branch code, and saved it as a plain new model:
```python
# - Load model
#   NB: this returns the PeftModel after merging LoRA weights;
#   this call executes the LoRA elif branch
model, tokenizer, image_processor, context_len = load_pretrained_model(model_path)

# - Save model & tokenizer
model.save_pretrained(model_out_path)
tokenizer.save_pretrained(model_out_path)
```
The above code produces these files:
```
-rw-rw-r-- 1 riggi riggi        119 Aug 9 13:08 generation_config.json
-rw-rw-r-- 1 riggi riggi       2387 Aug 9 13:08 config.json
-rw-rw-r-- 1 riggi riggi 4995590776 Aug 9 13:08 model-00001-of-00002.safetensors
-rw-rw-r-- 1 riggi riggi 1439367824 Aug 9 13:08 model-00002-of-00002.safetensors
-rw-rw-r-- 1 riggi riggi       7447 Aug 9 13:08 tokenizer_config.json
-rw-rw-r-- 1 riggi riggi        473 Aug 9 13:08 special_tokens_map.json
-rw-rw-r-- 1 riggi riggi      96523 Aug 9 13:08 model.safetensors.index.json
-rw-rw-r-- 1 riggi riggi       1080 Aug 9 13:08 added_tokens.json
-rw-rw-r-- 1 riggi riggi     999186 Aug 9 13:08 vocab.json
-rw-rw-r-- 1 riggi riggi     456318 Aug 9 13:08 merges.txt
```
Then, I trained from this saved model for 1 epoch using the same custom_finetune script. I only changed `use_fast=True` in the `AutoTokenizer.from_pretrained` call to fix an error.
Finally, I ran inference on the model saved at epoch 2, and I did not get the previous error.
Now, to be honest, I am not at all sure this is the correct approach to continuing fine-tuning. It seems that every LoRA fine-tuning run adds adapter layers to the saved model, which is why I first merged the LoRA weights into the base model before continuing the training. Another (second-order) problem is that I also need to somehow adjust the learning rate of the warmup+cosine strategy when continuing training; otherwise the schedule starts from scratch rather than from where the first epoch ended.
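To make the "merge" step above concrete: conceptually, merging folds the adapter's low-rank update into the frozen base weight, after which the adapter matrices can be discarded and the model saved as a plain checkpoint (this is what peft's `merge_and_unload()` does). A toy pure-Python illustration (shapes and values are made up, no real model involved):

```python
# Toy illustration of a LoRA merge: W' = W + (alpha / r) * B @ A.
# Once the update is folded into W, the checkpoint no longer needs
# the adapter, so a later fine-tuning run can start fresh from it.
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha, r):
    scaling = alpha / r
    delta = matmul(B, A)  # (out, r) @ (r, in) -> (out, in)
    return [[w + scaling * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (2x2)
A = [[1.0, 2.0]]               # LoRA A (r=1, in=2)
B = [[0.5], [0.25]]            # LoRA B (out=2, r=1)
print(merge_lora(W, A, B, alpha=2, r=1))  # -> [[2.0, 2.0], [0.5, 2.0]]
```

So saving the merged model and training again (as done above) should be mathematically equivalent to stacking a new adapter on top of the merged weights, up to optimizer state.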
Please, let me know if I am doing it the wrong way.
Thanks a lot for your time.
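On the warmup+cosine point above: one hedged option is to compute the schedule over the total step budget of both epochs and offset the step counter when resuming, so the second run picks up the decay where the first left off rather than re-warming. A minimal sketch (hypothetical helper; the actual scheduler lives inside the training framework):

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr):
    # Standard warmup + cosine decay multiplier.
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# When continuing after epoch 1, offset the step counter, e.g.
# lr = warmup_cosine_lr(step + steps_done_in_epoch1,
#                       total_steps_both_epochs, warmup_steps, base_lr)
```

Hugging Face schedulers expose a similar knob via the `last_epoch` argument (or by restoring the scheduler state dict from `trainer_state.json`).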
Hello, I would like to ask how you loaded the TinyLLaVA-Phi-2-SigLIP-3.1B model. I have downloaded it locally, but it keeps telling me that it needs to be downloaded from Hugging Face. I look forward to your reply. Thank you.
@eva10084 Initially, I loaded the model from Hugging Face (`tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B`) as:

```shell
deepspeed custom_finetune.py \
    --pretrained_model_path tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B \
    --output_dir $SAVED_MODEL_PATH \
    [OTHER OPTIONS]
```
The model files for `tinyllava/TinyLLaVA-Phi-2-SigLIP-3.1B` are downloaded automatically if not already on disk and are stored under `$HF_HOME/hub`, which is used as a cache in future runs. `$SAVED_MODEL_PATH` is the path where I want the trained model to be saved. Then, after completing that training, I continued training by loading the model from `$SAVED_MODEL_PATH`, e.g.:
```shell
deepspeed custom_finetune.py \
    --pretrained_model_path $SAVED_MODEL_PATH \
    [OTHER OPTIONS]
```
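If the model keeps being re-fetched from the Hub despite a local copy, one thing worth checking (hedged suggestion; the cache path below is a placeholder) is where the Hugging Face cache lives and whether offline mode is set once the files are cached:

```shell
# Placeholder path: HF_HOME controls where huggingface_hub caches models.
export HF_HOME=/scratch/$USER/hf_cache
# After the first successful download, force the loader to resolve from
# the local cache instead of contacting the Hub again:
export HF_HUB_OFFLINE=1
```

Alternatively, a local directory containing the checkpoint files (`config.json`, the `*.safetensors` shards, tokenizer files) can usually be passed directly as `--pretrained_model_path` in place of the repo id.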
Hope that helps.
Dear all, I have fine-tuned TinyLLaVA-Phi-2-SigLIP-3.1B for one epoch and then continued fine-tuning for another epoch, starting from the trained model saved after the first epoch. Both training runs were successful. For those runs I used the `custom_finetune.sh` script with the provided default parameters. The evaluation runs fine for the first model (epoch 1) but fails for the final model (epoch 2) with this error:
It seems that the second model is saved without some components, or with different layer names. Any hints on how to solve this error?
Thanks a lot for your help.
PS: To do evaluation I am using the sample code shown in this previous issue: https://github.com/TinyLLaVA/TinyLLaVA_Factory/issues/79