Algabeno opened this issue 1 month ago
Hello, may I ask which dataset you used for fine-tuning? If possible, could you provide me with a link? Thank you very much!
I used this dataset, but I applied some pre-processing before training. You need to follow the LLaVA specification.
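For reference, a minimal LLaVA-style record looks roughly like this (placeholder values; note the `<image>\n` prefix on the first human turn, which comes up again later in this thread). Shown here as a Python dict rather than the repo's actual files:

```python
# A sketch of one LLaVA-style annotation record (values are placeholders).
example = {
    "id": "0",
    "image": "example.png",
    "conversations": [
        {"from": "human", "value": "<image>\nWhat are the main elements visible in this image?"},
        {"from": "gpt", "value": "The image contains ..."},
    ],
}
```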
I tried forcing the merge script to load the configuration of `model_base` when loading the LoRA fine-tuned weights, so that the same configuration class is used during loading and merging:

```python
lora_cfg_pretrained = AutoConfig.from_pretrained(model_base)
```

but I get a new error:

```
Traceback (most recent call last):
  File "/root/autodl-tmp/Molmo-Finetune-master/src/merge_lora_weights.py", line 27, in <module>
```
Thank you very much for your help. I hope the author replies to you as soon as possible. I am a beginner and may not be able to help with your question, sorry.
@Algabeno Sorry, I have a resource issue right now, so I can't properly debug the problem. I'll look into it.
Okay, I found another problem: Molmo uses a different name for `lm_head` than other models do; it's `ff_out`. I'll change some other things and check whether the proper config is saved.
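A hedged sketch (not the repo's code) of resolving the output head under whichever name the checkpoint uses; note that OLMo/Molmo-style transformer blocks can also contain per-layer modules named `ff_out`, so the search below skips anything nested inside a numbered block:

```python
import torch.nn as nn

def find_output_head(model: nn.Module):
    """Return (name, module) for the LM output projection.

    Most HF causal LMs expose it as `lm_head`; Molmo names it `ff_out`.
    """
    candidates = [
        (name, module)
        for name, module in model.named_modules()
        if name.split(".")[-1] in ("lm_head", "ff_out")
    ]
    # Drop per-block matches, whose names contain a layer index such as "blocks.0.".
    top_level = [(n, m) for n, m in candidates if not any(p.isdigit() for p in n.split("."))]
    if not top_level:
        raise ValueError("No lm_head/ff_out module found")
    return top_level[0]
```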
@Algabeno Also, could you share the config file that was generated with the code, please?
Thank you very much for your reply. Do you mean the config.json or adapter_config.json I generated after fine-tuning? adapter_config.json config.json
Sorry, I am a beginner. Do I need to add these configuration files myself in the original code? If so, where should I add them?
Both files are generated automatically after you finish fine-tuning.
Thank you very much for your patience!
Thank you for reminding me. I have made the following changes to the code, and it now runs correctly:

```python
# Re-create the output head and embedding table with the expected shape
# (token_num x token_dim) when the saved weights no longer match.
# Note that torch.empty leaves the new tensors uninitialized.
token_num, token_dim = ff_out_layer.out_features, ff_out_layer.in_features
if ff_out_layer.weight.shape[0] != token_num:
    print(f"Resizing ff_out.weight from {ff_out_layer.weight.shape} to ({token_num}, {token_dim})")
    ff_out_layer.weight = torch.nn.Parameter(
        torch.empty(token_num, token_dim, device=model.device, dtype=model.dtype)
    )
    print(f"Resizing wte.weight from {wte_layer.weight.shape} to ({token_num}, {token_dim})")
    wte_layer.weight = torch.nn.Parameter(
        torch.empty(token_num, token_dim, device=model.device, dtype=model.dtype)
    )
```
@Algabeno It seems Molmo can't load the config from a local path. I'll try to fix this. I'm not sure, but I think that could break the model. I'll try to write better code for the LoRA path. Since the model is a preview version, these kinds of issues really help.
Thank you!
Thank you for your patient responses. After fine-tuning the model with my dataset, I found that the accuracy is relatively low. Could you give me some suggestions?
The dataset is an object detection dataset. The script I used is finetune_lora_vision.sh, and the script parameters are as follows:
```bash
deepspeed src/training/train.py \
    --lora_enable True \
    --vision_lora True \
    --lora_rank 64 \
    --lora_alpha 128 \
    --lora_dropout 0.05 \
    --num_lora_modules -1 \
    --deepspeed scripts/zero3.json \
    --model_id $MODEL_NAME \
    --data_path ./llava_annotations.json \
    --image_folder ./arxiv_paper_images \
    --freeze_vision_tower False \
    --freeze_llm False \
    --tune_projector True \
    --bf16 True \
    --fp16 False \
    --disable_flash_attn2 False \
    --output_dir output/lora_vision_test \
    --num_train_epochs 2 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --gradient_checkpointing False \
    --report_to tensorboard \
    --lazy_preprocess True \
    --save_strategy "steps" \
    --save_steps 1400 \
    --save_total_limit 10 \
    --dataloader_num_workers 4
```
The dataset structure is as follows:
```json
{
    "id": "1",
    "image": "arxiv_2305_02412_0.png",
    "from": "human",
    "value": "<image> What are the main elements visible in this image?",
    "from_gpt": "The image contains the following elements: - 1 ignored region(s) located at: [[857.65, 564.3, 567.8, 413.6]]",
    "from_human": "What is the structure of the table in the image?",
    "from_gpt": "There is no table present in the image.",
    "from_human": "What information can be extracted from the figure in the image?",
    "from_gpt": "There is no figure present in the image."
}
```
@Algabeno I think it could be the re-initialization of the weights when merging. I'll try to find the proper way to handle it. Another thing is that, when using fp16 or bf16, the result changes compared with fp32. I think the model should be a bit more stable, and my code as well. I'll keep developing the code.
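If the re-initialization during merging is indeed part of the problem, one option is to copy the overlapping rows into the resized tensor instead of allocating it with torch.empty. A minimal sketch, reusing the ff_out_layer/wte_layer variables from the resizing snippet above (not the repo's actual fix):

```python
import torch

def resize_rows_preserving(layer, new_rows, new_cols, device, dtype):
    """Resize layer.weight to (new_rows, new_cols), keeping existing values where shapes overlap."""
    old = layer.weight.data
    new = torch.zeros(new_rows, new_cols, device=device, dtype=dtype)
    rows, cols = min(old.shape[0], new_rows), min(old.shape[1], new_cols)
    new[:rows, :cols] = old[:rows, :cols]
    layer.weight = torch.nn.Parameter(new)

# Used in place of the torch.empty(...) calls above:
# resize_rows_preserving(ff_out_layer, token_num, token_dim, model.device, model.dtype)
# resize_rows_preserving(wte_layer, token_num, token_dim, model.device, model.dtype)
```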
@Algabeno Can you try loading the model when merging without passing a config?

```python
model = AutoModelForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, trust_remote_code=True, **kwargs)
```

I think that should work.
@2U1 "Yes, the code runs and makes predictions normally, but it still doesn't achieve the results I am aiming for. Have you ever tried adding some kind of vision dataset for fine-tuning?"
@Algabeno I tried to, but at that time it only supported fp32, so I couldn't test it with my full dataset. And right now I have a resource issue, so I'll need to try again in about a week. Is the result worse than the original model? Also, when you converted the data to the llava type, did you erase `<image>`, or did you add `\n` after the `<image>` token?
@2U1 "Today I tried different precisions like FP16, FP32, as well as different prompts and epochs to fine-tune the model, but the prediction results were very poor. I kept the
@Algabeno Molmo's input does not need the `<image>` token, so I wrote code to remove it (I made the dataset example compatible with the LLaVA dataset format for ease of use). However, I only filter the exact pattern `<image>\n`, so if you did not add `\n`, or did not erase the `<image>` token, it will be passed through as part of the input. I think that could disrupt the model's performance. Also, as soon as my resource issue is solved, I'll test the model with my dataset too.
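For illustration, a minimal sketch of that kind of filtering (the repo's actual preprocessing may differ); it strips only the exact `<image>\n` pattern, so a bare `<image>` without the newline survives and reaches the model:

```python
IMAGE_TOKEN = "<image>"

def strip_image_token(text: str) -> str:
    # Remove only the exact "<image>\n" pattern, mirroring the behaviour described above.
    return text.replace(IMAGE_TOKEN + "\n", "")

print(strip_image_token("<image>\nDescribe the image."))   # -> "Describe the image."
print(strip_image_token("<image> Describe the image."))    # unchanged: no "\n" after the token
```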
@2U1 Thank you for your patient response. After removing the `<image>` token, the results did not improve.
@Algabeno Actually, I was saying you should add '\n' to the `<image>` token when converting to LLaVA style. If that doesn't improve the performance of the model, trying full fine-tuning with offload might help.
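In case the repo doesn't already ship an offload config, a minimal ZeRO-3 CPU-offload DeepSpeed config could look roughly like this (written from Python for convenience; the "auto" values defer to the HF Trainer arguments, and the filename scripts/zero3_offload.json is just a suggestion):

```python
import json

# Hypothetical zero3_offload.json: ZeRO stage 3 with optimizer and parameter offload to CPU.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "bf16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

with open("scripts/zero3_offload.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

It would then be passed to the training script via `--deepspeed scripts/zero3_offload.json` (presumably with `--lora_enable False` for full fine-tuning).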
@2U1 "I misunderstood your meaning, but I have already tried following the LLaVA specification by adding '\n' to the
@Algabeno Thanks. I'll try to figure out the reason.
@Algabeno I've removed the code that skipped saving the `wte` weight. This might change the result.
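For anyone hitting the same thing: if the adapter is saved through PEFT, one way to store the embedding table and output head alongside the LoRA weights is `modules_to_save`. This is only a sketch; the module names and whether this repo actually goes through `LoraConfig` are assumptions:

```python
from peft import LoraConfig

# Hypothetical config: module names below are guesses, not verified against Molmo.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",         # or an explicit list of Molmo layer names
    modules_to_save=["wte", "ff_out"],   # save these full weights with the adapter
)
```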
Thank you for your patience. After multiple adjustments to the model yesterday, I found that the results still did not meet my expectations. I achieved ideal results using another model, microsoft/Florence-2-large, and it seems that Molmo is not suitable for fine-tuning on specific vision tasks.
@Algabeno Thanks for letting me know. I'll keep checking in case there's something I missed.
@2U1 Thank you for sharing. It seems that the poor performance of the model might be due to the limited size of the dataset I constructed. I will try using larger datasets like ImageNet-22k, Object365, and OpenImages for full fine-tuning.
The following error occurred when I used the merge script. Is this the correct way to run the merge LoRA script?

```bash
python src/merge_lora_weights.py \
    --model-path ./output/lora_vision_test \
    --model-base $MODEL_NAME \
    --save-model-path ./Molmo-7B-D-1009
```

```
Traceback (most recent call last):
  File "/root/autodl-tmp/Molmo-Finetune-master/src/merge_lora_weights.py", line 27, in <module>
    merge_lora(args)
  File "/root/autodl-tmp/Molmo-Finetune-master/src/merge_lora_weights.py", line 6, in merge_lora
    processor, model = load_pretrained_model(model_path=args.model_path, model_base=args.model_base,
  File "/root/autodl-tmp/Molmo-Finetune-master/src/utils.py", line 45, in load_pretrained_model
    model = AutoModelForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, trust_remote_code=True, **kwargs)
  File "/root/miniconda3/envs/molmo/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 557, in from_pretrained
    cls.register(config.__class__, model_class, exist_ok=True)
  File "/root/miniconda3/envs/molmo/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 584, in register
    raise ValueError(
ValueError: The model class you are passing has a `config_class` attribute that is not consistent with the config class you passed (model has <class 'transformers_modules.Molmo-7B-D-0924.config_molmo.MolmoConfig'> and you passed <class 'transformers_modules.lora_vision_test.config_molmo.MolmoConfig'>. Fix one of those so they match!
```