Open: felmoreno1726 opened 2 weeks ago
Sorry, we have not tried any LoRA fine-tuning. Our current codebase does not support LoRA. Maybe you can check the gradient and LoRA settings.
Which part of the codebase does not support LoRA? This is based on the LLaVA codebase, which does support LoRA, so what changes may have been introduced that could conflict with it?
Could it be that the following code in the Trainer is the issue? In the pretraining routine you set --tune_mm_mlp_adapter (to True), so the top block of code executes. --freeze_mm_mlp_adapter is not set in either routine, which makes it default to False. So even though you don't want to fine-tune the mm_mlp_adapter, its gradients stay set to True during the fine-tuning routine, and that causes the warning? (See also the quick check after the code.)
```python
model.config.tune_mm_mlp_adapter = training_args.tune_mm_mlp_adapter = model_args.tune_mm_mlp_adapter
if model_args.tune_mm_mlp_adapter:
    # Pretraining path: freeze the whole model, then re-enable gradients only for
    # the projector and the attention model.
    model.requires_grad_(False)
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = True
    for p in model.get_model().attention_model.parameters():
        p.requires_grad = True

model.config.freeze_mm_mlp_adapter = training_args.freeze_mm_mlp_adapter
if training_args.freeze_mm_mlp_adapter:
    # Only runs when --freeze_mm_mlp_adapter is passed explicitly; it defaults to False.
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = False
    for p in model.get_model().attention_model.parameters():
        p.requires_grad = False
```
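If my reading is right, a quick check like this (a debugging sketch of my own, not code from the repo; it reuses the model.get_model().mm_projector / attention_model names from the block above) should print requires_grad=True for the projector during LoRA fine-tuning, and explicitly passing --freeze_mm_mlp_adapter True should flip it back to False:

```python
# Hypothetical sanity check: print the requires_grad state of the modules touched
# by the two blocks above.
def report_adapter_grads(model):
    for name, p in model.get_model().mm_projector.named_parameters():
        print(f"mm_projector.{name}: requires_grad={p.requires_grad}")
    for name, p in model.get_model().attention_model.named_parameters():
        print(f"attention_model.{name}: requires_grad={p.requires_grad}")
```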
I believe I fixed some of the warnings by passing these flags to the LoRA training script: --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5
It seems like the projector layer needed its own learning rate? (See the sketch below for how I picture that working.)
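My guess is that --mm_projector_lr puts the projector parameters into their own optimizer parameter group with a separate learning rate. A minimal sketch of that idea (my own guess at the mechanism, not the repo's actual optimizer code; build_optimizer and the name-based grouping are hypothetical):

```python
import torch

def build_optimizer(model, base_lr, mm_projector_lr):
    # Put mm_projector parameters in their own group so they can use a different
    # learning rate from the rest of the trainable (LoRA) weights.
    projector_params, other_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue  # skip frozen parameters
        (projector_params if "mm_projector" in name else other_params).append(p)
    return torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": projector_params, "lr": mm_projector_lr},
    ])
```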
Still, this does not fix the following warning:

```
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/autograd/__init__.py:266: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
```
I'm trying LoRA fine-tuning. I get decent results, but I see the warnings above. Any thoughts on why? Should this be concerning? The part that worries me is "an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it". Is this expected? For additional context, I don't train the mm_mlp_adapter (see the check below for how I verify that).
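This is the kind of check I run after the LoRA wrapping to see what is actually trainable (a debugging snippet of my own, not part of the codebase; summarize_trainable is a hypothetical helper):

```python
from collections import defaultdict

def summarize_trainable(model):
    # Count trainable vs. total parameters per top-level module, to spot anything
    # (e.g. the projector) that unexpectedly still has requires_grad=True.
    counts = defaultdict(lambda: [0, 0])  # module name -> [trainable, total]
    for name, p in model.named_parameters():
        top = name.split(".")[0]
        counts[top][1] += 1
        if p.requires_grad:
            counts[top][0] += 1
    for top, (trainable, total) in sorted(counts.items()):
        print(f"{top}: {trainable}/{total} parameters require grad")
```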