IVGSZ / Flash-VStream

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
https://invinciblewyq.github.io/vstream-page/
Apache License 2.0

LoRA fine-tune warning: autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it #18

felmoreno1726 opened 2 weeks ago

felmoreno1726 commented 2 weeks ago

I'm trying LoRA fine-tuning. I get decent results, but I see the following warnings.

/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/nn/modules/module.py:1877: UserWarning: Positional args are being deprecated, use kwargs instead. Refer to https://pytorch.org/docs/master/generated/torch.nn.Module.html#torch.nn.Module.state_dict for details.
  warnings.warn(
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/checkpoint.py:460: UserWarning: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. The default value of use_reentrant will be updated to be False in the future. To maintain current behavior, pass use_reentrant=True. It is recommended that you use use_reentrant=False. Refer to docs for more details on the differences between the two variants.
  warnings.warn(
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/utils/checkpoint.py:90: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
  warnings.warn(
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/torch/autograd/__init__.py:266: UserWarning: c10d::broadcast_: an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it. This may lead to silently incorrect behavior. This behavior is deprecated and will be removed in a future version of PyTorch. If your operator is differentiable, please ensure you have registered an autograd kernel to the correct Autograd key (e.g. DispatchKey::Autograd, DispatchKey::CompositeImplicitAutograd). If your operator is not differentiable, or to squash this warning and use the previous behavior, please register torch::CppFunction::makeFallthrough() to DispatchKey::Autograd. (Triggered internally at ../torch/csrc/autograd/autograd_not_implemented_fallback.cpp:63.)
  Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass

Any thoughts on why this happens? Should it be concerning? The one that worries me most is "autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it". Is this expected? For extra information, I don't train the mlp_adapter.
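For reference, here is a minimal sketch of how to check which top-level modules still require gradients before the trainer starts (plain PyTorch; `model` here stands for the already-loaded Flash-VStream model, so it's an assumption about your setup):

    # Sketch: count trainable parameters per top-level module.
    # `model` is assumed to be the loaded Flash-VStream model.
    from collections import Counter

    trainable = Counter()
    for name, p in model.named_parameters():
        if p.requires_grad:
            # group by the first two name components, e.g. "model.mm_projector"
            trainable[".".join(name.split(".")[:2])] += p.numel()

    for module, numel in trainable.most_common():
        print(f"{module}: {numel / 1e6:.1f}M trainable params")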

zhang9302002 commented 1 week ago

Sorry, we have not tried any LoRA fine-tuning. Our current codebase does not support LoRA. Maybe you can check the gradient and LoRA settings.

felmoreno1726 commented 1 week ago

Which part of the codebase does not support LoRA? This is based on the LLaVA codebase, which does support LoRA, so what changes could have introduced a conflict?

felmoreno1726 commented 6 days ago

Could it be that the following code in the trainer is the issue? In the pretraining routine, --tune_mm_mlp_adapter is set to True, so the top block executes and turns gradients on for the projector and attention model. --freeze_mm_mlp_adapter is not set in either routine, so it defaults to False. So even though you don't want to fine-tune the mm_mlp_adapter, its gradients are still set to True during the fine-tuning routine, and that causes the warning?

https://github.com/IVGSZ/Flash-VStream/blob/5c87d63eef8be1d9ee66aaea84f55e8c8b073ac1/flash_vstream/train/train.py#L990C1-L1003C40

    # Pretraining path: --tune_mm_mlp_adapter freezes the whole model, then
    # re-enables gradients for the projector and the memory attention model.
    model.config.tune_mm_mlp_adapter = training_args.tune_mm_mlp_adapter = model_args.tune_mm_mlp_adapter
    if model_args.tune_mm_mlp_adapter:
        model.requires_grad_(False)
        for p in model.get_model().mm_projector.parameters():
            p.requires_grad = True
        for p in model.get_model().attention_model.parameters():
            p.requires_grad = True

    # Fine-tuning path: the two modules are only frozen again if
    # --freeze_mm_mlp_adapter is passed, and it defaults to False.
    model.config.freeze_mm_mlp_adapter = training_args.freeze_mm_mlp_adapter
    if training_args.freeze_mm_mlp_adapter:
        for p in model.get_model().mm_projector.parameters():
            p.requires_grad = False
        for p in model.get_model().attention_model.parameters():
            p.requires_grad = False
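
If that reading is right, one workaround (untested, just a sketch based on the snippet above) would be to pass --freeze_mm_mlp_adapter True to the LoRA fine-tuning script so the second block runs, or to freeze the two modules by hand before training starts:

    # Sketch (untested): mirror what --freeze_mm_mlp_adapter True would do,
    # so the projector and the memory attention model stay frozen during
    # LoRA fine-tuning.
    for p in model.get_model().mm_projector.parameters():
        p.requires_grad = False
    for p in model.get_model().attention_model.parameters():
        p.requires_grad = False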

I believe I fixed some of the warnings by passing these flags to the LoRA training script: --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5

Seems like the projector layer needed its own learning rate?
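
For context, in LLaVA-style trainers --mm_projector_lr usually just puts the projector parameters into their own optimizer parameter group. Roughly like this (a sketch under that assumption, with example values, not the exact Flash-VStream code):

    import torch

    # Separate optimizer groups: the projector gets its own learning rate.
    projector_params = [p for n, p in model.named_parameters()
                        if "mm_projector" in n and p.requires_grad]
    other_params = [p for n, p in model.named_parameters()
                    if "mm_projector" not in n and p.requires_grad]

    optimizer = torch.optim.AdamW([
        {"params": other_params, "lr": 2e-4},      # base LoRA learning rate (example)
        {"params": projector_params, "lr": 2e-5},  # value passed as --mm_projector_lr
    ])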

Still, this does not fix the c10d::broadcast_ warning quoted above ("an autograd kernel was not registered to the Autograd key(s) but we are trying to backprop through it", raised from torch/autograd/__init__.py:266).