Yuliang-Liu / Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

How to finetune only one subnetwork using Deepspeed + Transformers #89

Closed JasonLeeFdu closed 2 months ago

JasonLeeFdu commented 6 months ago

I have to add some LoRA layers by hand (without PEFT) to the monkey.visual model in order to finetune it on new data. I want DeepSpeed to optimize ONLY the parameters of the LoRA layers rather than all the parameters, as in the attached image.
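For context, a hand-written LoRA wrapper of the kind I mean could look roughly like the sketch below (illustrative only; the class name, rank, and scaling are not the exact code from my screenshot):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen nn.Linear and adds a trainable low-rank update.
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze the original weight
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)    # freeze the original bias
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # Base projection plus the scaled low-rank correction.
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scale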

The platform is Hugging Face Transformers with DeepSpeed.

Therefore I customized the Trainer from HF Transformers, as shown in the attached image.
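The screenshot isn't reproduced here, but the override amounts to something like the following sketch, which builds the optimizer over the LoRA parameters only (assuming they can be selected by the substring "lora" in their names):

from transformers import Trainer

class LoraOnlyTrainer(Trainer):
    def create_optimizer(self):
        # Build the optimizer over the LoRA parameters only.
        if self.optimizer is None:
            lora_params = [p for n, p in self.model.named_parameters() if "lora" in n]
            optimizer_cls, optimizer_kwargs = Trainer.get_optimizer_cls_and_kwargs(self.args)
            self.optimizer = optimizer_cls(lora_params, **optimizer_kwargs)
        return self.optimizer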

Unfortunately, it doesn't work: neither the LoRA nor the non-LoRA weights change during training. It seems the optimizer that DeepSpeed creates is not the same as the one from PyTorch.

My question is: is there a way to finetune ONLY certain subnetwork (LoRA) parameters with DeepSpeed + Transformers' Trainer?

echo840 commented 2 months ago

Hello, in our training of the Monkey and Monkey-Chat models, we only added LoRA to the ViT part; the LLM was fully trained. You can add code like the following at line 368 of finetune_multitask.py to set requires_grad to False for the parameters you don't need to train and to True for those you do. Please make sure to identify the names of the parameters you don't need to train.

for k, v in model.named_parameters():
    # Replace "lora" with the substring that identifies the parameters you need to train.
    if "lora" in k:
        v.requires_grad_(True)
    else:
        v.requires_grad_(False)
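As a quick sanity check before training starts, you can print which parameters ended up trainable (a minimal sketch, assuming model is the loaded Monkey model):

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
print(f"trainable tensors: {len(trainable)}, frozen tensors: {len(frozen)}")
print("example trainable names:", trainable[:5])

The Transformers Trainer (including its DeepSpeed integration) only hands parameters with requires_grad=True to the optimizer, so freezing this way should be enough to restrict training to the LoRA layers.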