eric-ai-lab / MiniGPT-5

Official implementation of the paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
https://eric-ai-lab.github.io/minigpt-5.github.io/
Apache License 2.0

Should input_embeddings and out_embeddings be updated in Stage2? #51

Open Andy1621 opened 4 months ago

Andy1621 commented 4 months ago

Hi! Thanks for your interesting work!

I just found that, when using LoRA, the input_embeddings and out_embeddings are not updated because of the following code:

https://github.com/eric-ai-lab/MiniGPT-5/blob/2121c745b2cb2d7e842e03b4bcaa89c63f2ee6c1/minigpt4/models/mini_gpt5.py#L115-L116

Considering that LoRA is used in Stage 2, does this mean the input_embeddings and out_embeddings are only updated in Stage 1? If so, these two lines are redundant, since PEFT will already set them to not be updated.
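
For context, here is a minimal runnable sketch (not the repo's actual code; see the permalink above for that) of the effect being asked about: once `requires_grad` is set to `False` on an embedding table, it receives no gradient and is never updated, while the remaining trainable parameters (e.g., the LoRA weights) still are.

```python
import torch
from torch import nn

# Toy stand-ins: `embed` plays the role of the input/output embeddings,
# `proj` the rest of the model (e.g., the LoRA-adapted layers).
embed = nn.Embedding(10, 4)
proj = nn.Linear(4, 2)

embed.weight.requires_grad = False  # analogous to the two lines in question

loss = proj(embed(torch.tensor([1, 2, 3]))).sum()
loss.backward()
print(embed.weight.grad)             # None: the frozen table gets no gradient
print(proj.weight.grad is not None)  # True: other parameters still train
```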

KzZheng commented 3 months ago

Yes, they should be updated. That code should be for the old PEFT version, where PEFT keeps two input embeddings (a copied one and the original one), and one of them (the original) does not participate in the gradient computation. If we set both to require gradients, DDP will find unused parameters. These two lines are there to avoid that situation (if I remember correctly).
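
For illustration, here is a minimal sketch of the duplication described above, using a tiny stand-in model rather than MiniGPT-5's code; the wrapper internals shown (`original_module`, `modules_to_save["default"]`) are PEFT implementation details and may differ between versions:

```python
from peft import LoraConfig, get_peft_model
from transformers import GPT2Config, GPT2LMHeadModel

# Tiny stand-in model so the example runs without downloading weights.
model = GPT2LMHeadModel(GPT2Config(n_layer=1, n_embd=32, n_head=2, vocab_size=100))

config = LoraConfig(
    r=4,
    target_modules=["c_attn"],  # LoRA on the attention projections
    modules_to_save=["wte"],    # ask PEFT to keep the input embedding trainable
)
peft_model = get_peft_model(model, config)

# PEFT wraps `wte` so that it holds TWO copies: a frozen original and a
# trainable duplicate. Only the duplicate participates in the gradient
# computation, which is the situation described in the comment above.
wrapped = peft_model.base_model.model.transformer.wte
print(type(wrapped).__name__)                                   # ModulesToSaveWrapper
print(wrapped.original_module.weight.requires_grad)             # False: frozen copy
print(wrapped.modules_to_save["default"].weight.requires_grad)  # True: trained copy
```

If both copies were set to require gradients, the frozen path would surface as an unused parameter under `DistributedDataParallel`, which is the failure mode the two lines guard against.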