SivanDoveh / TSVLC

Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models

Question about LoRA alpha #4

Closed: vishaal27 closed this issue 1 year ago

vishaal27 commented 1 year ago

Hi, thanks for your great work. I noticed that in your scripts, you hard-code the LoRA alpha to 1 for all of your LoRA modules (e.g., for multi-head attention): https://github.com/SivanDoveh/TSVLC/blob/a9c96ab9f635db7aa659505d7c920e1271c1da4b/src/open_clip/loralib/layers.py#L333-L360 However, the LoRA rank is passed in as a parameter for each sub-module at initialisation.

Was there a principled justification for this choice? I am just wondering whether you did any tuning on these values and can suggest good ones to use, because with the current choice (alpha = 1, rank = 4) the LoRA scaling factor alpha / rank ends up being 1/4 = 0.25. I have noticed much higher scaling factors while fine-tuning other LLMs and VLMs, so I was a bit curious. For example, this paper uses an alpha of 128: https://github.com/eric-ai-lab/PEViT/blob/be6fb43ff54adeeffe720c663dd238976070558e/vision_benchmark/evaluation/lora_model.py#L455-L463
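For context, here is a minimal sketch (not this repository's actual code; the class name and defaults are illustrative) of the standard LoRA linear layer from the original LoRA paper, showing where the alpha / rank scaling factor comes from:

```python
# Minimal LoRA linear layer sketch, assuming the standard formulation
# where the low-rank update is scaled by lora_alpha / r.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 r: int = 4, lora_alpha: float = 1.0):
        super().__init__()
        # Frozen pretrained weight (not updated during fine-tuning)
        self.weight = nn.Parameter(torch.empty(out_features, in_features),
                                   requires_grad=False)
        # Trainable low-rank factors: A is (r x in), B is (out x r);
        # B starts at zero so the update is initially a no-op.
        self.lora_A = nn.Parameter(torch.zeros(r, in_features))
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        nn.init.kaiming_uniform_(self.lora_A, a=5 ** 0.5)
        # With lora_alpha=1 and r=4 this scaling is 0.25;
        # with lora_alpha=128 and r=4 it would be 32.
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = F.linear(x, self.weight)
        update = F.linear(F.linear(x, self.lora_A), self.lora_B)
        return base + self.scaling * update
```

Under this formulation, changing alpha rescales the low-rank update, which the LoRA paper notes is roughly equivalent to tuning the learning rate of the adapter weights.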

leokarlin commented 1 year ago

No, we have not tuned these values; I think they were taken from the defaults of the original LoRA implementation.