Closed · vishaal27 closed this issue 1 year ago
Hi, thanks for your great work. I noticed that in your scripts, you hard-coded the LoRA alpha to 1 for all your LoRA modules (e.g., for multi-head attention): https://github.com/SivanDoveh/TSVLC/blob/a9c96ab9f635db7aa659505d7c920e1271c1da4b/src/open_clip/loralib/layers.py#L333-L360 However, the LoRA rank is passed in as a parameter to each sub-module when it is initialised.
Was there a principled justification for this choice? I am wondering whether you did any tuning on these values and could suggest what would be good settings, because since the scaling factor is alpha/rank, the current choice of alpha=1 with rank 4 gives a scaling factor of 1/4 = 0.25. I have noticed much higher scaling factors while fine-tuning other LLMs and VLMs, so I was a bit curious. For example, this paper uses an alpha of 128: https://github.com/eric-ai-lab/PEViT/blob/be6fb43ff54adeeffe720c663dd238976070558e/vision_benchmark/evaluation/lora_model.py#L455-L463
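For reference, here is a minimal sketch of how the LoRA scaling is typically applied (parameter names follow the original LoRA implementation; the class `LoRALinearSketch` is just an illustration, not the actual TSVLC code):

```python
import torch
import torch.nn as nn


class LoRALinearSketch(nn.Module):
    """Minimal LoRA-augmented linear layer, only to illustrate the alpha/r scaling."""

    def __init__(self, in_features: int, out_features: int,
                 r: int = 4, lora_alpha: float = 1.0):
        super().__init__()
        # Frozen pretrained weight (stands in for the original CLIP projection).
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)
        # Low-rank adapters: B is zero-initialised so the update starts at zero.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        # The LoRA update is scaled by alpha / r:
        # with lora_alpha=1 and r=4 this is 1/4 = 0.25,
        # whereas e.g. alpha=128 with r=4 would scale it by 32.
        self.scaling = lora_alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        lora_update = (x @ self.lora_A.T) @ self.lora_B.T
        return self.base(x) + lora_update * self.scaling


layer = LoRALinearSketch(768, 768, r=4, lora_alpha=1.0)
print(layer.scaling)  # 0.25
```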
No, we have not tuned these values; I think they were taken from the defaults of the original LoRA implementation.