ShihaoZhaoZSH / LaVi-Bridge

[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
MIT License

only use lora on unet or pixart? #7

Open CS123n opened 6 months ago

CS123n commented 6 months ago

Hey, did you get the results from fine-tuning only the UNet part with a fixed T5 or Llama?

ShihaoZhaoZSH commented 6 months ago

We have not yet conducted the experiment you mentioned. However, we can provide some insights:

  1. In our paper, Section 4.4 (Experiments - Ablation Study), we trained LaVi-Bridge by fine-tuning only the adapter, without LoRA. This is close to what you suggest, except that we disabled LoRA in both the U-Net and the LLM. Training only the adapter without injecting LoRA still produced reasonable results, but performance did decline.

  2. Additionally, even when the model was trained with both LoRA and the adapter, you can disable LoRA in the U-Net or the LLM at inference time by commenting out the corresponding monkeypatch_or_replace_lora_extended call in the test scripts (see the sketch after this list). Note that performance will also be negatively affected in this case.
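
For reference, a minimal sketch of what this looks like in a test script. Only the monkeypatch_or_replace_lora_extended function itself comes from the codebase; the model names, checkpoint paths, rank, and target_replace_module sets below are illustrative assumptions, not the repository's actual values.

```python
import torch
from diffusers import UNet2DConditionModel
from transformers import T5EncoderModel
from lora_diffusion import monkeypatch_or_replace_lora_extended

# Hypothetical checkpoint paths -- adjust to your actual test script.
UNET_LORA_PATH = "checkpoints/lora_unet.pt"
TEXT_LORA_PATH = "checkpoints/lora_text.pt"

# Load the vision and language backbones (example model IDs, not the repo's).
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
text_encoder = T5EncoderModel.from_pretrained("google/flan-t5-large")

# Default behaviour: patch the trained low-rank weights into both models.
# Rank and target_replace_module here are assumptions for illustration.
monkeypatch_or_replace_lora_extended(
    unet,
    torch.load(UNET_LORA_PATH),
    r=32,
    target_replace_module={"CrossAttention", "Attention", "GEGLU"},
)
monkeypatch_or_replace_lora_extended(
    text_encoder,
    torch.load(TEXT_LORA_PATH),
    r=32,
    target_replace_module={"T5Attention"},
)

# To run inference *without* LoRA on one side, comment out the corresponding
# monkeypatch_or_replace_lora_extended(...) call above. The adapter weights
# are loaded separately and remain active either way.
```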