TencentQQGYLab / ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
https://ella-diffusion.github.io/
Apache License 2.0

similar to LaVi-Bridge, any connection? #13

Open dandincyf opened 8 months ago

dandincyf commented 8 months ago

I just found a recent paper, https://arxiv.org/abs/2403.07860, on a similar topic. Is there any difference or connection?

budui commented 8 months ago

LaVi-Bridge and ELLA are independent works completed at the same time, and there was no communication between us.

ELLA focuses on exploring connectors between the LLM and the UNet, while LaVi-Bridge tries many LLMs and vision models. The two papers can learn from each other. LaVi-Bridge showed that LLaMA+LoRA performs better than T5, and we are considering following their ideas. However, LLaMA+LoRA means that gradients must flow through the LLM, which consumes more GPU memory during training.

Both ELLA and LaVi-Bridge report T2I-Benchmark scores, allowing for some simple comparisons. We will open source ELLA (SD1.5) as soon as possible for the community to conduct some in-depth analysis and comparison.
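
As an illustration of the memory point above, here is a toy PyTorch sketch. It is not ELLA or LaVi-Bridge code: `ToyLLM`, `LoRALinear`, the `connector` layer, and all sizes are hypothetical stand-ins. It only shows that a frozen LLM followed by a trainable connector can run under `torch.no_grad()`, so no autograd graph is kept for the LLM, whereas LoRA adapters placed inside the LLM force the backward pass to traverse it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
HIDDEN = 16  # toy width; real LLMs are far larger


class ToyLLM(nn.Module):
    """Hypothetical stand-in for a text encoder (not T5 or LLaMA)."""

    def __init__(self, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(HIDDEN, HIDDEN) for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = torch.tanh(layer(x))
        return x


class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""

    def __init__(self, base: nn.Linear, rank: int = 2):
        super().__init__()
        self.base = base
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)  # the adapter starts as a no-op

    def forward(self, x):
        return self.base(x) + self.up(self.down(x))


def freeze(module: nn.Module) -> nn.Module:
    for p in module.parameters():
        p.requires_grad_(False)
    return module


tokens = torch.randn(2, 8, HIDDEN)  # (batch, sequence, hidden)

# Setup A: frozen LLM, trainable connector placed after it.
llm_a = freeze(ToyLLM())
connector = nn.Linear(HIDDEN, HIDDEN)
with torch.no_grad():  # no graph is recorded for the LLM forward pass
    features_a = llm_a(tokens)
out_a = connector(features_a)
out_a.sum().backward()  # gradients stop at the connector

# Setup B: frozen base weights, trainable LoRA adapters inside every layer.
llm_b = freeze(ToyLLM())
llm_b.layers = nn.ModuleList([LoRALinear(layer) for layer in llm_b.layers])
out_b = llm_b(tokens)  # the graph must cover every LLM layer
out_b.sum().backward()  # gradients flow back through the whole LLM

print("A: LLM output tracked by autograd?", features_a.requires_grad)
print("B: LLM output tracked by autograd?", out_b.requires_grad)
```

In setup B the intermediate activations of every layer have to be kept until the backward pass, which is where the extra training memory comes from. The toy model is too small to show a meaningful difference in bytes, so the sketch only checks where the autograd graph exists.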

scarbain commented 8 months ago

@budui Aren't you going to open-source ELLA for SDXL too? I thought you also had a working version for SDXL.

Bionagato commented 8 months ago

Please release ELLA for SDXL. At this point, nobody uses SD1.5, so the community will likely not show interest if it's based on it. We already have SD 2.x, SDXL, Stable Cascade, and soon SD3.

Manni1000 commented 8 months ago

I think 1.5 is also cool, but SDXL too.

melohux commented 8 months ago

We greatly appreciate your interest in ELLA_sdxl. However, the process of open-sourcing ELLA_sdxl requires an extensive review by our senior leadership. This procedure can be considerably time-consuming. Conversely, ELLA_sdv1.5, which is more research-oriented, can be released promptly. We would appreciate your patience and understanding about this.

victorca25 commented 8 months ago

> At this point, nobody uses SD1.5

This is a false statement. Most people still use SD1.5 because it has the largest amount of LoRAs available and requires the lowest amount of resources for inference.

Bionagato commented 8 months ago

@victorca25 1.5 is old, people still using SD1.5 in 2024 do so because they lack the resources to run XL. The only exception to this was anime, but now there are Animagine and Pony. If someone doesn't have the resources for SDXL, they probably won't for SD1.5 + LLM.

victorca25 commented 8 months ago

> @victorca25 1.5 is old

New 1.5 fine tunes come out every day, so this is another false statement.

> people still using SD1.5 in 2024 do so because they lack the resources to run XL.

Or they do not see the point of needing more VRAM and waiting longer for marginally better results.

> If someone doesn't have the resources for SDXL, they probably won't for SD1.5 + LLM.

You're just assuming; you don't know. But ELLA + SD1.5 models have the potential to generate better results than SDXL, which will now be superseded by 3.0, so by your own logic, why waste time with SDXL? :D

scarbain commented 8 months ago

I can confirm SD1.5 is still used in professional production environments, of course. That doesn't reduce the need to open-source ELLA for SDXL; that would still be awesome :)

moesie commented 8 months ago

In the end it is still the privilege of the Tencent management to decide if it suits their company goals better to open source whatever code their employees are working on or to keep it proprietary. To discuss this here is kind of pointless and OT. This thread is about the similarities and differences between LaVi-Bridge and ELLA.

zethfoxster commented 7 months ago

> ...the largest amount of LoRAs available and requires the lowest amount of resources for inference.

I'm on my 4090 running SD1.5...

rundiffusion commented 6 months ago

> LaVi-Bridge and ELLA are independent works completed at the same time, and there was no communication between us.
>
> ELLA focuses on exploring connectors between the LLM and the UNet, while LaVi-Bridge tries many LLMs and vision models. The two papers can learn from each other. LaVi-Bridge showed that LLaMA+LoRA performs better than T5, and we are considering following their ideas. However, LLaMA+LoRA means that gradients must flow through the LLM, which consumes more GPU memory during training.
>
> Both ELLA and LaVi-Bridge report T2I-Benchmark scores, allowing for some simple comparisons. We will open source ELLA (SD1.5) as soon as possible for the community to conduct some in-depth analysis and comparison.

We are RunDiffusion. We build Juggernaut XL, the world's most downloaded SDXL model. We would be very interested in getting an aligned version of our model. Have you thought about how that would work (licensing, an alignment service, etc.)? Would you allow a model to be aligned and released publicly? If not, we would still be interested in a version for our business use and clients. We are in communication with Nvidia about their DRaFT+ approach and are experimenting in that area.

We would love to talk about your SDXL plans.

Please let us know!

budui commented 6 months ago

Thank you for your interest in our work. Juggernaut XL is a fantastic model. Without any adjustments, Juggernaut XL+ELLA works very well. In fact, beyond Juggernaut XL, ELLA can be easily integrated with any CLIP-fixed SD derivative model (meaning the text encoder was kept frozen during fine-tuning), including even video generation models like AnimateDiff. Unfortunately, we have no plan to open source ELLA-SDXL. However, we are very happy to keep in touch with you, and we can share the experience we gained in training ELLA-SDXL.

rundiffusion commented 6 months ago

@budui We're so glad you like it! Remember that Juggernaut XL does not have a license that permits it to be distributed. We would very much like to be involved in that process if you are already integrating ELLA with Juggernaut XL in your research. The team would LOVE to see some results from Juggernaut+ELLA! That would be so exciting!

We understand you're not going to open source the SDXL alignment. That probably includes releasing a model that is open sourced. We think that is fine. We too are finding ways to build IP and keep it in order to build a sustainable business (like our SFW Juggernaut XL model, which is proprietary).

I'm sure we can make something lucrative for your stakeholders with some sort of licensing partnership. Juggernaut has been downloaded over half a million times worldwide and would be an excellent brand to attach to ELLA to license out to inference providers or private companies.

We definitely should have a talk with our teams to find something that makes sense. Please email darin@rundiffusion.com so we can chat privately.