peki12345 opened this issue 4 months ago
Nice work! I have a question: why should we concatenate these two features (LLM and CLIP) instead of just using the LLM's features, as some other works have done, e.g. https://github.com/Kwai-Kolors/Kolors?
- With LLM4GEN, you can use fewer data pairs to achieve the same or even better performance than other methods. Of course, you could use an LLM instead of CLIP, but you would need more data pairs and more computing resources to train your model.
- LLM4GEN can seamlessly integrate with existing tools in the LDM ecosystem, such as ControlNet. Considering these two aspects, we concatenate the LLM features with CLIP's.
Thank you for your reply. I also believe that utilizing CLIP information enables the model to achieve better results with less data. But I think the Cross-Adapter Module has already fused the LLM and CLIP information via attention, so doesn't it feel a bit redundant to concatenate CLIP again after that?
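For context, here is a minimal sketch of the flow being discussed, based on my own reading rather than the official LLM4GEN code. The shapes, the single-head attention without learned projections, and the sequence-dimension concatenation are all simplifying assumptions: the adapter fuses LLM features with CLIP features via cross-attention, and the raw CLIP features are then concatenated again to form the final conditioning.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value, d):
    # Single-head attention, no learned projections (simplification)
    scores = query @ key_value.T / np.sqrt(d)
    return softmax(scores) @ key_value

# Hypothetical shapes: 77 CLIP tokens, 32 LLM tokens, shared dim 768
rng = np.random.default_rng(0)
d = 768
clip_feats = rng.standard_normal((77, d))
llm_feats = rng.standard_normal((32, d))  # assume already projected to dim d

# Cross-Adapter step under discussion: LLM tokens attend to CLIP tokens,
# so the fused output already mixes in CLIP information
fused = cross_attention(llm_feats, clip_feats, d)   # shape (32, d)

# The concatenation in question: raw CLIP features appended again
# along the token (sequence) dimension
cond = np.concatenate([fused, clip_feats], axis=0)  # shape (109, d)
print(cond.shape)
```

The question above then amounts to: since `fused` already carries CLIP information through the attention weights, is appending `clip_feats` a second time redundant, or does keeping the unmodified CLIP tokens help downstream compatibility (e.g. with ControlNet-style tools that expect CLIP-shaped conditioning)?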