YUHANG-Ma / LLM4GEN

On concatenating LLM and CLIP features #3

Open peki12345 opened 1 month ago

peki12345 commented 1 month ago

Nice work! I have a question: why concatenate the two sets of features (LLM and CLIP) instead of using only the LLM features, as some other works have done, e.g. https://github.com/Kwai-Kolors/Kolors?

YUHANG-Ma commented 1 month ago

  1. With LLM4GEN, you can use fewer data pairs to achieve the same or even better performance than other methods. Of course, you could use an LLM instead of CLIP, but you would need more data pairs and computing resources to train the model.
  2. LLM4GEN integrates seamlessly with existing tools in the LDM ecosystem, such as ControlNet. Considering these two aspects, we concatenate the LLM features with the CLIP features.

peki12345 commented 1 month ago

Thank you for your reply. I also believe that using CLIP information can help the model achieve better results with less data. But the Cross Adapter Module has already fused the LLM and CLIP information through attention, so isn't it a bit redundant to concatenate the CLIP features again after that?
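
To make the data flow in question concrete, here is a minimal sketch in plain Python. It is not taken from the LLM4GEN code. Everything in it is an assumption made for illustration: the function names (`cross_attention`, `condition_tokens`), the choice of CLIP tokens as queries over LLM tokens, the absence of learned projections, the shared feature dimension, and concatenation along the token axis.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Single-head attention with no learned projections (a simplification).

    Each query token becomes a weighted average of the key/value tokens.
    Assumes both inputs share the same feature dimension.
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys_values
        ]
        weights = softmax(scores)
        fused.append(
            [sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(dim)]
        )
    return fused


def condition_tokens(clip_tokens, llm_tokens):
    """Hypothetical stand-in for the step discussed in this thread.

    Assumption: CLIP tokens act as queries over the LLM tokens, and the
    fused tokens are then concatenated with the original CLIP tokens
    along the token axis.
    """
    fused = cross_attention(clip_tokens, llm_tokens)
    return clip_tokens + fused  # list concatenation along the token axis


# Toy inputs: 2 CLIP tokens and 3 LLM tokens, feature dimension 2.
clip_tokens = [[1.0, 0.0], [0.0, 1.0]]
llm_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

cond = condition_tokens(clip_tokens, llm_tokens)
print(len(cond))  # 4: two original CLIP tokens plus two fused tokens
```

In this sketch the first half of the output is the unchanged CLIP tokens and the second half is the attention-fused tokens, which is the concatenation step the question above is about.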