peki12345 opened this issue 4 months ago
Nice work! I have a question: why should we concatenate these two features (LLM and CLIP) instead of just using the LLM's features, as some other works have done, e.g. https://github.com/Kwai-Kolors/Kolors?
- With LLM4GEN, you can use fewer data pairs to achieve the same or even better performance than other methods. Of course, you could use an LLM instead of CLIP, but you would need more data pairs and more computing resources to train your model.
- LLM4GEN can seamlessly integrate with existing tools in the LDM ecosystem, such as ControlNet. Considering these two aspects, we concatenate the LLM features with CLIP's.
Thank you for your reply. I also believe that utilizing CLIP information enables the model to achieve better results with less data. But I think the Cross-Adapter Module has already fused the LLM and CLIP information via attention, so doesn't it feel a bit redundant to concatenate CLIP again after that?
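For context, here is a minimal sketch of the flow being discussed, based on my own reading rather than the official LLM4GEN code. The shapes, the single-head attention without learned projections, and the sequence-dimension concatenation are all simplifying assumptions: the adapter fuses LLM features with CLIP features via cross-attention, and the raw CLIP features are then concatenated again to form the final conditioning.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, key_value, d):
    # Single-head attention, no learned projections (simplification)
    scores = query @ key_value.T / np.sqrt(d)
    return softmax(scores) @ key_value

# Hypothetical shapes: 77 CLIP tokens, 32 LLM tokens, shared dim 768
rng = np.random.default_rng(0)
d = 768
clip_feats = rng.standard_normal((77, d))
llm_feats = rng.standard_normal((32, d))  # assume already projected to dim d

# Cross-Adapter step under discussion: LLM tokens attend to CLIP tokens,
# so the fused output already mixes in CLIP information
fused = cross_attention(llm_feats, clip_feats, d)   # shape (32, d)

# The concatenation in question: raw CLIP features appended again
# along the token (sequence) dimension
cond = np.concatenate([fused, clip_feats], axis=0)  # shape (109, d)
print(cond.shape)
```

The question above then amounts to: since `fused` already carries CLIP information through the attention weights, is appending `clip_feats` a second time redundant, or does keeping the unmodified CLIP tokens help downstream compatibility (e.g. with ControlNet-style tools that expect CLIP-shaped conditioning)?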