ACM MM'23 (oral). SUR-adapter enables pre-trained diffusion models to acquire the semantic understanding and reasoning capabilities of large language models, building a high-quality textual semantic representation for text-to-image generation.
Lexica, Civitai, and Stable Diffusion Online each host a large number of images. May I ask what criteria or keywords you used to select the 114,148 image-text pairs collected from these three websites?