HVision-NKU / StoryDiffusion

Accepted as [NeurIPS 2024] Spotlight Presentation Paper
Apache License 2.0

Question about the details of the text-to-image part #133

Open · Exuan148 opened this issue 5 months ago

Hi, in the training-free image generation pipeline, you inject features from several reference images into the self-attention layers. Where do these image features come from? Are they produced by the VAE encoder? Thanks!
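For context on the mechanism the question refers to, here is a minimal NumPy sketch of injecting reference-image tokens into self-attention. Queries come only from the image being generated, while keys and values are computed over that image's own tokens concatenated with tokens sampled from the other images in the batch. This follows one reading of the Consistent Self-Attention description in the StoryDiffusion paper, simplified to a single head. The helper names (`attention_with_refs`, `consistent_batch_attention`), the `rate` parameter, and the random-permutation sampling are illustrative assumptions, not the repository's API. The sketch deliberately treats the reference tokens as generic `(m, d)` arrays and does not settle what they actually are (e.g. UNet hidden states vs. VAE latents), which is exactly what this issue asks.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_refs(hidden, ref_tokens, w_q, w_k, w_v):
    """Single-head self-attention whose keys/values also include reference tokens.

    hidden:     (n, d) tokens of the image being denoised
    ref_tokens: (m, d) tokens borrowed from reference images (m may be 0)
    Hypothetical helper for illustration; not the StoryDiffusion API.
    """
    q = hidden @ w_q                                      # queries: current image only
    kv_in = np.concatenate([hidden, ref_tokens], axis=0)  # (n + m, d)
    k = kv_in @ w_k
    v = kv_in @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (n, n + m)
    return softmax(scores, axis=-1) @ v                   # (n, d_v)


def consistent_batch_attention(batch_hidden, w_q, w_k, w_v, rate=0.5, seed=0):
    """For each image, sample a fraction `rate` of the tokens from the *other*
    images in the batch and append them to its keys/values (assumed scheme)."""
    rng = np.random.default_rng(seed)
    b = batch_hidden.shape[0]
    outputs = []
    for i in range(b):
        others = [batch_hidden[j] for j in range(b) if j != i]
        # Empty (0, d) pool when the batch holds a single image.
        pool = np.concatenate(others, axis=0) if others else batch_hidden[i][:0]
        n_sample = int(rate * pool.shape[0])
        idx = rng.permutation(pool.shape[0])[:n_sample]
        outputs.append(attention_with_refs(batch_hidden[i], pool[idx], w_q, w_k, w_v))
    return np.stack(outputs)


# Usage with random toy data: 3 images, 16 tokens each, width 8.
rng = np.random.default_rng(1)
d = 8
batch = rng.standard_normal((3, 16, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = consistent_batch_attention(batch, w_q, w_k, w_v, rate=0.5)
print(out.shape)  # (3, 16, 8)
```

Because only the keys and values are extended, the output still has one row per token of the current image. With `rate=0` the function reduces to ordinary per-image self-attention, which makes a convenient sanity check.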