bytedance / DEADiff

[CVPR 2024] Official implementation of "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations"
Apache License 2.0

About the inference #9

Open srymaker opened 2 months ago

srymaker commented 2 months ago

Thanks for your great work! I'd like to know: when performing a style transfer task, do I need to input a reference picture, a style word corresponding to that reference picture, and a target prompt to the model, i.e. the triplet <reference image, reference style word, target prompt>?

Tianhao-Qi commented 2 months ago

No, you only need a pair <reference image, target prompt>.

srymaker commented 2 months ago

Thank you for your answer. So what are the inputs and targets during training?

Tianhao-Qi commented 2 months ago

There are three kinds of training pairs:

  1. the reference and target images share the same style but depict distinct subjects (STRE);
  2. the reference and target images share the same subject but have distinct styles (SERE);
  3. the reference and target images are identical (reconstruction).

You can refer to Sec. 3.2 in our paper; a rough sketch of how such pairs could be sampled is shown below.
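
A minimal sketch, assuming a dataset annotated with per-image style and subject labels, of how these three pair types might be sampled. The field names and the selector-word mapping are illustrative assumptions, not the actual DEADiff data pipeline.

```python
# Illustrative sampling of the three pair types (STRE / SERE / reconstruction).
# Dataset layout and field names are hypothetical.
import random

def sample_pair(images, mode):
    """images: list of dicts like {"path": ..., "style": ..., "subject": ...}.
    Assumes at least one candidate exists for the chosen mode."""
    target = random.choice(images)
    if mode == "STRE":    # same style, distinct subjects
        pool = [im for im in images
                if im["style"] == target["style"] and im["subject"] != target["subject"]]
        selector = "style"      # extract style information from the reference
    elif mode == "SERE":  # same subject, distinct styles
        pool = [im for im in images
                if im["subject"] == target["subject"] and im["style"] != target["style"]]
        selector = "content"    # extract content/subject information from the reference
    else:                 # reconstruction: reference == target
        pool = [target]
        selector = "style"      # placeholder; see Sec. 3.2 for the exact setup
    reference = random.choice(pool)
    return reference, target, selector
```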
srymaker commented 2 months ago

Thank you, but in the paper the Q-Former's input should include the text {content} or {style}. What is that exactly?

Tianhao-Qi commented 2 months ago

The text input of the Q-Former is the word "content" or "style".
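
For concreteness, a purely hypothetical sketch of that conditioning: `q_former`, `tokenizer`, and the call signature below are illustrative placeholders, not DEADiff's actual code.

```python
# Hypothetical illustration only: how the selector word could be passed to a
# Q-Former-style module alongside the reference-image features. The q_former
# object and its keyword arguments are placeholders, not the DEADiff API.
import torch

def extract_disentangled_features(q_former, tokenizer,
                                  image_feats: torch.Tensor,
                                  selector: str) -> torch.Tensor:
    # Only the two selector words from the paper are valid.
    assert selector in ("content", "style")
    text_ids = tokenizer(selector, return_tensors="pt").input_ids
    # The learned queries cross-attend to the image features, while the text
    # branch tells the Q-Former which aspect of the reference to extract.
    return q_former(image_embeds=image_feats, text_input_ids=text_ids)
```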

srymaker commented 2 months ago

Oh, I see. Thanks for your patience!

SkylerZheng commented 2 months ago

Hi @Tianhao-Qi, does the currently released code support the "Stylized Reference Object Generation" function? Basically, I want to convert a given image to a different style by providing only text; the given image is the source image rather than the style image.

Tianhao-Qi commented 1 month ago

You can refer to this script. Besides, if you also want to keep the structure of the source image, you'll need to use ControlNet.
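
For illustration, here is a minimal sketch of the ControlNet idea using the generic diffusers ControlNet pipeline rather than the DEADiff integration; the model IDs, the Canny preprocessing, and the prompt are assumptions.

```python
# Generic diffusers ControlNet sketch (NOT the DEADiff pipeline): Canny edges
# of the source image act as the structure condition while the prompt drives
# the new style. Model IDs below are placeholders.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

source = np.array(Image.open("source.png").convert("RGB"))
edges = cv2.Canny(source, 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 1-channel -> 3-channel

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

result = pipe(
    prompt="the same scene, in watercolor style",  # target prompt
    image=edges,              # structure guidance derived from the source image
    num_inference_steps=30,
).images[0]
result.save("stylized.png")
```

In the DEADiff setup, the style would come from the reference image via the Q-Former rather than from the prompt alone; the ControlNet branch only constrains the layout.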