exx8 / differential-diffusion


For SDXL turbo? #7

Open peki12345 opened 4 months ago

peki12345 commented 4 months ago

Amazing work! I've tried using several SDXL text2img models for img2img or inpaint tasks and have had good results. However, when I attempted to use the SDXL Turbo model for these tasks, it failed: the generated content had nothing to do with my original image, and it seemed like the model was still trying to perform the text2img task. I'm curious about what's going on. Do you have any suggestions?

peki12345 commented 4 months ago

I think I found the reason: SDXL Turbo is a 1-step model, so it only executes https://github.com/exx8/differential-diffusion/blob/main/SDXL/diff_pipe.py#L982 once, and the map has no control over that single step. I also noticed that the fewer the steps, the weaker the map's control. But I don't know how to fix it...
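To make the quantization concrete, here is a toy numpy sketch (not the repo's actual masking code; `effective_map` and the uniform per-step thresholds are simplifying assumptions) of the strength levels an N-step schedule can actually express:

```python
import numpy as np

def effective_map(change_map: np.ndarray, num_steps: int) -> np.ndarray:
    # With one threshold per step at 1/N, 2/N, ..., 1, a pixel whose map
    # value is v gets edited during floor(v * N) of the N steps, so its
    # effective strength is quantized to floor(v * N) / N.
    return np.floor(change_map * num_steps) / num_steps

gradient = np.linspace(0.0, 1.0, 11)
print(effective_map(gradient, num_steps=50))  # many distinct levels: smooth control
print(effective_map(gradient, num_steps=1))   # every value below 1.0 collapses to 0
```

With a single step, everything below pure white in the map rounds down to "no change", which would match the Turbo behavior described above.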

At the same time, I found another problem: for the outpaint task, the map's control is also very weak, and the generated surroundings have essentially no relationship to the original image inside them.

exx8 commented 4 months ago

First of all, thank you for the feedback! When the paper began, back on the ancient 7th of April 2023, there were no SDXL and no Turbo, so some of the newer models are not covered in the paper.

The number of possible different regions == the number of steps in the inference process. For Turbo with 5 steps, you get an approximation of 5 shades in the map. We can do a little trick and enlarge the number of regions by 1 (just copy from the original after the model returns its answer).

But there is a more fundamental question that needs to be answered: to what extent do distilled models support the strength concept? As seen in SDXL-Lightning and SDXL Turbo, the student is only taught a subset of the intermediate steps during training. So even img2img with a specific strength value (such as 0.63) might not be possible with the classical algorithm, and the concept of strength might not be as robust as it is in the original models. Though I believe an approximation of it might be learned implicitly by the student. I have a few ideas that might work, but these "hyper diff diff" variants are not covered by this paper.
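To illustrate the copy-back trick concretely, here is a hedged numpy sketch (the function name `add_zero_region` and the near-zero threshold are assumptions for illustration, not this repo's API):

```python
import numpy as np

def add_zero_region(result: np.ndarray, original: np.ndarray,
                    change_map: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """After the pipeline returns its answer, paste the untouched original
    back wherever the map says "no change", turning the N step-induced
    regions into N + 1."""
    keep = change_map <= eps      # the extra, fully-preserved region
    out = result.copy()
    out[keep] = original[keep]    # overwrite with the source image pixels
    return out
```

Here `result` and `original` would be (H, W, C) arrays and `change_map` an (H, W) map in [0, 1]; the black region is preserved exactly instead of being approximated by the earliest step.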

Can you detail the outpaint task you mentioned? Do you take an image and a map that is larger than the original image, and want to both expand and edit the original?

peki12345 commented 4 months ago

Thank you for your reply! The outpaint task is equivalent to a special inpaint task. Given an image and a mask, I can get good results using the SDXL_inpaint model. However, when I use differential-diffusion, I get poor results...

[attached: input image, mask, SDXL_inpaint result, differential-diffusion result]

exx8 commented 4 months ago

Good question. We haven't explored outpainting in the research; I personally see it as a near-complete problem. I tend to believe that with a little elaboration (as done in soft inpainting) we can make the framework solve it too: probably by iteratively adding larger and larger strokes whose content is just a reflection of the previous steps, then applying medium-strength unguided editing, and then overriding all the additions with medium strength to create something cohesive. But I haven't tested it; it is just an educated guess.
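One hedged reading of that guess as code: build an outpainting change map that is 0 over the original image and ramps toward medium strength with distance into the new canvas, so each ring of new content is anchored to the ring before it. This is purely an untested sketch of the idea above; the linear ramp shape, `ramp_px`, and `max_strength` values are assumptions.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def outpaint_map(canvas_hw, orig_box, ramp_px=128, max_strength=0.6):
    """canvas_hw: (H, W) of the enlarged canvas.
    orig_box: (top, left, height, width) of where the original image sits."""
    H, W = canvas_hw
    t, l, h, w = orig_box
    inside = np.zeros((H, W), dtype=bool)
    inside[t:t + h, l:l + w] = True
    # Distance (in pixels) from each new pixel to the original region.
    dist = distance_transform_edt(~inside)
    # 0 over the original, rising linearly to max_strength over ramp_px,
    # so nearby content stays tied to the source image.
    return np.clip(dist / ramp_px, 0.0, 1.0) * max_strength
```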

joelandresnavarro commented 4 months ago

@peki12345 what model are you using for SDXL inpainting? I've been looking for one all week and I still can't find it.

@exx8 I don't know much about programming or the technical side of the different AI models and tools, but your project is very interesting. Good job!