Closed joelpaulkoch closed 2 months ago
Hey @joelpaulkoch, thanks for the PR! I will have a more detailed look later, for now a couple high-level comments :)
there is a conditioning scale parameter in the diffusers implementation, at the moment I have a constant of 1 for this.
Having it as an optional serving input sounds good (similar to how we have :seed, for example).
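For illustration, a minimal Python sketch of what an optional conditioning scale could look like. This is neither Bumblebee nor diffusers code; the function name scale_residuals and the plain-list representation of the residuals are assumptions made only for this example. Leaving the scale out gives the current behaviour, a constant of 1.

```python
from typing import List, Optional


def scale_residuals(
    residuals: List[List[float]],
    conditioning_scale: Optional[float] = None,
) -> List[List[float]]:
    """Multiply every ControlNet residual by the conditioning scale.

    `scale_residuals` is a hypothetical helper. Passing None keeps the
    residuals unchanged, which matches a hard-coded scale of 1.
    """
    scale = 1.0 if conditioning_scale is None else conditioning_scale
    return [[value * scale for value in residual] for residual in residuals]


# Without a scale the residuals pass through unchanged.
unscaled = scale_residuals([[1.0, 2.0], [4.0]])
# With a scale of 0.5 the ControlNet contribution is halved.
halved = scale_residuals([[1.0, 2.0], [4.0]], conditioning_scale=0.5)
```

Treating the scale like :seed, optional with a neutral default, means existing callers do not need to change.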
I added a new model architecture :with_additional_residuals and separate input and core functions.
Since the difference is only in inputs, I would totally just have them as optional inputs in the :base architecture, yeah. This also more closely matches what diffusers do.
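As a rough Python sketch of the "optional inputs" idea (again not Bumblebee or diffusers code; add_optional_residual is a hypothetical name and the hidden state is reduced to a flat list of floats): when no additional residual is given the function is the identity, otherwise the residual is added elementwise, so a single architecture can cover both cases.

```python
from typing import List, Optional


def add_optional_residual(
    hidden: List[float],
    additional: Optional[List[float]] = None,
) -> List[float]:
    """Add an optional residual to a hidden state.

    Hypothetical helper: with `additional=None` the hidden state is
    returned unchanged, so the same code path serves plain Stable
    Diffusion and the ControlNet variant.
    """
    if additional is None:
        return list(hidden)
    if len(additional) != len(hidden):
        raise ValueError("residual and hidden state must have the same length")
    return [h + r for h, r in zip(hidden, additional)]


# No residual: behaves like the plain model.
plain = add_optional_residual([1.0, 2.0])
# Residual supplied: added elementwise.
controlled = add_optional_residual([1.0, 2.0], [0.5, 0.5])
```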
Similarly, I've copied the existing stable diffusion implementation and adapted it to support the control net. It might be better to have it in the existing StableDiffusion module.
I think a separate module makes sense, in general I would have one module per diffusion type (SD, SD control net, SD XL, ...) and then a serving function for a task (currently only text_to_image, but could be image_to_image, and so on). This would roughly correspond to diffusers, such that a serving function corresponds to a pipeline class and a module to the pipeline grouping directory they have.
Preprocessing is converting the tensor to f32.
I see in diffusers they have VaeImageProcessor, though if in this case it always comes down to converting into f32, then it's probably fine to just be a function.
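A sketch of how small such a function could be, in Python and under the assumption that the conditioning image arrives as integer pixel values. The name to_f32 is hypothetical, and the standard-library array type with typecode 'f' stands in for an f32 tensor. No rescaling is applied, since the thread only mentions the type conversion.

```python
from array import array
from typing import Iterable


def to_f32(pixels: Iterable[int]) -> array:
    """Convert integer pixel values to 32-bit floats.

    Hypothetical helper: only the element type changes, the values
    themselves are left as they are.
    """
    return array("f", (float(p) for p in pixels))


converted = to_f32([0, 128, 255])
```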
We could share more logic between the servings, but it's fine for now, we can refactor once there are more :)
Btw. I updated the tests to use tiny checkpoints and generated reference values using hf/diffusers :)
I want to share my work on using ControlNet with Stable Diffusion. These are the three parts and notes on current limitations:
ControlNet model
there is a conditioning scale parameter in the diffusers implementation, at the moment I have a constant of 1 for this. Would you add it as (optional) input?
[...] diffusers and bumblebee), so I kept the test with "lllyasviel/sd-controlnet-scribble".
UNet
I added a new model architecture :with_additional_residuals and separate input and core functions. But in the end the only difference is that additional residuals are passed in and added. So alternatively this could go in the :base architecture as optional inputs and add layers I guess.
Stable Diffusion with ControlNet
I've tried all the control nets listed here with the corresponding example and got sensible results for all but the normal map one. I'm not sure what's the issue with the normal map, but could imagine it's because of the preprocessing, or I simply did not run enough steps.