ahrm / UnstableFusion

A Stable Diffusion desktop frontend with inpainting, img2img and more!
GNU General Public License v3.0

Add support for the latest version of the diffusers library. #33

Closed ZeroCool940711 closed 1 year ago

ZeroCool940711 commented 1 year ago

It seems like the latest version of the diffusers library brings some huge performance improvements. I tried modifying the code to work with it and had partial success: I was able to get generation working, but not inpainting. With diffusers 0.5.0 I was getting about 1.6 it/s, while 0.6.0 gives me around 3-5 it/s. All I did was change the lines that index the pipeline result with ["sample"][0] to use [0][0] instead, as in

im = self.text2img(
    prompt=prompt,
    width=512,
    height=512,
    strength=strength,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    callback=callback,
    negative_prompt=negative_prompt,
    generator=self.get_generator(seed)
)["sample"][0]

which I just replaced with

im = self.text2img(
    prompt=prompt,
    width=512,
    height=512,
    strength=strength,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    callback=callback,
    negative_prompt=negative_prompt,
    generator=self.get_generator(seed)
)[0][0]
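
As an aside, the result object of recent diffusers pipelines is, as far as I can tell, a StableDiffusionPipelineOutput, so indexing by attribute may be more robust than the positional [0][0]. A minimal sketch of that variant, assuming the same call otherwise:

im = self.text2img(
    prompt=prompt,
    width=512,
    height=512,
    strength=strength,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    callback=callback,
    negative_prompt=negative_prompt,
    generator=self.get_generator(seed)
).images[0]  # .images is the list of generated PIL images on the output object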

That replacement should in theory return the correct image, and generation and reimagining do work, but inpainting doesn't. My guess is that the mask used for inpainting doesn't match the input image or the UNet config. This is the error I get when I run inpainting with the modifications described above:

ValueError: Incorrect configuration settings! The config of `pipeline.unet`: FrozenDict([('sample_size', 64), ('in_channels', 4),
('out_channels', 4), ('center_input_sample', False), ('flip_sin_to_cos', True), ('freq_shift', 0),
('down_block_types', ['CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D']),
('up_block_types', ['UpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D']),
('block_out_channels', [320, 640, 1280, 1280]), ('layers_per_block', 2), ('downsample_padding', 1),
('mid_block_scale_factor', 1), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05),
('cross_attention_dim', 768), ('attention_head_dim', 8), ('_class_name', 'UNet2DConditionModel'),
('_diffusers_version', '0.6.0'),
('_name_or_path', 'C:\\Users\\ZeroCool\\.cache\\huggingface\\diffusers\\models--CompVis--stable-diffusion-v1-4\\snapshots\\a304b1ab1b59dd6c3ba9c40705c29c6de4144096\\unet')])
expects 4 but received `num_channels_latents`: 4 + `num_channels_mask`: 1 + `num_channels_masked_image`: 4 = 9.
Please verify the config of `pipeline.unet` or your `mask_image` or `image` input.
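
If I'm reading the error right, the new inpainting pipeline concatenates the image latents (4 channels), the downscaled mask (1 channel) and the masked-image latents (4 channels) into a 9-channel UNet input, while the v1-4 UNet is configured with in_channels=4, so the two can't match. A quick way to double-check the model side of the mismatch (just an illustration, using the model id from the error path):

from diffusers import UNet2DConditionModel

# Load only the UNet from the v1-4 checkpoint and inspect its input channels.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
print(unet.config.in_channels)  # 4, while the new inpaint pipeline needs 4 + 1 + 4 = 9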

Hope this helps reduce the amount of work needed to add support for the latest diffusers. Thanks for your time, and have a good day.

ahrm commented 1 year ago

Added in 9a1906aeb46e354f442d0756dd694bdac33cd799 (we use legacy inpainting for now).
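
For reference, a minimal sketch of what the legacy route looks like on diffusers 0.6.x, assuming the stock StableDiffusionInpaintPipelineLegacy API (the prompt, file names and strength below are placeholders):

from PIL import Image
from diffusers import StableDiffusionInpaintPipelineLegacy

# The legacy pipeline noises the whole image and composites the unmasked region
# back in during denoising, so it works with ordinary 4-channel UNets like v1-4.
pipe = StableDiffusionInpaintPipelineLegacy.from_pretrained(
    "CompVis/stable-diffusion-v1-4"
).to("cuda")

init_image = Image.open("input.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = repaint

im = pipe(
    prompt="a red brick wall",
    init_image=init_image,
    mask_image=mask_image,
    strength=0.75,
).images[0]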