ahrm / UnstableFusion

A Stable Diffusion desktop frontend with inpainting, img2img and more!
GNU General Public License v3.0

Add support for the latest version of the diffusers library. #33

Closed ZeroCool940711 closed 1 year ago

ZeroCool940711 commented 1 year ago

It seems like the latest version of the diffusers library brings some huge performance improvements. I tried modifying the code to work with it and had partial success: I was able to get generation working, but not inpainting. With diffusers 0.5.0 I was getting about 1.6 it/s, while 0.6.0 gives me around 3-5 it/s. All I did was change the lines that index the pipeline result with ["sample"][0] to use [0][0] instead, as in

im = self.text2img(
    prompt=prompt,
    width=512,
    height=512,
    strength=strength,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    callback=callback,
    negative_prompt=negative_prompt,
    generator=self.get_generator(seed)
)["sample"][0]

which I just replaced with

im = self.text2img(
    prompt=prompt,
    width=512,
    height=512,
    strength=strength,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    callback=callback,
    negative_prompt=negative_prompt,
    generator=self.get_generator(seed)
)[0][0]
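
As an aside, the result object of recent diffusers pipelines is, as far as I can tell, a StableDiffusionPipelineOutput, so indexing by attribute may be more robust than the positional [0][0]. A minimal sketch of that variant, assuming the same call otherwise:

im = self.text2img(
    prompt=prompt,
    width=512,
    height=512,
    strength=strength,
    num_inference_steps=steps,
    guidance_scale=guidance_scale,
    callback=callback,
    negative_prompt=negative_prompt,
    generator=self.get_generator(seed)
).images[0]  # .images is the list of generated PIL images on the output object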

That replacement should in theory return the correct image, and generation and reimagining do work, but inpainting doesn't. My guess is that the mask used for inpainting doesn't match the input image or the UNet config. This is the error I get when I run inpainting with the modifications described above:

ValueError: Incorrect configuration settings! The config of `pipeline.unet`: FrozenDict([('sample_size', 64), ('in_channels', 4),
('out_channels', 4), ('center_input_sample', False), ('flip_sin_to_cos', True), ('freq_shift', 0),
('down_block_types', ['CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'CrossAttnDownBlock2D', 'DownBlock2D']),
('up_block_types', ['UpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D', 'CrossAttnUpBlock2D']),
('block_out_channels', [320, 640, 1280, 1280]), ('layers_per_block', 2), ('downsample_padding', 1),
('mid_block_scale_factor', 1), ('act_fn', 'silu'), ('norm_num_groups', 32), ('norm_eps', 1e-05),
('cross_attention_dim', 768), ('attention_head_dim', 8), ('_class_name', 'UNet2DConditionModel'),
('_diffusers_version', '0.6.0'),
('_name_or_path', 'C:\\Users\\ZeroCool\\.cache\\huggingface\\diffusers\\models--CompVis--stable-diffusion-v1-4\\snapshots\\a304b1ab1b59dd6c3ba9c40705c29c6de4144096\\unet')])
expects 4 but received `num_channels_latents`: 4 + `num_channels_mask`: 1 + `num_channels_masked_image`: 4 = 9.
Please verify the config of `pipeline.unet` or your `mask_image` or `image` input.
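
If I'm reading the error right, the new inpainting pipeline concatenates the image latents (4 channels), the downscaled mask (1 channel) and the masked-image latents (4 channels) into a 9-channel UNet input, while the v1-4 UNet is configured with in_channels=4, so the two can't match. A quick way to double-check the model side of the mismatch (just an illustration, using the model id from the error path):

from diffusers import UNet2DConditionModel

# Load only the UNet from the v1-4 checkpoint and inspect its input channels.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)
print(unet.config.in_channels)  # 4, while the new inpaint pipeline needs 4 + 1 + 4 = 9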

Hope this helps reduce the amount of work needed to add support for the latest diffusers. Thanks for your time, and have a good day.

ahrm commented 1 year ago

Added in 9a1906aeb46e354f442d0756dd694bdac33cd799 (we use legacy inpainting for now).
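
For reference, a minimal sketch of what the legacy route looks like on diffusers 0.6.x, assuming the stock StableDiffusionInpaintPipelineLegacy API (the prompt, file names and strength below are placeholders):

from PIL import Image
from diffusers import StableDiffusionInpaintPipelineLegacy

# The legacy pipeline noises the whole image and composites the unmasked region
# back in during denoising, so it works with ordinary 4-channel UNets like v1-4.
pipe = StableDiffusionInpaintPipelineLegacy.from_pretrained(
    "CompVis/stable-diffusion-v1-4"
).to("cuda")

init_image = Image.open("input.png").convert("RGB").resize((512, 512))
mask_image = Image.open("mask.png").convert("RGB").resize((512, 512))  # white = repaint

im = pipe(
    prompt="a red brick wall",
    init_image=init_image,
    mask_image=mask_image,
    strength=0.75,
).images[0]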