Description
The changes include adding the StableDiffusionControlNet pipeline, specifically the Canny Edge Detection model, along with the pre-processing function (implemented with OpenCV.js) required to get the ControlNet input image. The ControlNet pipeline is similar to the Image-To-Image pipeline in that both take an image as input, but they use it differently. The Image-To-Image pipeline adds noise to the input image and uses the result as the initial latent instead of random noise. The ControlNet pipeline instead passes the input image, along with the other arguments, to the ControlNet model, whose outputs (down block samples and the middle block sample) are then fed to the UNET model. The shapes of the ControlNet inputs were taken from here, and the shapes of the UNET inputs were inferred from those shapes rather than from the input names.
In addition, the Image-To-Image feature was added to the StableDiffusionControlNet pipeline, which resulted in the creation of the StableDiffusionControlNetImg2Img pipeline.
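In other words, the two pipelines consume the input image at different points of the denoising loop. The contrast can be sketched with plain number arrays (illustrative pseudocode; every name below is hypothetical, and the real pipelines run ONNX sessions on tensors):

```typescript
// Illustrative sketch only — hypothetical names, not the actual pipeline code.

// Image-To-Image: the encoded input image, mixed with noise according to the
// start timestep, becomes the *initial latent* of the denoising loop.
function img2imgInitialLatent(
  imageLatent: number[],
  noise: number[],
  alphaCumprod: number, // cumulative alpha for the start timestep, in [0, 1]
): number[] {
  const sqrtAlpha = Math.sqrt(alphaCumprod);
  const sqrtOneMinusAlpha = Math.sqrt(1 - alphaCumprod);
  return imageLatent.map((x, i) => sqrtAlpha * x + sqrtOneMinusAlpha * noise[i]);
}

// ControlNet: the initial latent stays random noise; the conditioning image is
// run through the ControlNet model each step, and the resulting down/mid block
// residuals are handed to the UNET, which adds them to its own activations.
function addControlNetResiduals(
  unetBlockSample: number[],
  controlnetResidual: number[],
  conditioningScale = 1.0,
): number[] {
  return unetBlockSample.map((x, i) => x + conditioningScale * controlnetResidual[i]);
}
```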
Convert Command
The command that I used to convert the model, using the converter that you use, was:
python conv_sd_to_onnx.py --model_path "runwayml/stable-diffusion-v1-5" --output_path "./model/sd1-5_fp16_cn_canny" --controlnet_path "lllyasviel/control_v11p_sd15_canny" --fp16 --attention-slicing auto
Specific Changes
[X] Add the src/pipelines/StableDiffusionControlNetPipeline.ts file.
[X] Modify the getRgbData() and uploadImage() functions in examples/react/src/App.tsx to account for the ControlNet image upload.
[X] Add the @techstark/opencv-js and react-app-rewired packages to the examples/react/package.json file and replace some of the react-scripts commands with react-app-rewired in order to use the OpenCV.js functions.
[X] Add examples/react/config-overrides.js file which is necessary to use OpenCV.js. Taken from here.
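For context, the kind of conversion a function like getRgbData() performs — canvas RGBA pixels to the planar Float32Array that a pipeline consumes — can be sketched as follows. This is a simplified, hypothetical version; the actual code in App.tsx, including the normalization range, may differ:

```typescript
// Convert canvas RGBA pixel data (interleaved HWC, values 0-255) into a
// planar RGB Float32Array (CHW). Normalizes to [0, 1] here, which is typical
// for ControlNet conditioning images; VAE inputs often use [-1, 1] instead.
// Simplified sketch — the real getRgbData() in App.tsx may differ.
function getRgbData(rgba: Uint8ClampedArray, width: number, height: number): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let p = 0; p < plane; p++) {
    for (let c = 0; c < 3; c++) {
      // channel c of pixel p; the alpha channel (index 3) is dropped
      out[c * plane + p] = rgba[p * 4 + c] / 255;
    }
  }
  return out;
}
```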
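For reference, config-overrides.js files used with react-app-rewired and @techstark/opencv-js commonly look something like the sketch below, stubbing out the Node core modules that OpenCV.js references but webpack 5 no longer polyfills. This is an assumed shape, not necessarily the file the checklist item refers to:

```javascript
// config-overrides.js — consumed by react-app-rewired in place of the default
// CRA webpack config. @techstark/opencv-js references Node core modules
// (fs, path, crypto) that webpack 5 no longer polyfills, so they are stubbed
// out here. Assumed sketch; the actual file in this PR may differ.
module.exports = function override(config) {
  config.resolve = config.resolve || {};
  config.resolve.fallback = {
    ...config.resolve.fallback,
    fs: false,
    path: false,
    crypto: false,
  };
  return config;
};
```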
Pre-Processing Libraries
In this section I'll include all of the pre-processing libraries that I found, in case you want to use a different one or add other ControlNet models:
Issues / Future Work
The order of the inputs to the UNET model was inferred from the input shapes rather than from the key names. I believe the order is correct, but I'm not 100% sure.
Every ControlNet model needs its own pre-processing function to produce the ControlNet input image. For now, OpenCV.js covers the pre-processing functions for Canny, Pose Estimation, and Semantic Segmentation. I selected this library because it had the most pre-processing functions of any I could find, but another library (or several) could be considered in order to add the remaining ControlNet models. Based on my understanding, the Annotators are models that can serve as a replacement for the pre-processing functions; the problem is that I have not found a converter script for exporting them to ONNX format. I'll look more into this.
I was able to convert and use ControlNet 1.0 and 1.1 with SD 1.5. I tried to convert Stable Diffusion 2.1 with ControlNet for SD 2.1 but was not successful. I'll look into this as well.
For now, I have the Canny ControlNet model hosted in my HuggingFace repo, but I can transfer it to you if you want.
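To illustrate what the Canny pre-processing step produces, here is a crude stand-in in plain TypeScript — a Sobel gradient-magnitude threshold rather than the actual cv.Canny() call from OpenCV.js (real Canny additionally does Gaussian smoothing, non-maximum suppression, and hysteresis thresholding):

```typescript
// Crude stand-in for Canny edge detection: Sobel gradients + threshold.
// The real pipeline calls cv.Canny() from OpenCV.js; this only shows the
// shape of the operation (grayscale image in, binary edge map out).
function edgeMap(gray: number[][], threshold: number): number[][] {
  const h = gray.length;
  const w = gray[0].length;
  const out: number[][] = Array.from({ length: h }, () => new Array(w).fill(0));
  for (let y = 1; y < h - 1; y++) {
    for (let x = 1; x < w - 1; x++) {
      // Horizontal and vertical Sobel kernels
      const gx =
        -gray[y - 1][x - 1] + gray[y - 1][x + 1]
        - 2 * gray[y][x - 1] + 2 * gray[y][x + 1]
        - gray[y + 1][x - 1] + gray[y + 1][x + 1];
      const gy =
        -gray[y - 1][x - 1] - 2 * gray[y - 1][x] - gray[y - 1][x + 1]
        + gray[y + 1][x - 1] + 2 * gray[y + 1][x] + gray[y + 1][x + 1];
      out[y][x] = Math.hypot(gx, gy) >= threshold ? 255 : 0;
    }
  }
  return out;
}
```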