dakenf / diffusers.js

diffusers implementation for node.js and browser
https://islamov.ai/diffusers.js/

Add Image-To-Image Pipeline #3

Closed jdp8 closed 1 year ago

jdp8 commented 1 year ago

Description

The changes include adding the Image-To-Image pipeline, along with the input fields for the Text-To-Image and Image-To-Image pipelines, such as Seed, Guidance Scale, Input Image and Strength. The Image-To-Image pipeline is essentially the same as the Text-To-Image pipeline, except that instead of starting from a random latent, the input latent is a noised encoding of the input image.
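For reference, the idea in rough JavaScript (the helper and method names below are illustrative sketches, not the actual code in this PR):

```js
// Sketch only: for img2img, the starting latent is the VAE-encoded input image
// with noise added according to `strength`, instead of a purely random latent.
async function prepareLatents(pipeline, inputImage, strength, numSteps, seed) {
  if (!inputImage) {
    // text2img case: start from pure Gaussian noise
    return pipeline.randomLatent(seed)
  }
  const imageLatent = await pipeline.encodeImage(inputImage) // VAE encoder
  // higher strength = more added noise = less fidelity to the input image
  const noiseStep = Math.round(numSteps * strength)
  return pipeline.scheduler.addNoise(imageLatent, noiseStep, seed)
}
```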

Specific Changes

Issues

To make the project run successfully, a few other small changes were made:

dakenf commented 1 year ago

Wow, thanks. I'm a bit slow with the library update, but I think it was worth it: I got one UNet step down to less than 1 second on a 3090. You can see the results here: https://github.com/microsoft/onnxruntime/issues/17373#issuecomment-1713830724 It also works on an M1 Mac, where a step should take around 2 seconds.

I need to do a few fixes on ONNX and then I'll publish the update. After that, this one will actually be usable and will change into diffusers.js.

Will review the changes later today

jdp8 commented 1 year ago

No worries. That UNET speedup is very impressive, thank you for that! Perfect, I'll wait for that update. In addition, I will close the issue that I had since I was able to run the project successfully.

Let me know if there are any issues with my changes. Thank you!

dakenf commented 1 year ago

@jdp8 I've updated @aislamov/onnxruntime-web64 to version 1.0.0, and it should now be very fast. You'll need Chrome Canary 119.0.6006.0 (I think it was released today).

Change this line:

```js
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), { executionProviders: ['wasm'] })
```

to:

```js
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), sessionOption)
```
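(sessionOption isn't shown here; presumably it's the WebGPU session options, something along these lines:)

```js
// assumed definition; the key change is switching the execution provider
// from 'wasm' to 'webgpu'
const sessionOption = { executionProviders: ['webgpu'] }
```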

If you are using Windows and don't have the Windows SDK installed, you might need to download https://github.com/microsoft/DirectXShaderCompiler and put dxcompiler.dll into the Google Chrome directory. Then launch Chrome with the command-line flag --enable-dawn-features=allow_unsafe_apis,use_dxc. On Mac or Linux you don't need to download anything, and the flag would be --enable-dawn-features=allow_unsafe_apis.

If it crashes with a "device lost" error, it means there isn't enough VRAM; try running the text encoder and/or VAE on the CPU.
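One possible way to do that (a sketch reusing the InferenceSession/getModelFile calls from above; the model paths here are assumptions about the repo layout):

```js
// keep the UNet on WebGPU, but create the memory-heavy helper models with
// the wasm (CPU) execution provider instead
const cpuOption = { executionProviders: ['wasm'] }
const textEncoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/text_encoder/model.onnx', true, opts),
  cpuOption,
)
const vaeDecoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/vae_decoder/model.onnx', true, opts),
  cpuOption,
)
```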

Let me know if something doesn't work.

jdp8 commented 1 year ago

@dakenf I tried to run it with your changes, but I got this error when trying to download the UNet model:

Screenshot 2023-09-14 at 10 44 51 AM

Errors in the console:

Screenshot 2023-09-14 at 10 57 25 AM

These are the steps that I followed, please let me know if I'm missing something:

  1. Updated @aislamov/onnxruntime-web64 to version 1.0.0.
  2. Changed the UNet executionProvider to 'webgpu'.
  3. Updated Chrome Canary (I currently have Version 119.0.6008.0) and launched it with the mentioned flag.

I'm using an M1 Pro (16 GB), and the models are still being downloaded from the public/models/aislamov/stable-diffusion-2-1-base-onnx directory at commit 0799d8182f3385a6acf4ea06eea98c39c007f5c7. This was also attempted on an M2 Max (32 GB) and it failed with the same error. In addition, I changed the UNet executionProvider back to 'wasm' and the download of the VAE Decoder failed.

dakenf commented 1 year ago

Hmm. Let me do some more tests and update the model in the repository.

dakenf commented 1 year ago

@jdp8 can you try changing the first line in StableDiffusionPipeline to

```js
import { InferenceSession } from '@aislamov/onnxruntime-web64/webgpu';
```

jdp8 commented 1 year ago

I changed the first line as you mentioned and the models loaded successfully, but when I ran the model it showed a black image on the canvas. Numerous errors were shown in the console; I'll attach a screenshot and a log file of the errors.

Screenshot 2023-09-19 at 1 06 48 PM

WebGPU Console Errors.log

dakenf commented 1 year ago

OK, I've just published @aislamov/onnxruntime-web64 1.0.1. It should resolve the issue; let me know if it doesn't work.

jdp8 commented 1 year ago

Did a few tests with Text-To-Image and it works like a charm! I ran it for 20 steps with various prompts using the VAE only on the last step and the entire process took ~2 minutes on an M1 Pro. That's a remarkable speedup compared to before. Excellent work, thank you so much for this!

On another note, I tried to run the Image-To-Image pipeline, but the process failed when trying to encode the input image, specifically when running the VAE Encoder model with 'webgpu' as the executionProvider. Could this be due to an unsupported operation or something similar? I'll attach what appears when the code reaches the point of encoding the input image.

Screenshot 2023-09-19 at 4 54 42 PM

dakenf commented 1 year ago

Most likely it's an out-of-memory issue. Try these:

  1. Make sure you run the fp16 version of the VAE encoder (I think I haven't included it on Hugging Face).
  2. Try loading it separately to get the latents, then call .release() (I don't remember the exact method), and pass them to the pipeline.
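A rough sketch of that second suggestion (the tensor names and the model buffer variable are assumptions about the ONNX export; release() is the onnxruntime-web call for freeing a session):

```js
// run the VAE encoder in its own session, free it, then continue with the latents
const vaeEncoder = await InferenceSession.create(vaeEncoderModelBuffer, sessionOption)
const { latent_sample } = await vaeEncoder.run({ sample: imageTensor }) // names assumed
await vaeEncoder.release() // free the session's memory before the UNet runs
// ...use latent_sample as the starting latent for img2img
```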

I'm now working on reducing VRAM usage, since it uses about 10 GB in the current release.

jdp8 commented 1 year ago

I'm running the VAE Encoder that's in your Hugging Face repo, specifically this one from the 'Initial fp16 ONNX commit'. The strange thing is that it doesn't fail when using the 'wasm' backend but fails when using the 'webgpu' backend.

I tried two methods separately to release or dispose of the text_encoder and vae_encoder sessions after using them, in order to free up memory, but they didn't prevent the error from occurring. The methods that I used were await this.vae_encoder.release() and await this.vae_encoder.handler.dispose(). I didn't find any official documentation for these methods; the only references that I found were this issue and this issue.

dakenf commented 1 year ago

I'm going to merge it and then do some testing

dakenf commented 1 year ago

I've updated the model on Hugging Face and the onnxruntime package. It now takes 1 second per step on an M1 Max, and image2image works fine.

I will do some refactoring over the weekend to change this repo into the diffusers.js library. Next steps would be adding ControlNet, a more efficient scheduler, and SDXL support.

jdp8 commented 1 year ago

Just tested it and it generated the image in less than a minute on M1 Pro. Thank you!

Awesome, that's great news! I'll be on the lookout to see if I can help with anything else.