dakenf / diffusers.js

diffusers implementation for node.js and browser
https://islamov.ai/diffusers.js/

Add Image-To-Image Pipeline #3

Closed jdp8 closed 1 year ago

jdp8 commented 1 year ago

Description

The changes include adding the Image-To-Image pipeline, along with the input fields for the Text-To-Image and Image-To-Image pipelines, such as Seed, Guidance Scale, Input Image and Strength. The Image-To-Image pipeline is essentially the same as the Text-To-Image pipeline, except that instead of starting from a random latent, the input latent is a noised encoding of the input image.
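For reference, the idea in rough JavaScript (the helper and method names below are illustrative sketches, not the actual code in this PR):

```js
// Sketch only: for img2img, the starting latent is the VAE-encoded input image
// with noise added according to `strength`, instead of a purely random latent.
async function prepareLatents(pipeline, inputImage, strength, numSteps, seed) {
  if (!inputImage) {
    // text2img case: start from pure Gaussian noise
    return pipeline.randomLatent(seed)
  }
  const imageLatent = await pipeline.encodeImage(inputImage) // VAE encoder
  // higher strength = more added noise = less fidelity to the input image
  const noiseStep = Math.round(numSteps * strength)
  return pipeline.scheduler.addNoise(imageLatent, noiseStep, seed)
}
```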

Specific Changes

Issues

To make the project run successfully, a few other small changes were made:

dakenf commented 1 year ago

Wow, thanks. I'm a bit slow with the library update, but I think it was worth it: I got one UNet step down to less than 1 second on a 3090. You can see the results here: https://github.com/microsoft/onnxruntime/issues/17373#issuecomment-1713830724 It also works on an M1 Mac, where a step should take around 2 seconds.

I need to do a few fixes on ONNX and then I'll publish the update. After that, this one will actually be usable and will change into diffusers.js.

Will review the changes later today

jdp8 commented 1 year ago

No worries. That UNET speedup is very impressive, thank you for that! Perfect, I'll wait for that update. In addition, I will close the issue that I had since I was able to run the project successfully.

Let me know if there are any issues with my changes. Thank you!

dakenf commented 1 year ago

@jdp8 I've updated @aislamov/onnxruntime-web64 to version 1.0.0, and it should now be very fast. You'll need Chrome Canary 119.0.6006.0 (I think it was released today).

Change this line:

```js
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), { executionProviders: ['wasm'] })
```

to:

```js
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), sessionOption)
```
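(sessionOption isn't shown here; presumably it's the WebGPU session options, something along these lines:)

```js
// assumed definition; the key change is switching the execution provider
// from 'wasm' to 'webgpu'
const sessionOption = { executionProviders: ['webgpu'] }
```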

If you are using Windows and don't have the Windows SDK installed, you might need to download https://github.com/microsoft/DirectXShaderCompiler and put dxcompiler.dll into the Google Chrome directory. Then launch Chrome with the command-line flag --enable-dawn-features=allow_unsafe_apis,use_dxc. On Mac or Linux you don't need to download anything, and the flag would be --enable-dawn-features=allow_unsafe_apis.

If it crashes with a "device lost" error, it means there isn't enough VRAM; try running the text encoder and/or VAE on the CPU.
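One possible way to do that (a sketch reusing the InferenceSession/getModelFile calls from above; the model paths here are assumptions about the repo layout):

```js
// keep the UNet on WebGPU, but create the memory-heavy helper models with
// the wasm (CPU) execution provider instead
const cpuOption = { executionProviders: ['wasm'] }
const textEncoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/text_encoder/model.onnx', true, opts),
  cpuOption,
)
const vaeDecoder = await InferenceSession.create(
  await getModelFile(modelRepoOrPath, '/vae_decoder/model.onnx', true, opts),
  cpuOption,
)
```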

Let me know if something doesn't work.

jdp8 commented 1 year ago

@dakenf I tried to run it with your changes, but I got this error when trying to download the UNet model:

Screenshot 2023-09-14 at 10 44 51 AM

Errors in the console:

Screenshot 2023-09-14 at 10 57 25 AM

These are the steps that I followed, please let me know if I'm missing something:

  1. Updated @aislamov/onnxruntime-web64 to version 1.0.0.
  2. Changed the UNet executionProvider to 'webgpu'.
  3. Updated Chrome Canary (I currently have Version 119.0.6008.0) and launched it with the mentioned flag.

I'm using an M1 Pro (16 GB), and the models are still being downloaded from the public/models/aislamov/stable-diffusion-2-1-base-onnx directory at commit 0799d8182f3385a6acf4ea06eea98c39c007f5c7. This was also attempted on an M2 Max (32 GB) and it failed with the same error. In addition, I changed the UNet executionProvider back to 'wasm' and the download of the VAE Decoder failed.

dakenf commented 1 year ago

Hmm. Let me do some more tests and update the model in the repository.

dakenf commented 1 year ago

@jdp8 can you try changing the first line in StableDiffusionPipeline to

```js
import { InferenceSession } from '@aislamov/onnxruntime-web64/webgpu';
```

jdp8 commented 1 year ago

I changed the first line as you mentioned and the models loaded successfully, but when I ran the model it showed a black image on the canvas. Numerous errors were shown in the console; I'll attach a screenshot and a log file of the errors.

Screenshot 2023-09-19 at 1 06 48 PM

WebGPU Console Errors.log

dakenf commented 1 year ago

OK, I've just published @aislamov/onnxruntime-web64 1.0.1. It should resolve the issue; let me know if it doesn't work.

jdp8 commented 1 year ago

Did a few tests with Text-To-Image and it works like a charm! I ran it for 20 steps with various prompts using the VAE only on the last step and the entire process took ~2 minutes on an M1 Pro. That's a remarkable speedup compared to before. Excellent work, thank you so much for this!

On another note, I tried to run the Image-To-Image pipeline, but the process failed when trying to encode the input image, specifically when running the VAE Encoder model with 'webgpu' as the executionProvider. Could this be due to an unsupported operation or something similar? I'll attach what appears when the code reaches the point of encoding the input image.

Screenshot 2023-09-19 at 4 54 42 PM

dakenf commented 1 year ago

Most likely it's an out-of-memory issue. Try these:

  1. Make sure you run the fp16 version of the VAE encoder (I think I haven't included it on Hugging Face).
  2. Try loading it separately to get the latents, then call .release() (I don't remember the exact method), and pass them to the pipeline.
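A rough sketch of that second suggestion (the tensor names and the model buffer variable are assumptions about the ONNX export; release() is the onnxruntime-web call for freeing a session):

```js
// run the VAE encoder in its own session, free it, then continue with the latents
const vaeEncoder = await InferenceSession.create(vaeEncoderModelBuffer, sessionOption)
const { latent_sample } = await vaeEncoder.run({ sample: imageTensor }) // names assumed
await vaeEncoder.release() // free the session's memory before the UNet runs
// ...use latent_sample as the starting latent for img2img
```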

I'm now working on reducing VRAM usage, since it uses about 10 GB in the current release.

jdp8 commented 1 year ago

I'm running the VAE Encoder that's in your Hugging Face repo, specifically this one from the 'Initial fp16 ONNX commit'. The strange thing is that it doesn't fail when using the 'wasm' backend but fails when using the 'webgpu' backend.

I tried two methods separately to release or dispose of the text_encoder and vae_encoder sessions after using them, in order to free up memory, but they didn't prevent the error from occurring. The methods that I used were await this.vae_encoder.release() and await this.vae_encoder.handler.dispose(). I didn't find any official documentation for these methods; the only references that I found were this issue and this issue.

dakenf commented 1 year ago

I'm going to merge it and then do some testing

dakenf commented 1 year ago

I've updated the model on Hugging Face and the onnxruntime package. It now takes 1 second per step on an M1 Max, and image2image works fine.

I will do some refactoring over the weekend to change this repo into the diffusers.js library. Next steps would be adding ControlNet, a more efficient scheduler, and SDXL support.

jdp8 commented 1 year ago

Just tested it and it generated the image in less than a minute on M1 Pro. Thank you!

Awesome, that's great news! I'll be on the lookout to see if I can help with anything else.