Wow, thanks. I'm a bit slow with the library update, but I think it was worth it: I got one UNet step to run in less than 1 second on a 3090. You can see the results here https://github.com/microsoft/onnxruntime/issues/17373#issuecomment-1713830724. It also works on an M1 Mac and should be around 2 seconds per step.
I need to do a few fixes on ONNX and then I'll publish the update. After that, this one will actually be usable and will change into diffusers.js.
Will review the changes later today
No worries. That UNET speedup is very impressive, thank you for that! Perfect, I'll wait for that update. In addition, I will close the issue that I had since I was able to run the project successfully.
Let me know if there are any issues with my changes. Thank you!
@jdp8 I've updated @aislamov/onnxruntime-web64 to version 1.0.0, and now it should work very fast. You'll need Chrome Canary 119.0.6006.0 (I think it was released today).
change this line
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), { executionProviders: ['wasm'] })
to
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), sessionOption)
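For reference, sessionOption here just selects the WebGPU execution provider instead of 'wasm'; a minimal sketch (the actual object in the repo may set more fields) would be:
// Sketch only: point the UNet at the WebGPU execution provider.
const sessionOption = { executionProviders: ['webgpu'] }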
If you are using Windows and don't have the Windows SDK installed, you might need to download this https://github.com/microsoft/DirectXShaderCompiler and put dxcompiler.dll into the Google Chrome directory. Then launch Chrome with this command line flag: --enable-dawn-features=allow_unsafe_apis,use_dxc
On Mac or Linux you don't need to download anything, and the flag would be --enable-dawn-features=allow_unsafe_apis
If it crashes with a "device lost" error, it means there's not enough VRAM; try running the text encoder and/or VAE on the CPU.
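As a rough sketch of that suggestion (the text encoder and VAE paths below are assumptions based on the usual ONNX export layout, mirroring the UNet line above):
// Keep only the UNet on WebGPU; run the other models on the wasm (CPU) provider to save VRAM.
const cpuOption = { executionProviders: ['wasm'] }
const gpuOption = { executionProviders: ['webgpu'] }
const textEncoder = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/text_encoder/model.onnx', true, opts), cpuOption)
const vaeDecoder = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/vae_decoder/model.onnx', true, opts), cpuOption)
const unet = await InferenceSession.create(await getModelFile(modelRepoOrPath, '/unet/model.onnx', true, opts), gpuOption)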
Let me know if something doesn't work.
@dakenf I tried to run it with your changes but I got this error when trying to download the UNET model:
Errors in the console:
These are the steps that I followed, please let me know if I'm missing something:
I'm using an M1 Pro (16 GB) and the models are still being downloaded from the public/models/aislamov/stable-diffusion-2-1-base-onnx directory with the 0799d8182f3385a6acf4ea06eea98c39c007f5c7 commit. This was also attempted on an M2 Max (32 GB) and it failed with the same error. In addition, I changed the UNET executionProvider back to 'wasm' and the download of the VAE Decoder failed.
Hmm. Let me do some more tests and update the model in the repository.
@jdp8 can you try changing the first line in StableDiffusionPipeline to
import { InferenceSession } from '@aislamov/onnxruntime-web64/webgpu';
I changed the first line like you mentioned and the models were loaded successfully but when I ran the model, it showed a black image on the canvas. Numerous errors were shown in the console. I'll attach a screenshot and a log file of the errors.
OK, I've just published @aislamov/onnxruntime-web64 1.0.1. It should resolve the issue; let me know if that doesn't work.
Did a few tests with Text-To-Image and it works like a charm! I ran it for 20 steps with various prompts using the VAE only on the last step and the entire process took ~2 minutes on an M1 Pro. That's a remarkable speedup compared to before. Excellent work, thank you so much for this!
On another note, I tried to run the Image-To-Image pipeline, but the process failed when trying to encode the input image, specifically when running the VAE Encoder model with 'webgpu' as the executionProvider. Maybe this could be due to an unsupported operation or something similar? I'll attach what appears when the code reaches the point of encoding the input image.
Most likely it's an out-of-memory issue. Try these:
I'm now working on reducing VRAM usage, since it uses about 10 GB in the current release.
I'm running the VAE Encoder that's in your Hugging Face repo, specifically this one from the 'Initial fp16 ONNX commit'. The strange thing is that it doesn't fail when using the 'wasm' backend but fails when using the 'webgpu' backend.
I tried to use two methods separately to release or dispose the text_encoder and vae_encoder sessions after using them in order to free up memory, but they didn't prevent the error from occurring. The methods that I used were await this.vae_encoder.release() and await this.vae_encoder.handler.dispose(). I didn't find any official documentation for these methods. The only references that I found were this issue and this issue.
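For context, the pattern described above looks roughly like this (a sketch only; whether release() is available depends on the onnxruntime-web build, and the VAE Encoder input name is an assumption following the usual diffusers ONNX export):
// Encode the input image, then try to free the encoder session before the denoising loop.
const encoded = await this.vae_encoder.run({ sample: imageTensor })
await this.vae_encoder.release() // frees native resources; the session can't be used afterwards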
I'm going to merge it and then do some testing
I've updated the model on Hugging Face and the onnxruntime package. It now takes 1 second per step on an M1 Max, and image2image works fine.
I will do some refactoring over the weekend to change this repo into the diffusers.js library. Next steps would be adding ControlNet, a more efficient scheduler, and SDXL support.
Just tested it and it generated the image in less than a minute on M1 Pro. Thank you!
Awesome, that's great news! I'll be on the lookout to see if I can help with anything else.
Description
The changes include adding the Image-To-Image pipeline, along with the input fields for the Text-To-Image and Image-To-Image pipelines such as the Seed, Guidance Scale, Input Image and Strength. The Image-To-Image pipeline is basically the same as the Text-To-Image pipeline, except that instead of starting from a random latent, a noisy latent encoding of the input image is used as the input latent.
Specific Changes
- Input fields for the Text-To-Image and Image-To-Image pipelines (Seed, Guidance Scale, Input Image and Strength), added in the App.tsx file.
- Image-To-Image pipeline option in App.tsx.
- encodeImage(input_image), which uses the VAE Encoder to encode the input image from image space to latent space. The function was added to StableDiffusionPipeline.ts. The code was taken from the pil_to_latent(input_im) function shown here.
- add_noise(), added to the PNDMScheduler.ts file. The code was taken from here.
- A seedable Random Number Generator (RNG) dependency in the package.json file.
- randomNormal() and randomNormalTensor() functions in the Tensor.ts file, updated to accept the input seed and the RNG.
- Image-To-Image latent preparation that uses the encodeImage() and add_noise() functions and modifies the img2img timesteps depending on the strength and inference steps. Added to the StableDiffusionPipeline.ts file. Code taken from here (see the sketch after this list).
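A condensed sketch of how the timestep modification could work (this mirrors the approach of the Python diffusers img2img pipeline; the exact cutoff and names used in this PR may differ):
// Derive the img2img timestep schedule from the strength and the number of inference steps.
// Higher strength keeps more (noisier) steps, so the output departs further from the input image.
function getImg2ImgTimesteps(allTimesteps: number[], numInferenceSteps: number, strength: number): number[] {
  const initSteps = Math.min(Math.floor(numInferenceSteps * strength), numInferenceSteps)
  const tStart = Math.max(numInferenceSteps - initSteps, 0)
  return allTimesteps.slice(tStart)
}
The pipeline would then call encodeImage() on the input image, add noise at the first of these timesteps via add_noise(), and run the denoising loop only over the returned timesteps.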
Issues
To make the project work successfully, there were other slight changes done:
- A setting in the package.json file was changed to ".".
- The images argument was removed from the runInference() function in the App.tsx file.