LambdaLabsML / lambda-diffusers


WIP: onnx support #1

Open chuanli11 opened 2 years ago

chuanli11 commented 2 years ago

Add ONNX support to the sd_image model.

Convert a checkpoint to ONNX:

python scripts/convert_sd_image_checkpoint_to_onnx.py \
    --model_path <path-to-pytorch-ckpt-or-huggingface-url> \
    --output_path <path-to-output-onnx-model>

Then run inference with the converted pipeline:
from pathlib import Path

from PIL import Image

from lambda_diffusers import StableDiffusionImageEmbedOnnxPipeline

# Load the converted pipeline; pass provider="CPUExecutionProvider" to run on CPU instead.
pipe = StableDiffusionImageEmbedOnnxPipeline.from_pretrained(
    "path-to-output-onnx-model",
    revision="onnx",
    provider="CUDAExecutionProvider",
)

# Generate several variations of a single input image.
im = Image.open("path-to-your-input-image")
num_samples = 2
images = pipe(num_samples * [im])["sample"]

# Save the results to the current directory.
base_path = Path("./")
base_path.mkdir(exist_ok=True, parents=True)
for idx, img in enumerate(images):
    img.save(base_path / f"{idx:06}_onnx.png")
chuanli11 commented 2 years ago

It is not ready to be merged because the ONNX model does not speed things up on GPU. This is not specific to sd_image; the same holds for the reference Hugging Face model. See the issue here.

The ONNX model does cut iteration time by ~35% when running on CPU (see the table below). However, it is still too slow to be practical (the GPU can be two orders of magnitude faster).

| Model Format | CPU | CUDA RTX 8000 | CUDA RTX 8000 + autocast |
| --- | --- | --- | --- |
| PyTorch | 10.16 s/it | 4.57 it/s | 8.92 it/s |
| ONNX | 6.70 s/it | 2.23 it/s | N/A |
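For reference, a minimal sketch of how these per-iteration numbers can be reproduced by timing the pipeline directly, assuming the pipeline accepts num_inference_steps like the reference diffusers pipelines (the step count and variable names are illustrative):

import time

# Reuses `pipe`, `im`, and `num_samples` from the example above.
steps = 50  # assumed number of denoising steps; illustrative
start = time.perf_counter()
images = pipe(num_samples * [im], num_inference_steps=steps)["sample"]
elapsed = time.perf_counter() - start
# Whole-pipeline time divided by step count: a rough proxy for the
# tqdm-style s/it of the denoising loop (it also includes encode/decode overhead).
print(f"{elapsed / steps:.2f} s/it ({steps / elapsed:.2f} it/s)")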
chuanli11 commented 2 years ago

Hugging Face diffusers has some issues with the onnxruntime-gpu installation. See the discussions here and here.
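A quick sanity check for whether the onnxruntime-gpu build is actually being picked up is to list the available execution providers; CUDAExecutionProvider should appear when the GPU build is installed correctly:

import onnxruntime as ort

# If only CPUExecutionProvider is listed, ONNX Runtime will silently fall back to CPU.
print(ort.get_available_providers())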

chuanli11 commented 2 years ago

The current ONNX pipeline only uses ONNX for the VAE and the UNet, and keeps the other parts of the model as PyTorch checkpoints.

This is due to

However, having these modules run in PyTorch (instead of ONNX) doesn't seem to have much impact on speed, since the most expensive computation is the diffusion step (the UNet). The numbers for the sd_image model (the table above) match the numbers for the reference Hugging Face model (the table below), even though all modules in the Hugging Face model can be converted to ONNX.

| Model Format | CPU | CUDA RTX 8000 | CUDA RTX 8000 + autocast |
| --- | --- | --- | --- |
| PyTorch | 10.16 s/it | 4.56 it/s | 8.78 it/s |
| ONNX | 6.64 s/it | 2.21 it/s | N/A |
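For context, the UNet on its own can be exported with torch.onnx.export. Below is a minimal sketch assuming a diffusers-style UNet2DConditionModel; the model path, input shapes, and opset version are illustrative, not the exact values used by the conversion script:

import torch

from diffusers import UNet2DConditionModel

# Hypothetical model path; shapes below assume 512x512 images (64x64 latents)
# and CLIP embeddings of width 768.
unet = UNet2DConditionModel.from_pretrained("path-to-model", subfolder="unet")
unet.eval()

class UNetWrapper(torch.nn.Module):
    """Unpack the diffusers output object so the ONNX graph returns a plain tensor."""
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states, return_dict=False)[0]

# Dummy inputs matching the UNet forward signature: latents, timestep,
# and the embedding that conditions each denoising step.
sample = torch.randn(1, 4, 64, 64)
timestep = torch.tensor(1, dtype=torch.int64)
encoder_hidden_states = torch.randn(1, 77, 768)

torch.onnx.export(
    UNetWrapper(unet),
    (sample, timestep, encoder_hidden_states),
    "unet.onnx",
    input_names=["sample", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    dynamic_axes={"sample": {0: "batch"}, "encoder_hidden_states": {0: "batch"}},
    opset_version=14,
)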