LambdaLabsML / lambda-diffusers


WIP: onnx support #1

Open chuanli11 opened 2 years ago

chuanli11 commented 2 years ago

Add ONNX support to the sd_image model.

Convert a checkpoint to ONNX:

python scripts/convert_sd_image_checkpoint_to_onnx.py \
    --model_path <path-to-pytorch-ckpt-or-huggingface-url> \
    --output_path <path-to-output-onnx-model>

Then run inference with the converted pipeline:
from pathlib import Path

from PIL import Image

from lambda_diffusers import StableDiffusionImageEmbedOnnxPipeline

# Load the converted pipeline; pass provider="CPUExecutionProvider" to run on CPU instead.
pipe = StableDiffusionImageEmbedOnnxPipeline.from_pretrained(
    "path-to-output-onnx-model",
    revision="onnx",
    provider="CUDAExecutionProvider",
)

# Generate several variations of a single input image.
im = Image.open("path-to-your-input-image")
num_samples = 2
images = pipe(num_samples * [im])["sample"]

# Save the results to the current directory.
base_path = Path("./")
base_path.mkdir(exist_ok=True, parents=True)
for idx, img in enumerate(images):
    img.save(base_path / f"{idx:06}_onnx.png")
chuanli11 commented 2 years ago

It is not ready to be merged because the ONNX model does not speed things up on GPU. This is not specific to sd_image; the same holds for the reference Hugging Face model. See the issue here.

The ONNX model does cut iteration time by ~35% when running on CPU (see the table below). However, it is still too slow to be practical (the GPU can be two orders of magnitude faster).

| Model Format | CPU | CUDA RTX 8000 | CUDA RTX 8000 + autocast |
| --- | --- | --- | --- |
| PyTorch | 10.16 s/it | 4.57 it/s | 8.92 it/s |
| ONNX | 6.70 s/it | 2.23 it/s | N/A |
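For reference, a minimal sketch of how these per-iteration numbers can be reproduced by timing the pipeline directly, assuming the pipeline accepts num_inference_steps like the reference diffusers pipelines (the step count and variable names are illustrative):

import time

# Reuses `pipe`, `im`, and `num_samples` from the example above.
steps = 50  # assumed number of denoising steps; illustrative
start = time.perf_counter()
images = pipe(num_samples * [im], num_inference_steps=steps)["sample"]
elapsed = time.perf_counter() - start
# Whole-pipeline time divided by step count: a rough proxy for the
# tqdm-style s/it of the denoising loop (it also includes encode/decode overhead).
print(f"{elapsed / steps:.2f} s/it ({steps / elapsed:.2f} it/s)")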
chuanli11 commented 2 years ago

Hugging Face diffusers has some issues with the onnxruntime-gpu installation. See the discussions here and here.
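A quick sanity check for whether the onnxruntime-gpu build is actually being picked up is to list the available execution providers; CUDAExecutionProvider should appear when the GPU build is installed correctly:

import onnxruntime as ort

# If only CPUExecutionProvider is listed, ONNX Runtime will silently fall back to CPU.
print(ort.get_available_providers())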

chuanli11 commented 2 years ago

The current ONNX pipeline only uses ONNX for the VAE and the UNet, and keeps the other parts of the model as PyTorch checkpoints.

This is due to

However, having these modules run in PyTorch (instead of ONNX) doesn't seem to have much impact on speed, since the most expensive computation is the diffusion step (the UNet). The numbers for the sd_image model (the table above) match the numbers for the reference Hugging Face model (the table below), even though all modules in the Hugging Face model can be converted to ONNX.

| Model Format | CPU | CUDA RTX 8000 | CUDA RTX 8000 + autocast |
| --- | --- | --- | --- |
| PyTorch | 10.16 s/it | 4.56 it/s | 8.78 it/s |
| ONNX | 6.64 s/it | 2.21 it/s | N/A |
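For context, the UNet on its own can be exported with torch.onnx.export. Below is a minimal sketch assuming a diffusers-style UNet2DConditionModel; the model path, input shapes, and opset version are illustrative, not the exact values used by the conversion script:

import torch

from diffusers import UNet2DConditionModel

# Hypothetical model path; shapes below assume 512x512 images (64x64 latents)
# and CLIP embeddings of width 768.
unet = UNet2DConditionModel.from_pretrained("path-to-model", subfolder="unet")
unet.eval()

class UNetWrapper(torch.nn.Module):
    """Unpack the diffusers output object so the ONNX graph returns a plain tensor."""
    def __init__(self, unet):
        super().__init__()
        self.unet = unet

    def forward(self, sample, timestep, encoder_hidden_states):
        return self.unet(sample, timestep, encoder_hidden_states, return_dict=False)[0]

# Dummy inputs matching the UNet forward signature: latents, timestep,
# and the embedding that conditions each denoising step.
sample = torch.randn(1, 4, 64, 64)
timestep = torch.tensor(1, dtype=torch.int64)
encoder_hidden_states = torch.randn(1, 77, 768)

torch.onnx.export(
    UNetWrapper(unet),
    (sample, timestep, encoder_hidden_states),
    "unet.onnx",
    input_names=["sample", "timestep", "encoder_hidden_states"],
    output_names=["out_sample"],
    dynamic_axes={"sample": {0: "batch"}, "encoder_hidden_states": {0: "batch"}},
    opset_version=14,
)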