bes-dev / stable_diffusion.openvino

Need OpenVino's Model Optimizer command line to generate IRs from the original model. #95

Open arisha07 opened 1 year ago

arisha07 commented 1 year ago

Hello, can you please share the steps and command line you used to convert the original model to OpenVINO IRs? I need this to help optimize these models further.

iwoolf commented 1 year ago

This would also help with adding local models from DreamBooth and SD 1.5.

iwoolf commented 1 year ago

This looked like a hint, but I couldn't get all the requirements for TensorFlow_OpenVINO\get_frozen_graph.py:

https://opencv.org/running-tensorflow-model-inference-in-openvino-2/

https://opencv.org/how-to-speed-up-deep-learning-inference-using-openvino-toolkit-2/

ClashSAN commented 1 year ago

@arisha07 @iwoolf ask this guy: https://huggingface.co/ShadowPower/waifu-diffusion.openvino

Edit: here's the reply

https://huggingface.co/ShadowPower/waifu-diffusion.openvino/discussions/1#6370f26f3d1bd47a4ebf19a4

Hello there! Can you please share the steps and command line you used to convert the original model to OpenVINO IRs? Need this to help optimize these models further.

ShadowPower 3 days ago

Since I did this a long time ago, it was necessary to use an older version of the diffusers library.

I merged the code I used into one file and put it here: https://gist.github.com/ShadowPower/1632b77626f863c860130ec4cddf20d5

The diffusers library at that time did not support exporting to ONNX and required some modifications. A modified version is available here: https://github.com/harishanand95/diffusers

In fact, the ONNX export in newer versions of diffusers comes from this fork. You can also try to modify the export script to be compatible with newer versions of the diffusers library.

RedAndr commented 1 year ago

There is a tutorial on how to convert the model to the ONNX format and then to IRs:

https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb

It works fine, except that it lacks the VAE encoder, which is pretty easy to add. The format is also a little different from what is used here and needs some tweaking. I have used it to convert models other than SD1.4 as well, for example SD1.5, SD2.1, and openjourney, with no major problems so far, although some models couldn't be converted because of half precision.

ClashSAN commented 1 year ago

@RedAndr ty

RedAndr commented 1 year ago

Actually, I was wrong about the half-precision. These models could be converted too. Just need to add torch_dtype=torch.float32 in the pipe options.
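
A minimal sketch of that option, assuming the Hugging Face diffusers `StableDiffusionPipeline` loader that the notebook uses. The helper name `load_fp32_pipeline` is hypothetical, and the imports sit inside the function so that defining it does not require torch or diffusers to be installed.

```python
def load_fp32_pipeline(model_id):
    """Load a Stable Diffusion pipeline with all weights cast to float32.

    model_id is a Hugging Face model ID or a local checkpoint directory.
    Half-precision checkpoints are upcast on load, so the later ONNX export
    traces the model in full precision.
    """
    # Heavy dependencies are imported lazily (an illustrative choice, not a requirement).
    import torch
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(
        model_id,
        torch_dtype=torch.float32,  # the option mentioned in the comment above
    )
```

Passing the model ID or local path of a half-precision checkpoint to this helper should give a pipeline that the conversion code can export like any other.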

arisha07 commented 1 year ago

@RedAndr Would you be able to let us know the tweaks you had to make so that it works with this implementation?

RedAndr commented 1 year ago

Frankly, I modified my version too much to find what I did at the beginning. However, it is quite simple, just run the code and you will see where the problem is. Or let me know what your error message is.

arisha07 commented 1 year ago

Okay, I was able to make the required changes in stable_diffusion_engine.py to get the IRs generated from the "225-stable-diffusion-text-to-image" notebook working with this demo. Thanks @RedAndr for the guidance.

brmarkus commented 1 year ago

@arisha07 do you want to share the changes you made to get it working?

arisha07 commented 1 year ago

When you use the IRs generated in the "225-stable-diffusion-text-to-image" notebook with this demo.py, you will get KeyError exceptions, for example KeyError: 'encoder_hidden_states'. Go to stable_diffusion_engine.py and see where the key is used. If you look into unet.xml, you will see that 'encoder_hidden_states' has become 'encoder_hidden_state', so change the keys in the code accordingly. The other key changes are:

"latent_model_input" -> "sample"
"t" -> "timestep"
"token" -> "input_ids"
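
The renames above can be sketched as a small lookup table, together with a way to read the input names out of an IR file. This is only an illustration: the helper names `list_ir_inputs` and `rename_inputs` are hypothetical, and the XML parsing assumes the IR lists each model input as a `layer` element with `type="Parameter"`.

```python
import xml.etree.ElementTree as ET

# Old key used by the demo -> input name in the notebook-generated IRs
# (taken from the list above).
KEY_RENAMES = {
    "encoder_hidden_states": "encoder_hidden_state",
    "latent_model_input": "sample",
    "t": "timestep",
    "token": "input_ids",
}


def list_ir_inputs(xml_path):
    """Return the names of all Parameter (input) layers found in an IR .xml file."""
    root = ET.parse(xml_path).getroot()
    return [
        layer.get("name")
        for layer in root.iter("layer")
        if layer.get("type") == "Parameter"
    ]


def rename_inputs(feed, renames=KEY_RENAMES):
    """Return a copy of an inference input dict with old keys mapped to new ones.

    Keys that are not in the rename table are passed through unchanged.
    """
    return {renames.get(key, key): value for key, value in feed.items()}
```

Running `list_ir_inputs("unet.xml")` on a converted model shows which names the IR expects, and `rename_inputs` can wrap the dictionaries that stable_diffusion_engine.py passes to inference instead of editing each key by hand.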

arisha07 commented 1 year ago

@RedAndr it would be great if you could share the VAE encoder IR conversion part.

RedAndr commented 1 year ago

Sure:

from pathlib import Path
import subprocess

import torch
from diffusers import StableDiffusionPipeline

opset = 16               # ONNX opset version used for export
res_v, res_h = 512, 512  # image height and width in pixels (example values)


@torch.no_grad()
def convert_vae_encoder_onnx(pipe: StableDiffusionPipeline, onnx_path: Path):
    """
    Convert the VAE encoder to ONNX.
    The function accepts the pipeline, wraps the VAE so that only the encoder part
    needed for inference is exported, and prepares an example input for
    ONNX conversion via torch.onnx.export.
    Parameters:
        pipe (StableDiffusionPipeline): Stable Diffusion pipeline
        onnx_path (Path): File for storing the ONNX model
    Returns:
        None
    """

    class VAEEncoderWrapper(torch.nn.Module):
        def __init__(self, vae):
            super().__init__()
            self.vae = vae

        def forward(self, sample):
            # encode() returns the latent distribution first; draw one latent from it
            latent = self.vae.encode(sample)[0].sample()
            return latent

    if not onnx_path.exists():
        vae_encoder = VAEEncoderWrapper(pipe.vae)
        vae_encoder.eval()
        # Dummy RGB image, used only to trace the graph
        image = torch.randn((1, 3, res_v, res_h))
        torch.onnx.export(
            vae_encoder, (image,), onnx_path, input_names=['init_image'], output_names=['sample'],
            #dynamic_axes={"init_image": {0: "batch", 1: "channels", 2: "height", 3: "width"}},
            opset_version=opset,  # ONNX opset version for export
        )
        print('VAE encoder successfully converted to ONNX')

VAEE_ONNX_PATH = Path('vae_encoder.onnx')
VAEE_OV_PATH = VAEE_ONNX_PATH.with_suffix('.xml')

# `pipe` is the StableDiffusionPipeline loaded earlier in the notebook
if not VAEE_OV_PATH.exists():
    convert_vae_encoder_onnx(pipe, VAEE_ONNX_PATH)
    # Run Model Optimizer on the ONNX file to produce the IR (.xml and .bin)
    subprocess.run(["mo", "--input_model", str(VAEE_ONNX_PATH), "--compress_to_fp16"], check=True)
    print('VAE encoder successfully converted to IR')

Uncomment the dynamic_axes line if you need a variable resolution. In my case opset = 16; res_v and res_h are the image height and width in pixels.
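
For variable resolution, the `dynamic_axes` argument of `torch.onnx.export` maps each input or output name to the axes that may change size. The sketch below is one possible variant, not the mapping from the snippet above: it assumes the channel axis can stay fixed for RGB input and also names the output axes. The helper name `vae_encoder_dynamic_axes` is hypothetical.

```python
def vae_encoder_dynamic_axes(input_name="init_image", output_name="sample"):
    """Build a dynamic_axes mapping for exporting the VAE encoder.

    Each entry maps a tensor name to {axis index: axis label}. Axis 1 (channels)
    is left out on purpose, so it stays static in the exported model.
    """
    return {
        input_name: {0: "batch", 2: "height", 3: "width"},
        output_name: {0: "batch", 2: "latent_height", 3: "latent_width"},
    }
```

The result would be passed as `dynamic_axes=vae_encoder_dynamic_axes()` in the `torch.onnx.export` call; the default names match the `input_names` and `output_names` used in the snippet above.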

raymondlo84 commented 1 year ago

We updated the notebooks, so this demo and the notebooks now work together, i.e., the converted IR will work directly.

https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb

I also made FP16 the new default, so the download is smaller and inference runs much faster on GPUs.

https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino