Open arisha07 opened 1 year ago
This would also help with adding local models from DreamBooth, and SD 1.5.
This looked like a hint, but I couldn't get all the requirements for TensorFlow_OpenVINO\get_frozen_graph.py
https://opencv.org/running-tensorflow-model-inference-in-openvino-2/
https://opencv.org/how-to-speed-up-deep-learning-inference-using-openvino-toolkit-2/
@arisha07 @iwoolf Try asking the author of https://huggingface.co/ShadowPower/waifu-diffusion.openvino
Edit: here's the reply
https://huggingface.co/ShadowPower/waifu-diffusion.openvino/discussions/1#6370f26f3d1bd47a4ebf19a4
Hello there! Can you please share the steps and command line you used to convert the original model to OpenVINO IRs? We need this to help optimize these models further.
ShadowPower, 3 days ago:
Since I did this a long time ago, it was necessary to use an older version of the diffusers library.
I merged the code I used into one file and put it here: https://gist.github.com/ShadowPower/1632b77626f863c860130ec4cddf20d5
At that time, the diffusers library could not export to ONNX without modifications. A modified version is available here: https://github.com/harishanand95/diffusers
In fact, the ONNX export in newer versions of diffusers comes from this fork. You can also try modifying the export script to make it compatible with newer versions of the diffusers library.
There is a tutorial on how to convert the model to the ONNX format and then to the IRs: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb
It works fine except that it lacks the VAE encoder, which is pretty easy to add. The format is also a little different from what is used here and needs some tweaking. I also used it to convert models other than SD1.4, for example SD1.5, SD2.1, and openjourney. No major problems so far, although some models couldn't be converted because of half precision.
@RedAndr ty
Actually, I was wrong about the half precision. These models can be converted too. You just need to add torch_dtype=torch.float32 to the pipe options.
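A minimal sketch of that fix, assuming the pipeline is loaded with diffusers' StableDiffusionPipeline.from_pretrained. The helper name load_pipeline_fp32 is made up for illustration, and model_id stands for whichever checkpoint you are converting:

```python
def load_pipeline_fp32(model_id):
    """Load a Stable Diffusion pipeline with float32 weights.

    `model_id` is a Hugging Face Hub id or a local folder (your choice,
    not something prescribed by this thread).
    """
    # Imported here so the heavy dependencies are only needed when the
    # function is actually called.
    import torch
    from diffusers import StableDiffusionPipeline

    # torch_dtype=torch.float32 asks for full-precision weights, even when
    # the checkpoint was published in half precision.
    return StableDiffusionPipeline.from_pretrained(
        model_id, torch_dtype=torch.float32
    )
```

Calling pipe = load_pipeline_fp32(model_id) would then give a pipe object that can be passed to the conversion functions from the notebook.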
@RedAndr Would you be able to let us know the tweaks you had to make so that it works with this implementation?
Frankly, I have modified my version too much to recall what I changed at the beginning. However, it is quite simple: just run the code and you will see where the problem is. Or let me know what your error message is.
Okay, I was able to make the required changes in stable_diffusion_engine.py to get the IRs generated by the "225-stable-diffusion-text-to-image" notebook working with this demo. Thanks @RedAndr for the guidance.
@arisha07 do you want to share the changes you made to get it working?
When you use the IRs generated by the "225-stable-diffusion-text-to-image" notebook with this demo.py, you will get KeyError exceptions, for example KeyError: 'encoder_hidden_states'. Go to stable_diffusion_engine.py and find where that key is used. If you look into unet.xml, you will see that 'encoder_hidden_states' has become 'encoder_hidden_state', so change the keys in the code accordingly. Other such key changes are:
"latent_model_input" -> "sample"
"t" -> "timestep"
"token" -> "input_ids"
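Instead of editing every call site by hand, the renames listed above could be collected in one table. This is only a sketch: RENAMED_INPUTS and rename_inputs are hypothetical names, not part of stable_diffusion_engine.py, and the mapping contains just the four renames mentioned in this comment.

```python
# Input name used by the demo -> input name in the notebook-generated IRs.
RENAMED_INPUTS = {
    "encoder_hidden_states": "encoder_hidden_state",
    "latent_model_input": "sample",
    "t": "timestep",
    "token": "input_ids",
}


def rename_inputs(inputs):
    """Return a copy of `inputs` with the old key names replaced.

    Keys that are not in RENAMED_INPUTS are passed through unchanged.
    """
    return {RENAMED_INPUTS.get(name, name): value for name, value in inputs.items()}


# Example with placeholder values standing in for the real tensors.
old_style = {"latent_model_input": "latents", "t": 10, "encoder_hidden_states": "emb"}
new_style = rename_inputs(old_style)
```

The dictionary passed to the compiled model would then be rename_inputs(...) of whatever the demo currently builds.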
@RedAndr it would be great if you could share the VAE encoder IR conversion part.
Sure:
from pathlib import Path
import subprocess

import torch
from diffusers import StableDiffusionPipeline

opset = 16               # ONNX opset version used for the export
res_v, res_h = 512, 512  # image height and width in pixels (example values, adjust to your model)


@torch.no_grad()
def convert_vae_encoder_onnx(pipe: StableDiffusionPipeline, onnx_path: Path):
    """
    Convert the VAE encoder to ONNX.
    The function accepts a pipeline, creates a wrapper class that exports only the part
    needed for inference, and prepares an example input for conversion via torch.onnx.export.
    Parameters:
        pipe (StableDiffusionPipeline): Stable Diffusion pipeline
        onnx_path (Path): File for storing the ONNX model
    Returns:
        None
    """
    class VAEEncoderWrapper(torch.nn.Module):
        def __init__(self, vae):
            super().__init__()
            self.vae = vae

        def forward(self, sample):
            # encode() returns the latent distribution first; draw one sample from it
            latent = self.vae.encode(sample)[0].sample()
            return latent

    if not onnx_path.exists():
        vae_encoder = VAEEncoderWrapper(pipe.vae)
        vae_encoder.eval()
        # Dummy image in NCHW layout, used only to trace the model
        image_shape = (1, 3, res_v, res_h)
        image = torch.randn(image_shape)
        torch.onnx.export(
            vae_encoder, (image,), onnx_path, input_names=['init_image'], output_names=['sample'],
            #dynamic_axes={"init_image": {0: "batch", 1: "channels", 2: "height", 3: "width"}},
            opset_version=opset  # onnx opset version for export
        )
        print('VAE encoder successfully converted to ONNX')


# `pipe` is the StableDiffusionPipeline already loaded earlier in the notebook
VAEE_ONNX_PATH = Path('vae_encoder.onnx')
VAEE_OV_PATH = VAEE_ONNX_PATH.with_suffix('.xml')
if not VAEE_OV_PATH.exists():
    convert_vae_encoder_onnx(pipe, VAEE_ONNX_PATH)
    # Run Model Optimizer to convert the ONNX file to IR
    subprocess.run(["mo", "--input_model", str(VAEE_ONNX_PATH), "--compress_to_fp16"], check=True)
    print('VAE encoder successfully converted to IR')
Uncomment the dynamic_axes line if you need a variable resolution. In my case opset = 16; res_v and res_h are the vertical and horizontal image resolution (height and width).
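For reference, the dynamic_axes argument of torch.onnx.export is a plain dictionary that maps each input name to its {axis index: label} pairs. The sketch below builds one for the 'init_image' input. The helper name is hypothetical. The commented-out line in the code also marks axis 1 (channels) as dynamic; this sketch leaves it fixed on the assumption that the encoder always receives 3-channel RGB images.

```python
def vae_encoder_dynamic_axes(input_name="init_image", dynamic_batch=True):
    """Build a dynamic_axes dict for an NCHW image input.

    Height (axis 2) and width (axis 3) are always marked as dynamic;
    the batch axis (axis 0) is optional. The channel axis stays fixed.
    """
    axes = {2: "height", 3: "width"}
    if dynamic_batch:
        axes[0] = "batch"
    # Sort by axis index so the result is easy to read when printed.
    return {input_name: dict(sorted(axes.items()))}


dynamic_axes = vae_encoder_dynamic_axes()
```

The result can be passed as dynamic_axes=dynamic_axes in the torch.onnx.export call in place of the commented-out line.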
We updated the notebooks so that this demo and the notebooks work together, i.e., the converted IRs work directly.
I also made the new FP16 models the default, so the download is smaller and inference is much faster on GPUs. https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino
Hello, can you please share the steps and command line you used to convert the original model to OpenVINO IRs? We need this to help optimize these models further.