bes-dev / stable_diffusion.openvino

Apache License 2.0
1.53k stars 207 forks

Update model #104

Open Seegee opened 1 year ago

Seegee commented 1 year ago

Is it possible to drop in the model for stablediffusion 1.5, or 2.0?

ClashSAN commented 1 year ago

no, the model needs to be converted from onnx to openvino, and some code changes have to be made. I don't think it'll happen until we can document how to convert custom models from onnx to IR, like Shadowpower did for wd-1.3

binarydepth commented 1 year ago

> no, the model needs to be converted from onnx to openvino, and some code changes have to be made. I don't think it'll happen until we can document how to convert custom models from onnx to IR, like Shadowpower did for wd-1.3

What if we convert the model ourselves?

RedAndr commented 1 year ago

> What if we convert the model ourselves?

Yes, you can. I did it for SD2.1 and it works fine. You can even convert it with dynamic axes so it accepts arbitrary resolutions, not only 512x512. But, of course, it is way slower.

Timwi commented 1 year ago

> Yes, you can. I did it for SD2.1 and it works fine.

Mind telling us how to do that?

RedAndr commented 1 year ago

> Yes, you can. I did it for SD2.1 and it works fine.
>
> Mind telling us how to do that?

Sure, I mentioned this already somewhere here. It's an official example from the OpenVINO team: https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb Although you would have to modify it for your model, or add the VAE encoder if you need img2img.

raymondlo84 commented 1 year ago

We updated the notebooks and so the converted IR will work directly with this demo.

arisha07 commented 1 year ago

@RedAndr so SD2.1 can be converted using the conversion steps mentioned in https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb ?

RedAndr commented 1 year ago

@arisha07 Yes, it's the same. You can change the resolution, since SD2.1's native resolution is 768x768. It could go even higher, but that requires more memory. The only difference from SD1.4 is the prediction_type, which must be 'v_prediction' instead of 'epsilon'.
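For reference, resolution changes feed through to the UNet's latent shape: the VAE downsamples by a factor of 8, so this is the shape you would convert the model with. A minimal sketch (the batch size of 2 assumes classifier-free guidance, i.e. conditional plus unconditional passes):

```python
# Stable Diffusion's VAE downsamples by a factor of 8, so the UNet's
# latent input for an H x W image is (H // 8, W // 8) with 4 channels.
VAE_SCALE = 8
LATENT_CHANNELS = 4

def unet_latent_shape(height, width, batch=2):
    """Latent tensor shape for a given output resolution.
    batch=2 assumes classifier-free guidance (cond + uncond)."""
    assert height % VAE_SCALE == 0 and width % VAE_SCALE == 0
    return (batch, LATENT_CHANNELS, height // VAE_SCALE, width // VAE_SCALE)

print(unet_latent_shape(512, 512))  # (2, 4, 64, 64)  -- SD1.4/1.5 native
print(unet_latent_shape(768, 768))  # (2, 4, 96, 96)  -- SD2.1 native
```

This also shows why memory grows quickly with resolution: the latent area (and every UNet activation map) scales with height times width.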

arisha07 commented 1 year ago

Thank you so much for your response. I had a few more questions: did you keep the same tokenizer ('openai/clip-vit-large-patch14') as before? I thought that changed with SD2/SD2.1. Also, where did you make the prediction_type change?

RedAndr commented 1 year ago

Yes, I did change the tokenizer to the open_clip one, namely ViT-H-14, but I didn't notice a difference, so I reverted it. That could just be my case, so you'd better experiment yourself. It requires some code changes, as far as I remember.

The prediction_type is a parameter of the scheduler, for example: LMSDiscreteScheduler( ..., prediction_type='v_prediction')

arisha07 commented 1 year ago

Okay, I was able to get SD2.1 converted. A few changes that I had to make in the notebook from OpenVINO were:

But when I added prediction_type='v_prediction' to OpenVINO's notebook, the generated image looked corrupted. @RedAndr, did you see the same behavior? Also, when you were using open_clip's ViT-H-14 tokenizer, did you make any changes to the following lines?

    tokens = self.tokenizer(
        prompt,
        padding="max_length",
        max_length=self.tokenizer.model_max_length,
        truncation=True,
    ).input_ids

Thank you once again for your help :)

RedAndr commented 1 year ago

@arisha07 I didn't use the notebook to generate any images, sorry. No idea why it doesn't work. The code looks fine to me.

Yes, I did change the parameters for the tokenizer since it's different. In fact, I removed all of them:

    tokenizer = open_clip.get_tokenizer('ViT-H-14')
    tokens = tokenizer(prompt).tolist()[0]
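The reason the swap is mostly drop-in: both tokenizers produce a fixed-length 77-token sequence for the CLIP text encoder, the HF call via padding="max_length" plus truncation=True and the open_clip tokenizer by default. A toy sketch of that pad-to-fixed-length behaviour, with made-up token ids (the real tokenizers use their own vocabularies and special tokens):

```python
CONTEXT_LENGTH = 77  # CLIP text encoders take a fixed 77-token context
PAD_ID = 0           # made-up id; real tokenizers define their own pad/eot ids

def pad_to_context(token_ids, max_length=CONTEXT_LENGTH, pad_id=PAD_ID):
    """Truncate or right-pad a token id list to a fixed length,
    mimicking padding="max_length", truncation=True."""
    token_ids = token_ids[:max_length]                       # truncation=True
    return token_ids + [pad_id] * (max_length - len(token_ids))

toy_prompt_ids = [101, 7, 42, 9, 102]  # made-up ids for illustration
padded = pad_to_context(toy_prompt_ids)
assert len(padded) == CONTEXT_LENGTH
```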

No problem, glad it helped.

arisha07 commented 1 year ago

@RedAndr I was just curious to know if you ever tried converting "stabilityai/stable-diffusion-2-depth" to openvino IRs ?

RedAndr commented 1 year ago

@arisha07 No, I didn't, but I guess it shouldn't be a problem.

arisha07 commented 1 year ago

I think the UNet weights' shape changes with this model (it takes an extra depth channel on its input), so it might need some modifications.
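For context, and as an assumption about how the depth model conditions on its depth map: stable-diffusion-2-depth concatenates a one-channel depth map onto the 4 latent channels, so the UNet's first convolution expects 5 input channels instead of 4, and a conversion script that hard-codes a (batch, 4, H, W) input shape would need adjusting. Toy shape arithmetic:

```python
LATENT_CHANNELS = 4  # standard SD latent channels
DEPTH_CHANNELS = 1   # extra depth-map channel (assumption for the depth model)

def unet_in_channels(with_depth):
    """Input channels of the UNet's first conv layer."""
    return LATENT_CHANNELS + (DEPTH_CHANNELS if with_depth else 0)

assert unet_in_channels(False) == 4  # SD1.x / SD2.1
assert unet_in_channels(True) == 5   # stable-diffusion-2-depth
```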

jarredwalton commented 1 year ago

> @RedAndr so SD2.1 can be converted using the conversion steps mentioned in https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/225-stable-diffusion-text-to-image/225-stable-diffusion-text-to-image.ipynb ?

I tried to get this to work on Arc GPUs; apparently I don't know what I'm doing with Jupyter notebooks, or just in general, and it didn't work. LOL. I would love for anyone to help me get OpenVINO working with Stable Diffusion 2.1 768x768 models on Arc, with a functional web UI. I found this DirectML / ONNX project that actually ran on Arc... but it generated garbage output: https://github.com/Amblyopius/Stable-Diffusion-ONNX-FP16

And just to be clear, I'm doing this for benchmarking and journalistic purposes. I wrote this article and I want to update it with "better" Stable Diffusion 2.1 testing. Drop me a note or email if you have working instructions for getting:

  1. Stable Diffusion 2.1 models working with Arc (in a "mostly optimal" fashion)
  2. Caching of models and everything into RAM/VRAM so initialization doesn't take so long
  3. A functional web UI similar to Automatic 1111 so I can specify prompt, negative prompt, output resolution, steps, batch size, and batch count

Thanks! —Jarred