Khushboodholi opened this issue 1 year ago
Only the most recent Intel CPUs support bfloat16.
FP32 versus FP16 - not BFLOAT16 ;-)
OpenVINO doesn't execute FP16 on the CPU at all. So even if a model is stored in FP16, it will be computed in FP32 anyway. Frankly, I see no point in having FP16 models in that case.
A model in FP16 can be used with the OpenVINO CPU plugin as well as with other plugins/devices (like a Vision Processing Unit, VPU). There can also be other reasons to prefer FP16 over FP32.
We can get meaningful acceleration on dGPU and iGPU if we use FP16. Where is the script for converting the model to IR? I can help with compiling and testing that too.
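For reference, a conversion to IR with FP16 weights could look roughly like the minimal sketch below using the Model Optimizer Python API; the ONNX file name and output path are placeholders, and the compress_to_fp16 option is assumed to be present in your OpenVINO release.

```python
# Minimal sketch: convert an exported ONNX model to OpenVINO IR with FP16 weights.
# "unet.onnx" / "unet_fp16.xml" are placeholder paths.
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

# compress_to_fp16=True stores the weights as FP16, roughly halving the IR size on disk
ov_model = convert_model("unet.onnx", compress_to_fp16=True)

# Writes the unet_fp16.xml / unet_fp16.bin pair
serialize(ov_model, "unet_fp16.xml")
```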
Can you load the FP32 model and use FP16 for calculations? No problem.
https://huggingface.co/raymondlo84/stable-diffusion-v1-4-openvino-fp16
We have created this for the community. We are getting a significant speed-up on the A770m (~1.8 it/s -> ~6.6 it/s), it's now half the model size, and it uses much less VRAM.
You can try this without any code changes. But if you want to use the GPU, you have to change device = "GPU" (or "GPU.1" if you have multiple GPUs, e.g. iGPU + dGPU like my setup) in stable_diffusion_engine.py:
```python
class StableDiffusionEngine:
    def __init__(
        self,
        scheduler,
        model="bes-dev/stable-diffusion-v1-4-openvino",
        tokenizer="openai/clip-vit-large-patch14",
        device="GPU"
    ):
```
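If you are not sure which device name to pass (e.g. whether your dGPU is "GPU.0" or "GPU.1"), a small sketch like this lists what the OpenVINO runtime sees on your machine:

```python
# List the devices OpenVINO can see, e.g. ['CPU', 'GPU.0', 'GPU.1'],
# and print their full names so you can tell the iGPU from the dGPU.
from openvino.runtime import Core

core = Core()
for device in core.available_devices:
    print(device, core.get_property(device, "FULL_DEVICE_NAME"))
```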
```
python demo.py --prompt "tree house" --model raymondlo84/stable-diffusion-v1-4-openvino-fp16
```
We also have a notebook that shows how to convert, optimize, and run these models with OpenVINO. Check it out: https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/225-stable-diffusion-text-to-image
https://github.com/openvinotoolkit/openvino_notebooks/pull/805
Special thanks and credit: Ekaterina
Cheers
> Can you load the FP32 model and use FP16 for calculations? No problem.

In the 2023.0 version, yes. But having FP16 also reduces the model size significantly.
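In other words, with 2023.0 you can keep the FP32 IR on disk and only ask the plugin for FP16 execution at compile time. A minimal sketch (the model path is a placeholder, and the INFERENCE_PRECISION_HINT property is assumed to be supported by the target device):

```python
# Sketch: load an FP32 IR but request FP16 execution via a runtime hint.
# "unet.xml" is a placeholder path.
from openvino.runtime import Core

core = Core()
model = core.read_model("unet.xml")  # weights stay FP32 on disk
compiled = core.compile_model(model, "GPU", {"INFERENCE_PRECISION_HINT": "f16"})
```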
https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino/discussions/4
Made a pull request to the main repo, and now it will use FP16. Hope I didn't break anything :)
> Can you load the FP32 model and use FP16 for calculations? No problem.
>
> In the 2023.0 version, yes. But having FP16 also reduces the model size significantly.

Yep, by exactly a factor of two ;) Say, from 4 GB to 2 GB, which is not a big deal, at least to me.
I'm just worried that FP16 usage would lead to precision loss. Although frankly, I couldn't find much of a difference between FP16 and FP32 in my experiments, which looks odd to me. It seems the initial SD model was generated with FP16 already.
I am looking to get the models in 16-bit; currently I only see 32-bit.
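If you already have the FP32 IR, one way to get a 16-bit copy is to re-save it with FP16 weight compression. A minimal sketch, assuming a newer OpenVINO release where save_model with compress_to_fp16 is available (paths are placeholders):

```python
# Sketch: re-save an existing FP32 IR with FP16 weights (roughly halves the file size).
# Paths are placeholders; compress_to_fp16 is assumed to exist in your OpenVINO version.
import openvino as ov
from openvino.runtime import Core

core = Core()
model = core.read_model("unet_fp32.xml")
ov.save_model(model, "unet_fp16.xml", compress_to_fp16=True)
```

Otherwise, the ready-made FP16 IRs from raymondlo84/stable-diffusion-v1-4-openvino-fp16 mentioned above can be used directly via the --model flag of demo.py.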