Khushboodholi opened this issue 1 year ago
Only the most recent Intel CPUs support bfloat16.
FP32 versus FP16 - not BFLOAT16 ;-)
OpenVINO doesn't execute FP16 on the CPU at all. So even if a model is stored in FP16, it will be computed in FP32 anyway. Frankly, I see no point in having FP16 models in that case.
A model in FP16 can be used with the OpenVINO CPU plugin as well as with other plugins/devices (like a Vision Processing Unit, VPU). There can also be other reasons to prefer FP16 over FP32.
We can get meaningful acceleration on dGPU and iGPU if we use FP16. Where is the script for converting the model to IR? I can help with compiling and testing that too.
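For reference, a conversion to IR with FP16 weights could look roughly like the minimal sketch below using the Model Optimizer Python API; the ONNX file name and output path are placeholders, and the compress_to_fp16 option is assumed to be present in your OpenVINO release.

```python
# Minimal sketch: convert an exported ONNX model to OpenVINO IR with FP16 weights.
# "unet.onnx" / "unet_fp16.xml" are placeholder paths.
from openvino.tools.mo import convert_model
from openvino.runtime import serialize

# compress_to_fp16=True stores the weights as FP16, roughly halving the IR size on disk
ov_model = convert_model("unet.onnx", compress_to_fp16=True)

# Writes the unet_fp16.xml / unet_fp16.bin pair
serialize(ov_model, "unet_fp16.xml")
```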
Can you load the FP32 model and use FP16 for calculations? No problem.
https://huggingface.co/raymondlo84/stable-diffusion-v1-4-openvino-fp16
We have created this for the community. We are getting a significant speed-up on the A770m (~1.8 it/s -> ~6.6 it/s), it's now half the model size, and it uses much less VRAM.
You can try this without any code changes. But if you want to use the GPU, you have to change device = "GPU" (or "GPU.1" if you have multiple GPUs, e.g. iGPU + dGPU like my setup) in stable_diffusion_engine.py:
```python
class StableDiffusionEngine:
    def __init__(
        self,
        scheduler,
        model="bes-dev/stable-diffusion-v1-4-openvino",
        tokenizer="openai/clip-vit-large-patch14",
        device="GPU"
    ):
```
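If you are not sure which device name to pass (e.g. whether your dGPU is "GPU.0" or "GPU.1"), a small sketch like this lists what the OpenVINO runtime sees on your machine:

```python
# List the devices OpenVINO can see, e.g. ['CPU', 'GPU.0', 'GPU.1'],
# and print their full names so you can tell the iGPU from the dGPU.
from openvino.runtime import Core

core = Core()
for device in core.available_devices:
    print(device, core.get_property(device, "FULL_DEVICE_NAME"))
```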
```
python demo.py --prompt "tree house" --model raymondlo84/stable-diffusion-v1-4-openvino-fp16
```
We also have a notebook that shows how to convert, optimize, and run these models with OpenVINO. Check it out: https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/225-stable-diffusion-text-to-image
https://github.com/openvinotoolkit/openvino_notebooks/pull/805
Special thanks and credit: Ekaterina
Cheers
> Can you load the FP32 model and use FP16 for calculations? No problem.

In the 2023.0 version, yes. But having FP16 also reduces the model size significantly.
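In other words, with 2023.0 you can keep the FP32 IR on disk and only ask the plugin for FP16 execution at compile time. A minimal sketch (the model path is a placeholder, and the INFERENCE_PRECISION_HINT property is assumed to be supported by the target device):

```python
# Sketch: load an FP32 IR but request FP16 execution via a runtime hint.
# "unet.xml" is a placeholder path.
from openvino.runtime import Core

core = Core()
model = core.read_model("unet.xml")  # weights stay FP32 on disk
compiled = core.compile_model(model, "GPU", {"INFERENCE_PRECISION_HINT": "f16"})
```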
https://huggingface.co/bes-dev/stable-diffusion-v1-4-openvino/discussions/4
Made a pull request to the main repo, and now it will use FP16. Hope I didn't break anything :)
> Can you load the FP32 model and use FP16 for calculations? No problem.
>
> In the 2023.0 version, yes. But having FP16 also reduces the model size significantly.

Yep, by exactly a factor of two ;) Say, from 4 GB to 2 GB, which is not a big deal, at least to me.
I'm just worried that FP16 usage would lead to precision loss. Although frankly, I couldn't find much of a difference between FP16 and FP32 in my experiments, which looks odd to me. It seems the initial SD model was generated with FP16 already.
I am looking to get the models in 16-bit; currently I only see 32-bit.
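If you already have the FP32 IR, one way to get a 16-bit copy is to re-save it with FP16 weight compression. A minimal sketch, assuming a newer OpenVINO release where save_model with compress_to_fp16 is available (paths are placeholders):

```python
# Sketch: re-save an existing FP32 IR with FP16 weights (roughly halves the file size).
# Paths are placeholders; compress_to_fp16 is assumed to exist in your OpenVINO version.
import openvino as ov
from openvino.runtime import Core

core = Core()
model = core.read_model("unet_fp32.xml")
ov.save_model(model, "unet_fp16.xml", compress_to_fp16=True)
```

Otherwise, the ready-made FP16 IRs from raymondlo84/stable-diffusion-v1-4-openvino-fp16 mentioned above can be used directly via the --model flag of demo.py.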