Maybe this issue will help - https://github.com/LaurentMazare/diffusers-rs/issues/1
Did you try running with autocast mode on and with fp16 weights? I think that's likely the default on the Python side; on the Rust side you may want to use the `--autocast` flag to do this (though I haven't tested it on Stable Diffusion 2.1, as my GPU only has 8GB of memory, which is not enough even with fp16).
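For reference, here is a minimal sketch of how such a flag can wrap the whole run in `tch::autocast` (assuming a recent tch; the `run` function is a hypothetical stand-in for the actual pipeline, and the real example's CLI may be wired differently):

```rust
use clap::Parser;

/// Minimal CLI with just the autocast switch; the real example has
/// many more options (prompt, number of steps, output path, ...).
#[derive(Parser)]
struct Args {
    /// Run the whole generation under mixed precision (fp16 kernels on CUDA).
    #[arg(long)]
    autocast: bool,
}

/// Hypothetical stand-in for the actual text-to-image pipeline
/// (CLIP encode, UNet denoising loop, VAE decode).
fn run(_args: &Args) -> anyhow::Result<()> {
    Ok(())
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();
    if args.autocast {
        // tch::autocast executes the closure with autocasting enabled,
        // the equivalent of Python's `with torch.autocast("cuda"):`.
        tch::autocast(true, || run(&args))
    } else {
        run(&args)
    }
}
```

Invoking the example would then look something like `cargo run --example stable-diffusion --features clap -- --prompt "..." --autocast` (the exact example name and flag set here are an assumption).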
Thank you for your suggestions. I have tried the autocast feature and now get results in 9-10s. Is there any way to reduce inference time further?
Also, one more thing: I am really sorry about the stats above. They are incorrect; I was confused with some other results. The Rust SD actually took around 12-13s to generate an image, whereas the normal SD pipeline took around 7-8s.
Quite soon, there will supposedly be "Distilled Stable Diffusion" that should reduce inference time by at least 20x, maybe even more:
https://twitter.com/EMostaque/status/1598131202044866560
The numbers are a bit confusing, but I think he means a 20x speedup in time per step, plus only needing 1-4 steps for a good image, so in total more like a 100x speedup compared to now.
Obviously I have no idea when exactly that will be available and how soon it can be implemented in this Rust version, but I hope it will be ideal for anyone who needs fast inference speed.
Thank you for your suggestion. I will surely check it out once it's available.
I have tried the Rust implementation of Stable Diffusion v2 on an A100 GPU with 40GB of RAM. The standard Stable Diffusion pipeline from Hugging Face takes around 7-8s to generate an image, whereas the Rust implementation takes around 12-13s. It would be really helpful if someone could explain why the Hugging Face pipeline takes less time than the Rust implementation, or whether I am missing something when running the Rust version.
Thanks!!