Maybe this issue will help - https://github.com/LaurentMazare/diffusers-rs/issues/1
Did you try running with autocast mode on and with fp16 weights? I think that's likely the default on the Python side; on the Rust side you may want to use the `--autocast` flag to do this (though I haven't tested it on Stable Diffusion 2.1, as my GPU only has 8GB of memory, which is not enough even with fp16).
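For reference, here is a minimal sketch of how such a flag can wrap the whole run in `tch::autocast` (assuming a recent tch; the `run` function is a hypothetical stand-in for the actual pipeline, and the real example's CLI may be wired differently):

```rust
use clap::Parser;

/// Minimal CLI with just the autocast switch; the real example has
/// many more options (prompt, number of steps, output path, ...).
#[derive(Parser)]
struct Args {
    /// Run the whole generation under mixed precision (fp16 kernels on CUDA).
    #[arg(long)]
    autocast: bool,
}

/// Hypothetical stand-in for the actual text-to-image pipeline
/// (CLIP encode, UNet denoising loop, VAE decode).
fn run(_args: &Args) -> anyhow::Result<()> {
    Ok(())
}

fn main() -> anyhow::Result<()> {
    let args = Args::parse();
    if args.autocast {
        // tch::autocast executes the closure with autocasting enabled,
        // the equivalent of Python's `with torch.autocast("cuda"):`.
        tch::autocast(true, || run(&args))
    } else {
        run(&args)
    }
}
```

Invoking the example would then look something like `cargo run --example stable-diffusion --features clap -- --prompt "..." --autocast` (the exact example name and flag set here are an assumption).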
Thank you for your suggestions. I have tried the autocast feature and now get results in 9-10s. Is there any way to reduce inference time further?
Also, one more thing: I am really sorry about the stats above. They are incorrect; I was confused with some other results. The Rust SD actually took around 12-13s to generate an image, whereas the normal SD pipeline took around 7-8s.
Quite soon, there will supposedly be "Distilled Stable Diffusion" that should reduce inference time by at least 20x, maybe even more:
https://twitter.com/EMostaque/status/1598131202044866560
The numbers are a bit confusing, but I think he means a 20x speedup in time per step, plus only needing 1-4 steps for a good image, so in total more like a 100x speedup compared to now.
Obviously I have no idea when exactly that will be available and how soon it can be implemented in this Rust version, but I hope it will be ideal for anyone who needs fast inference speed.
Thank you for your suggestion. I will surely check it out once it's available.
I have tried the Rust implementation of Stable Diffusion v2 on an A100 GPU with 40GB of RAM. The standard Stable Diffusion pipeline from Hugging Face takes around 7-8s to generate an image, whereas the Rust implementation takes around 12-13s. It would be really helpful if someone could explain why the Hugging Face pipeline takes less time than the Rust implementation, or whether I am missing something when running the Rust version.
Thanks!!