dbolya / tomesd

Speed up Stable Diffusion with this one simple trick!
MIT License

Support torch.compile #47

Open oxysoft opened 1 year ago

oxysoft commented 1 year ago

Issue #40 includes some discussion about supporting torch.compile, so I'm opening a dedicated issue in case anyone else comes here looking for the same thing. I'm working toward real-time applications, so every bit of speed I can scrape together is a massive gain for me. Must break the 1 FPS barrier!
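A minimal sketch of what stacking the two might look like, assuming a diffusers pipeline object `pipe` that is already loaded on the GPU (for example a ControlNet pipeline) and PyTorch 2.0 or later. `tomesd.apply_patch(pipe, ratio=...)` is tomesd's documented entry point and `mode` accepts torch.compile's real mode strings; compiling only `pipe.unet`, patching before compiling, and mapping "37.5%" to `ratio=0.375` are my assumptions. The helper name `apply_tome_and_compile` is hypothetical, and whether the patched UNet compiles cleanly (without graph breaks) is exactly the open question of this issue.

```python
def apply_tome_and_compile(pipe, ratio=0.375, mode="reduce-overhead"):
    """Hypothetical helper: patch a diffusers pipeline with ToMe, then compile its UNet.

    Assumptions: `pipe` is a diffusers pipeline with a `.unet` attribute,
    ratio=0.375 stands in for the "37.5%" setting, and only the UNet is compiled.
    """
    # Imported lazily so this helper can be defined without torch/tomesd installed.
    import torch
    import tomesd

    # Patch first, so the graph that torch.compile captures includes the merging ops.
    tomesd.apply_patch(pipe, ratio=ratio)

    # mode is one of torch.compile's modes, e.g. "reduce-overhead" or "max-autotune".
    pipe.unet = torch.compile(pipe.unet, mode=mode)
    return pipe
```

Usage would be `pipe = apply_tome_and_compile(pipe, mode="max-autotune")`. Expect the first few generations to be much slower, since torch.compile compiles lazily on the first calls.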

I've put together the benchmark below and would love to extend it. In particular, I wonder whether the speedups from tomesd and torch.compile would be additive or multiplicative when combined.
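One way to take throughput measurements like these (a sketch, not necessarily how the numbers in this issue were obtained) is a small timing loop. The helper name `iterations_per_second` and its parameters are hypothetical. Warmup calls are excluded because torch.compile compiles on the first calls, which would otherwise dominate the average; on a GPU you would pass `torch.cuda.synchronize` as `sync` so that queued asynchronous kernels are included in the measured time.

```python
import time
from typing import Callable, Optional


def iterations_per_second(
    step: Callable[[], object],
    iters: int = 50,
    warmup: int = 5,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Average iterations/second of `step`, excluding `warmup` untimed calls."""
    if iters <= 0:
        raise ValueError("iters must be positive")
    # Untimed warmup: absorbs one-off costs such as torch.compile compilation.
    for _ in range(warmup):
        step()
    if sync is not None:
        sync()  # drain pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(iters):
        step()
    if sync is not None:
        sync()  # make sure all timed work has actually finished
    elapsed = time.perf_counter() - start
    return iters / elapsed
```

Here `step` is whatever you count as one iteration: one UNet forward pass, or one full pipeline call, in which case multiply the result by the number of inference steps to get it/s.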

| GPU | Model | Optimizations | Speed (it/s) |
|---|---|---|---|
| RTX 3090 | ControlNet (HED + TemporalNet + Depth) | None (raw) | 5.76 |
| RTX 3090 | ControlNet (HED + TemporalNet + Depth) | tomesd, 37.5% | 6.13 |
| RTX 3090 | ControlNet (HED + TemporalNet + Depth) | torch.compile, reduce-overhead | 6.30 |
| RTX 3090 | ControlNet (HED + TemporalNet + Depth) | torch.compile, max-autotune | 6.50 |
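To make the additive-vs-multiplicative question concrete, here is a small worked calculation using only the numbers in the table. The two models are simplifications I'm assuming for illustration: "additive" means the fractional gains sum, "multiplicative" means the speedup factors multiply. The function names are hypothetical, and the FPS helper assumes a hypothetical sampler where a frame costs nothing but its denoising steps.

```python
# Measured throughput from the benchmark table (RTX 3090, 3x ControlNet), in it/s.
BASELINE = 5.76
MEASURED = {
    "tomesd 37.5%": 6.13,
    "compile reduce-overhead": 6.30,
    "compile max-autotune": 6.50,
}


def speedup(its: float, baseline: float = BASELINE) -> float:
    """Speedup factor relative to the raw (unoptimized) run."""
    return its / baseline


def predict_combined(a: float, b: float, baseline: float = BASELINE) -> dict:
    """Predicted it/s when two optimizations are stacked, under two simple models."""
    sa, sb = speedup(a, baseline), speedup(b, baseline)
    return {
        # Additive: fractional gains sum (+6.4% and +12.8% -> +19.3%).
        "additive": baseline * (1 + (sa - 1) + (sb - 1)),
        # Multiplicative: each optimization scales whatever the other leaves behind.
        "multiplicative": baseline * sa * sb,
    }


def frames_per_second(its: float, steps_per_frame: int) -> float:
    """FPS if a frame costs only `steps_per_frame` denoising steps (no other overhead)."""
    return its / steps_per_frame


if __name__ == "__main__":
    for name, its in MEASURED.items():
        print(f"{name}: {speedup(its):.3f}x")
    combo = predict_combined(MEASURED["tomesd 37.5%"], MEASURED["compile max-autotune"])
    for model, its in combo.items():
        print(f"tomesd + max-autotune ({model}): {its:.2f} it/s")
    # Hypothetical 20-step sampler; real FPS is lower (ControlNet preprocessing, VAE, etc.).
    print(f"max-autotune at 20 steps: {frames_per_second(6.50, 20):.3f} FPS")
```

With these numbers the two models predict roughly 6.87 it/s (additive) versus 6.92 it/s (multiplicative) for tomesd plus max-autotune, so a measurement of the combination would need fairly low run-to-run noise to tell them apart.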