jon-chuang opened this issue 9 months ago (Open)
lcm: 18.5 ms -> 25.0 ms.
Same story with the v1.0.0+torch2.1.1+cu121+xformers0.23 nightly release; it is even worse (30 ms).
When I use v1.0.0 with torch 2.1.2 and xformers 0.23.post1, I do not observe this issue, so the issue is with stable-fast.
Perf is worse even on 0.0.13 when vae.encode is not compiled (20.5 ms).
A similar regression is observed on an H100.
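For reference, a minimal sketch of how such per-step latencies can be measured. This is not the project's benchmark harness; the warmup/iteration counts and the placeholder workload are assumptions, and the real pipeline call (e.g. the compiled pipeline from optimize_lcm_lora.py) would be substituted in:

```python
import time

def benchmark(fn, warmup=3, iters=10):
    """Return mean latency of fn() in milliseconds over `iters` runs,
    after `warmup` untimed runs to exclude compilation/caching effects."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters * 1000.0

# Placeholder workload; replace with the actual inference call,
# e.g. lambda: pipe(prompt, num_inference_steps=4).
latency_ms = benchmark(lambda: sum(range(10_000)))
print(f"{latency_ms:.2f} ms")
```

Note that on GPU a synchronization (e.g. torch.cuda.synchronize()) would be needed around the timed region for accurate numbers.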
This shouldn't happen. What's your script?
When I run python3 examples/optimize_lcm_lora.py, I still see a significant speedup, so I don't know what's wrong.