szokolai-mate opened this issue 1 year ago
Yes, it shouldn't be too hard; however, there is no flexibility if you use TRT.
Basically, all current "machine learning compilation" techniques require a static computation graph, meaning that you cannot change width & height, cannot change LoRA, cannot change ControlNet, cannot change AnimateDiff, etc. Every time you change any of the above, you have to wait a long time for a new compilation to finish. It is not a simple problem.
I would agree that a "production ready" company should use TRT, for example, OpenAI. DALL-E has very limited parameter options, and they can pre-compile several models for each of those options. An individual who prefers to stick to the same workflow would also benefit from TRT. If you frequently change parameters, you won't get much benefit from TRT.
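The recompilation cost described above can be sketched as a per-configuration engine cache. This is a toy illustration in plain Python, not actual TensorRT or extension code; every name here is made up:

```python
# Toy sketch: static-graph compilers effectively cache one "engine" per exact
# configuration. Any new combination of width/height/LoRA/ControlNet triggers
# another slow build (in reality, minutes of compilation).
compiled_engines = {}
build_count = 0

def get_engine(width, height, lora, controlnet):
    """Return a cached engine for this exact configuration, building if needed."""
    global build_count
    key = (width, height, lora, controlnet)
    if key not in compiled_engines:
        build_count += 1  # stands in for a long compilation
        compiled_engines[key] = f"engine-{build_count}"
    return compiled_engines[key]

get_engine(512, 512, None, None)  # first call: slow build
get_engine(512, 512, None, None)  # same config: cache hit, fast
get_engine(512, 768, None, None)  # changed height: another slow build
```

This is why pre-compiling a handful of fixed configurations works for a service with few parameter options but not for a user who changes parameters constantly.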
Probably the biggest barrier is the lack of ControlNet support in the TensorRT extension. ControlNet enables great things in AnimateDiff, so losing it in exchange for extra performance doesn't sound like such a great deal.
I understand that there is a huge drawback of inflexibility, but I would argue that those who want to use TRT are aware of the trade-off. For example, I would use it for mass-generation after dialing in the parameters.
The issue I've linked also indicates that quite a few people would love to have this compatibility, so thank you for your consideration!
The speedup TensorRT provides is worth the tradeoffs for my slower card (2070S). It's literally a 40-60% speed boost over xformers for me, and that would be amazing to use with AnimateDiff.
"cannot change width & height, cannot change LoRA"
The latest TensorRT extension allows dynamic changing of width and height. The TRT models I've exported allow any width and height between 512 and 1280, and batches up to 4. I've also exported models that support batches of 16 for AnimateDiff, but alas, that currently doesn't work.
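Those dynamic ranges can be sketched as a simple profile check. The 512-1280 and batch-of-4 limits come from the engines described above; the function and structure here are illustrative, not the extension's actual code:

```python
# Hypothetical sketch: a dynamic-shape engine is exported with min/max ranges,
# and a request only runs without recompilation if it fits inside them.
PROFILE = {"hw_min": 512, "hw_max": 1280, "batch_max": 4}

def fits_profile(width, height, batch):
    """True if a request falls inside the exported engine's dynamic ranges."""
    return (PROFILE["hw_min"] <= width <= PROFILE["hw_max"]
            and PROFILE["hw_min"] <= height <= PROFILE["hw_max"]
            and 1 <= batch <= PROFILE["batch_max"])
```

A 512x640 generation hi-res fixed to 1024x1280 stays inside one engine's ranges, while an AnimateDiff batch of 16 would fall outside this profile.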
Here's a before and after when generating a 512x640 image then using hi-res fix to double the size to 1024x1280 with these dynamic models (on a 4080):
(Before/after TensorRT comparison screenshots)
LoRAs are also being handled a bit better now. You have to export the LoRAs you want to use, but only once instead of for every model.
@szokolai-mate Hi, friend! I know you are suffering great pain from using TRT with diffusers. So why not choose my totally open-sourced alternative: stable-fast? It's on par with TRT on inference speed, faster than torch.compile and AITemplate, and is super dynamic and flexible, supporting ALL SD models, LoRA, and ControlNet out of the box!
Expected behavior: Using the https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT extension, AnimateDiff generation should work normally.
Current behavior: Using a TensorRT engine with batch size 8, all the frames seem to be just new generations with no shared context.
Benefit: Significant (2x-8x) potential speedup and reduced VRAM usage for NVIDIA RTX users.
Additional context: a) The maintainer of the TensorRT extension is of the opinion that "it shouldn't be too hard", but that it needs to be done in the AnimateDiff extension. b) Currently ControlNet and TensorRT are incompatible, but getting them to work together is "top priority". I am not sure if this is a prerequisite for AnimateDiff.