continue-revolution / sd-webui-animatediff

AnimateDiff for AUTOMATIC1111 Stable Diffusion WebUI
3.08k stars · 256 forks

[Feature]: Compatibility with TensorRT #259

Open · szokolai-mate opened this issue 1 year ago

szokolai-mate commented 1 year ago

Expected behavior

Using the https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT extension, AnimateDiff generation should work normally.

Current behavior: Using a TensorRT engine built for batch size 8, all the frames appear to be independent generations with no shared motion context.

Benefit: A significant (2x-8x) potential speedup and reduced VRAM usage for NVIDIA RTX users.

Additional context:
a) The maintainer of the TensorRT extension is of the opinion that "it shouldn't be too hard", but that it needs to be done in the AnimateDiff extension.
b) Currently ControlNet and TensorRT are incompatible, but getting them to work together is a "top priority". I am not sure whether this is a prerequisite for AnimateDiff support.

continue-revolution commented 1 year ago

Yes, it shouldn’t be too hard; however, you lose all flexibility if you use TRT.

Basically all current "machine learning compilation" techniques require a static computation graph, meaning that you cannot change width & height, cannot change LoRA, cannot change ControlNet, cannot change AnimateDiff, etc. Every time you change any of the above, you have to wait a long time for a new compilation to finish. It is not a simple problem.
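To make the constraint concrete, here is a minimal sketch (hypothetical code, not from either extension) of why static compilation is restrictive: the compiled engine is effectively keyed on every graph-affecting parameter, so changing any one of them means paying the full compilation cost again.

```python
# Hypothetical illustration of static-graph compilation caching.
# A real TRT engine behaves similarly: any change to a graph-affecting
# parameter (resolution, LoRA, ControlNet, AnimateDiff) invalidates
# the compiled artifact and forces a slow rebuild.

compile_cache = {}
compile_count = 0

def get_engine(width, height, lora, controlnet, animatediff):
    """Return a 'compiled engine' for this exact configuration,
    compiling on a cache miss (which in reality takes minutes)."""
    global compile_count
    key = (width, height, lora, controlnet, animatediff)
    if key not in compile_cache:
        compile_count += 1  # stands in for a long compilation
        compile_cache[key] = f"engine{compile_count}"
    return compile_cache[key]

# Identical settings reuse the engine; changing anything recompiles.
e1 = get_engine(512, 512, "none", False, True)
e2 = get_engine(512, 512, "none", False, True)   # cache hit
e3 = get_engine(512, 768, "none", False, True)   # new resolution -> recompile
```

This is why TRT suits fixed workflows: the cache hit rate is high only when parameters rarely change.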

I would agree that a "production ready" company should use TRT, for example OpenAI: DALL·E has very limited parameter options, and they can pre-compile several models for each of those options. An individual who prefers to stick to the same workflow would also benefit from TRT. If you frequently change parameters, you won’t get much benefit from TRT.

anwoflow commented 1 year ago

Probably the biggest barrier is the lack of ControlNet support in the TensorRT extension. ControlNet enables great things in AnimateDiff, so giving it up for extra performance doesn't sound like a good trade.

szokolai-mate commented 12 months ago

I understand that the inflexibility is a huge drawback, but I would argue that those who want to use TRT are aware of the trade-off. For example, I would use it for mass generation after dialing in the parameters.

The issue I've linked also indicates that quite a few people would love to have this compatibility, so thank you for your consideration!

Tybost commented 12 months ago

The speedup TensorRT provides is worth the trade-offs on my slower card (2070S). It's a 40-60% speed boost for me over xformers, and that would be amazing to use with AnimateDiff.

ManOrMonster commented 11 months ago

"cannot change width & height, cannot change LoRA"

The latest TensorRT extension allows dynamic width and height. The TRT models I've exported accept any width and height between 512 and 1280, and batches up to 4. I've also exported models that support batches of 16 for AnimateDiff, but alas, it currently doesn't work.
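The dynamic ranges described above mirror how TensorRT optimization profiles work: an engine is built with (min, opt, max) bounds per dimension and can run any shape inside those bounds without rebuilding. A small sketch (illustrative only, not the extension's actual code; the profile values are assumptions matching the comment above):

```python
# Illustrative check mirroring a TensorRT optimization profile:
# an engine built with (min, opt, max) bounds accepts any input
# whose dimensions fall inside [min, max]. Shapes outside the
# profile require building a new engine.

PROFILE = {
    "height": (512, 768, 1280),   # (min, opt, max) -- assumed values
    "width":  (512, 768, 1280),
    "batch":  (1, 4, 16),
}

def shape_supported(height, width, batch):
    """True if the engine can run this shape without a rebuild."""
    for name, value in (("height", height), ("width", width), ("batch", batch)):
        lo, _opt, hi = PROFILE[name]
        if not lo <= value <= hi:
            return False
    return True

print(shape_supported(640, 512, 8))    # True: inside the dynamic range
print(shape_supported(1536, 512, 1))   # False: height exceeds max
```

Shapes near the "opt" point generally run fastest, which is why profiles are usually centered on the most common resolution.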

Here's a before and after when generating a 512x640 image then using hi-res fix to double the size to 1024x1280 with these dynamic models (on a 4080):

Before TensorRT: [image]

After TensorRT: [image]

LoRAs are also handled a bit better now. You have to export each LoRA you want to use, but only once rather than once per model.

chengzeyi commented 11 months ago

@szokolai-mate Hi, friend! I know you are having a hard time using TRT with diffusers.

So why not try my fully open-source alternative, stable-fast? It's on par with TRT in inference speed, faster than torch.compile and AITemplate, and is highly dynamic and flexible, supporting ALL SD models, LoRA, and ControlNet out of the box!