AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI

[Feature Request]: Potential mode for TensorRT inference over native PyTorch #7345

Open elliotgsy opened 1 year ago

elliotgsy commented 1 year ago

Is there an existing issue for this?

What would your feature do?

As shown in the SDA: Node repository, supported NVIDIA systems can achieve inference speeds of up to 4x over native PyTorch by utilising NVIDIA TensorRT.

Their demodiffusion.py file and text-to-image file (t2i.py) provide a good example of how this is used. This is mainly based on the NVIDIA diffusion demo folder.

This speedup comes from compiling the model into a highly optimised version that can be run on NVIDIA GPUs, e.g. https://huggingface.co/tensorrt/Anything-V3/tree/main

This splits out the CLIP, UNet and VAE into .plan files, which are serialized TensorRT engine files containing the parameters of the optimized model. Running these through the TensorRT runtime imposes additional restrictions on resolution and batch size, as discussed in the SDA: Node README.
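
For illustration, deserializing one of those .plan files with the TensorRT Python API looks roughly like this (a minimal sketch; the file name is a placeholder and the exact calls vary a bit between TensorRT versions):

```python
# Minimal sketch: deserialize a prebuilt .plan engine with the TensorRT Python API.
# "unet.plan" is a placeholder name, not a file from the linked repos.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
trt.init_libnvinfer_plugins(logger, "")  # load the standard plugin library

runtime = trt.Runtime(logger)
with open("unet.plan", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
# The engine fixes (or bounds) input shapes at build time, which is where the
# resolution/batch-size restrictions mentioned above come from.
```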

Usage is dependent on the following (https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html):

Examples of Implementations:

I've managed to build sda-node for Linux and test TensorRT on Windows, and can confirm around a ~3x speedup on my own system compared to inference in AUTOMATIC1111.

Implementation would depend on loading the models' .plan files into a runtime with the TensorRT engine and plugins loaded, synchronizing CUDA and PyTorch, etc., as in the sketch below.
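
Roughly, that glue might look like the following (assuming a TensorRT 8.x-style API; the binding handling and FP16 output dtype are assumptions on my part, not taken from the demo code):

```python
# Sketch of running a deserialized engine with PyTorch-managed GPU memory.
# Binding names/shapes are hypothetical; real engines expose their own IO names.
import tensorrt as trt
import torch

def run_engine(engine, inputs: dict) -> torch.Tensor:
    context = engine.create_execution_context()
    bindings = [None] * engine.num_bindings
    output = None

    for i in range(engine.num_bindings):
        name = engine.get_binding_name(i)
        if engine.binding_is_input(i):
            tensor = inputs[name].contiguous().cuda()
            context.set_binding_shape(i, tuple(tensor.shape))
            bindings[i] = tensor.data_ptr()
        else:
            shape = tuple(context.get_binding_shape(i))
            # Assumes an FP16 engine; the real dtype depends on how it was built.
            output = torch.empty(shape, dtype=torch.float16, device="cuda")
            bindings[i] = output.data_ptr()

    # Run on PyTorch's current CUDA stream so the two frameworks stay in sync.
    stream = torch.cuda.current_stream().cuda_stream
    context.execute_async_v2(bindings=bindings, stream_handle=stream)
    torch.cuda.synchronize()
    return output
```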

I'm unsure if this would be in scope for this project. There may be too much that simply would not run on this runtime, in which case it would make more sense to just use a separate UI when wanting to utilise NVIDIA TensorRT models. People may find that a number of features they'd expect to work no longer work on the TensorRT runtime. It's also another mode of operation that is hard to support and would need to be maintained. I'm just proposing it in a cleaner way before anyone else does; it could be that this is already in the works :-)

Proposed workflow

As I'm unsure of this project's structure and goals, I'm also unsure of the viability or implementation path.

Additional information

I'll likely be messing around with this a bit on my own, but I have limited knowledge of PyTorch/Stable Diffusion/CUDA, so I wouldn't expect any MR from myself.

chavinlo commented 1 year ago

Hello. May I ask how you got it to run on Windows? Were the OSS plugins required?

NikkMann commented 1 year ago

I'm curious if this would affect the actual generated image, since it says it optimizes CLIP, which can really change how a generation turns out depending on the value. Would still love to have it as an option though. 60 steps in under a second is crazy.

elliotgsy commented 1 year ago

Note that a working version of TensorRT inference on Windows is shown here: https://github.com/ddPn08/Lsmith

Another note is that these .plan files are built specifically for the GPU/libraries in use, so users need to give each model they want to use ~10 minutes to build and compile into a .plan file.
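
For context, the build step that takes those ~10 minutes is roughly an ONNX export followed by something like this (a hedged sketch; the file names are placeholders and the exact builder flags depend on the TensorRT version):

```python
# Sketch: compile an exported ONNX model into a GPU-specific .plan engine.
# "unet.onnx"/"unet.plan" are placeholder names, not files from Lsmith or SDA: Node.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("unet.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16 engines are typical for SD inference

# This step optimises for the *current* GPU, which is why engines aren't portable.
serialized = builder.build_serialized_network(network, config)
with open("unet.plan", "wb") as f:
    f.write(serialized)
```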

hukk06 commented 1 year ago

> Note that a working version of TensorRT inference on Windows is shown here: https://github.com/ddPn08/Lsmith
>
> Another note is that these .plan files are built specifically for the GPU/libraries in use, so users need to give each model they want to use ~10 minutes to build and compile into a .plan file.

I installed the Docker release of Lsmith and I can confirm it works well. It lacks most of the features automatic1111's UI has, but BOY is it fast. If I knew how to help implement this, I would; if there are tasks I could do to help, I'd get on them in my free time. When running batches on a1111 I got ~15 it/s; on Lsmith, running single image generations, I got 35 it/s. This would be a big improvement if it got implemented. For now I won't be using Lsmith because it's quite raw: there are resolution limits (1024x1024), and no hires fix or face restore. https://i.imgur.com/5FDD7Yb.png

Haven't tried the Windows native install yet, but will do so as soon as I get time.

official-elinas commented 1 year ago

Still interested in this? I definitely am.

Nyaster commented 1 year ago

Yes, I am trying it and it's mostly providing a ~2x speedup on my laptop 3070.

bropines commented 1 year ago

@AUTOMATIC1111 Would it be possible to add this to the current code?

Sakura-Luna commented 1 year ago

Has anyone compared TensorRT and Olive (DirectML)? They advertise similar performance.