comfyanonymous / ComfyUI

The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface.
https://www.comfy.org/
GNU General Public License v3.0
52.42k stars 5.53k forks

[Feature Request] TensorRT support #29

Open BlueAccords opened 1 year ago

BlueAccords commented 1 year ago

I think I remember you saying somewhere that you were looking into supporting TensorRT models. Is that still in the backlog? Or would supporting TensorRT require too much rework of the existing codebase?

comfyanonymous commented 1 year ago

Yes, it's planned. There are a few more things I want to implement first. Unless something better comes out in the meantime, I'm going to implement it.

78Alpha commented 1 year ago

Be prepared to pull your hair out if you get to it. Getting it set up on one machine is a chore, but getting it to work on someone else's is worse. Many TensorRT implementations have fallen because of it. Volta-ML and SDA-node are the larger examples.

CyberTimon commented 1 year ago

I would love to see TensorRT support, because SDXL is quite slow. Also, TensorRT only seems to support a maximum of 768x768px. Do you think it is somehow possible to pass SDXL's 1024x1024 in?

78Alpha commented 1 year ago

There was a pull request for the AUTOMATIC1111 TensorRT extension that references the limits of TRT.

https://github.com/AUTOMATIC1111/stable-diffusion-webui-tensorrt/pull/36

There is an upper limit on what it can do. For example, with your batch size set to 8, you may not be able to generate dynamic-size images larger than roughly 512x480.

I took note of some issues I've found during the year of TRT implementations flooding GitHub.

https://github.com/AUTOMATIC1111/stable-diffusion-webui-tensorrt/issues/46#issuecomment-1644403272

The max dimensions I got from the calculated limit are 858 x 858 at batch size 1.
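To illustrate the kind of limit being discussed: a TensorRT engine is built for a range of input shapes (a minimum and a maximum), and a request outside that range can't run on that engine. Below is a minimal sketch in plain Python. `ShapeRange` is a hypothetical helper, not part of the TensorRT API, and the numbers are illustrative (taken from the 768x768 figure mentioned above), not measured limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShapeRange:
    """Hypothetical helper: inclusive (batch, height, width) bounds an engine was built for."""

    min_shape: tuple[int, int, int]
    max_shape: tuple[int, int, int]

    def accepts(self, batch: int, height: int, width: int) -> bool:
        """Return True if every dimension of the request lies within the built range."""
        request = (batch, height, width)
        return all(
            lo <= value <= hi
            for lo, value, hi in zip(self.min_shape, request, self.max_shape)
        )


# Illustrative bounds only: batch size 1, images from 512x512 up to 768x768.
profile = ShapeRange(min_shape=(1, 512, 512), max_shape=(1, 768, 768))

print(profile.accepts(1, 768, 768))    # inside the range
print(profile.accepts(1, 1024, 1024))  # SDXL-sized request, outside the range
print(profile.accepts(8, 512, 512))    # batch size larger than the engine was built for
```

Under these assumptions, a frontend would have to either build a separate engine per shape range or fall back to the regular PyTorch path when `accepts` returns False.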

comfyanonymous commented 1 year ago

Yeah I'm going to go with AITemplate instead of TRT unless they add a way to replace the weights at runtime.

CyberTimon commented 1 year ago

Yeah, AITemplate seems good too.

nistvan86 commented 1 year ago

SDXL models now have a TensorRT variant. https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt

al-swaiti commented 1 year ago

Yeah, it's there! Any idea how to make it work in ComfyUI?

Erehr commented 11 months ago

Aaaand new drivers just dropped promising 2X performance boost with TRT https://www.nvidia.com/en-us/geforce/news/game-ready-driver-dlss-3-naraka-vermintide-rtx-vsr/

78Alpha commented 11 months ago

Still has the aforementioned issues with being dynamic (size, maximums, minimums).

But it's at least an official implementation for A1111. Hopefully that means it's working now, and on Windows.

BassJMagan commented 11 months ago

Aaaand new drivers just dropped promising 2X performance boost with TRT https://www.nvidia.com/en-us/geforce/news/game-ready-driver-dlss-3-naraka-vermintide-rtx-vsr/

And with that, I'd love to see the fruits of this 2x speed boost, as it could greatly improve my workflow on my 3090.

xueqing0622 commented 11 months ago

support! https://github.com/NVIDIA/Stable-Diffusion-WebUI-TensorRT

SplitMilky commented 11 months ago

OK, so how do I translate those instructions for auto1111 to ComfyUI? I was using auto1111, but I was having issues creating and using the optimized versions.

phineas-pta commented 10 months ago

Advanced users can try my node: https://github.com/phineas-pta/comfy-trt-test

Dunc4n1dah0 commented 10 months ago

I just tested NVIDIA's A1111 TRT extension and the results are indeed 2x faster (at least for a simple 512x768 generation).

robinjhuang commented 3 months ago

@BlueAccords Have you tried this? https://github.com/comfyanonymous/ComfyUI_TensorRT