BlueAccords opened 1 year ago
Yes, it's planned. There are a few more things I want to implement first. Unless something better comes out in the meantime, I'm going to implement it.
Be prepared to pull your hair out if you get to it. Getting it set up on one machine is a chore, but getting it to work on someone else's is worse. Many TensorRT implementations have fallen because of it. VoltaML and SDA-node are the larger examples.
I would love to see TensorRT support, because SDXL is quite slow. Also, TensorRT only seems to support a max of 768x768px. Do you think it is somehow possible to pass SDXL's 1024x1024 through it?
There was a pull request for AUTOMATIC1111 that references the limits of TRT:
https://github.com/AUTOMATIC1111/stable-diffusion-webui-tensorrt/pull/36
There is an upper limit to what it can do. As an example, if you have your batch size set to 8, you may not be able to generate images larger than roughly 512x480 with dynamic shapes.
I took note of some issues I found during the year that TRT implementations were flooding GitHub:
https://github.com/AUTOMATIC1111/stable-diffusion-webui-tensorrt/issues/46#issuecomment-1644403272
The max dimensions I got from the calculated limit are batch size 1, 858 x 858.
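The trade-off described above (batch 8 caps out around 512x480, batch 1 around 858x858) behaves like a fixed budget on total latent elements per engine. Here is a minimal sketch of that idea; the budget value, the 1/8 latent downscale factor, and the `fits_engine` helper are illustrative assumptions, not code from the TRT extension:

```python
# Hypothetical check: does a requested generation fit a TensorRT engine's
# dynamic-shape budget? SD UNets operate on latents at 1/8 the pixel
# resolution, so the budget is counted in latent elements.
LATENT_SCALE = 8

# Assumed budget, derived from the "batch 8 @ 512x480" example above.
MAX_LATENT_ELEMS = 8 * (512 // LATENT_SCALE) * (480 // LATENT_SCALE)

def fits_engine(batch: int, width: int, height: int,
                budget: int = MAX_LATENT_ELEMS) -> bool:
    """Return True if batch * latent area stays within the assumed budget."""
    latent_area = (width // LATENT_SCALE) * (height // LATENT_SCALE)
    return batch * latent_area <= budget

print(fits_engine(8, 512, 480))  # fits the assumed budget exactly
print(fits_engine(8, 512, 512))  # exceeds it
```

Under this assumed budget, dropping to batch 1 frees up enough latent elements for much larger single images, which matches the pattern reported in the linked issue.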
Yeah I'm going to go with AITemplate instead of TRT unless they add a way to replace the weights at runtime.
Yeah, AITemplate seems good too.
SDXL models now have a TensorRT variant. https://huggingface.co/stabilityai/stable-diffusion-xl-1.0-tensorrt
Yeah, it's there! Any idea how to make it work in ComfyUI?
Aaaand new drivers just dropped promising 2X performance boost with TRT https://www.nvidia.com/en-us/geforce/news/game-ready-driver-dlss-3-naraka-vermintide-rtx-vsr/
It still has the aforementioned issues with dynamic shapes (size maximums and minimums).
But it's at least an official implementation for A1111. Hopefully that means it's working now, and on Windows.
And with that, I'd love to see the fruits of this 2x speed boost, as it could greatly improve my workflow on my 3090.
OK, so how do I translate those setup instructions for A1111 to ComfyUI? I was using A1111, but I was getting issues creating and using the optimized versions.
advanced users can try my node: https://github.com/phineas-pta/comfy-trt-test
I just tested Nvidia's A1111 TRT extension, and results are indeed 2x faster (at least for a simple 512x768 generation).
@BlueAccords Have you tried this? https://github.com/comfyanonymous/ComfyUI_TensorRT
I think I remember somewhere that you were looking into supporting TensorRT models. Is that still in the backlog somewhere? Or would implementing support for TensorRT require too much rework of the existing codebase?