This extension enables the best performance on NVIDIA RTX GPUs for Stable Diffusion with TensorRT. You need to install the extension and generate optimized engines before using it. Please follow the instructions below to set everything up. It supports Stable Diffusion 1.5, 2.1, SDXL, SDXL Turbo, and LCM. For SDXL and SDXL Turbo, we recommend using a GPU with 12 GB or more of VRAM for best performance, due to the size and computational intensity of those models.
Example instructions for Automatic1111:
Happy prompting!
To use LoRA / LyCORIS checkpoints, they first need to be converted to a TensorRT format. This can be done in the TensorRT extension in the Export LoRA tab.
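TensorRT bakes weights into an engine at build time, so LoRA weights have to be merged into the engine rather than loaded on the fly; TensorRT's weight-refit API is one mechanism by which such a conversion can be applied. Below is a minimal sketch of that general mechanism, not the extension's actual code path; the engine path, layer name, and weight file are hypothetical placeholders.

```python
import numpy as np
import tensorrt as trt

# Hypothetical inputs for illustration only; the extension manages these
# files itself when you use the Export LoRA tab.
ENGINE_PATH = "unet.trt"
FUSED_WEIGHTS = {
    # layer name -> base weight with the LoRA delta already added (placeholder)
    "onnx::MatMul_1234": np.load("fused_weight.npy"),
}

logger = trt.Logger(trt.Logger.WARNING)
with open(ENGINE_PATH, "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

# Refitting only succeeds if the engine was built as refittable (REFIT flag).
refitter = trt.Refitter(engine, logger)
for name, weight in FUSED_WEIGHTS.items():
    refitter.set_named_weights(name, trt.Weights(np.ascontiguousarray(weight)))
assert refitter.refit_cuda_engine(), "Refit failed; check weight names and shapes"
```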
TensorRT uses optimized engines for specific resolutions and batch sizes. You can generate as many optimized engines as desired. Types:

- The “Export Default Engines” selection adds support for resolutions between 512 x 512 and 768 x 768 for Stable Diffusion 1.5 and 2.1 with batch sizes 1 to 4. For SDXL, this selection generates an engine supporting a resolution of 1024 x 1024 with a batch size of 1.
- Static engines support a single specific output resolution and batch size.
- Dynamic engines support a range of resolutions and batch sizes, at a small cost in performance; wider ranges impact performance more.

Each preset can be adjusted with the “Advanced Settings” option. More detailed instructions can be found here.
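The resolutions and batch sizes an engine accepts are fixed at build time through TensorRT optimization profiles, which declare minimum, optimum, and maximum input shapes. As a minimal sketch of that mechanism with the TensorRT Python API (the input name "sample" and the latent shapes are illustrative, assuming an SD 1.5-style UNet whose latent tensors are 1/8 of the pixel resolution; the extension configures all of this for you):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Dynamic profile: batch 1-4, 512 x 512 up to 768 x 768 (latents are H/8 x W/8).
# A static engine would simply use the same shape for min, opt, and max.
profile = builder.create_optimization_profile()
profile.set_shape(
    "sample",          # UNet latent input name (illustrative)
    (1, 4, 64, 64),    # min: batch 1, 512 x 512
    (2, 4, 64, 64),    # opt: the shape TensorRT tunes for
    (4, 4, 96, 96),    # max: batch 4, 768 x 768
)
config.add_optimization_profile(profile)
```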
HIRES FIX: If using the hires.fix option in Automatic1111, you must build engines that match both the starting and ending resolutions. For instance, if the initial size is 512 x 512 and hires.fix upscales to 1024 x 1024, you must generate a single dynamic engine that covers the whole range.
Resolution: When generating images, the resolution needs to be a multiple of 64. This applies to hires.fix as well: both the low-res and the upscaled high-res dimensions must be divisible by 64.
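In practice this constraint just means snapping each requested dimension to a multiple of 64 before generating. A tiny helper to illustrate the arithmetic (purely illustrative, not part of the extension or Automatic1111):

```python
def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round a requested width or height to the nearest multiple of `multiple`."""
    return max(multiple, round(value / multiple) * multiple)

# A 1000 x 600 request becomes 1024 x 576, which a matching engine can accept.
print(snap_to_multiple(1000), snap_to_multiple(600))  # -> 1024 576
```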
Failing CMD arguments:
- `medvram` and `lowvram` have caused issues when compiling the engine.
- `api` has caused the `model.json` to not be updated, resulting in SD Unets not appearing after compilation.

Driver:
- Linux: >= 450.80.02
We always recommend keeping the driver up-to-date for system-wide performance improvements.