Open mikecarnohan opened 5 months ago
To clarify: running SD without TensorRT works fine, SD-Turbo works fine, and other optimized (tensor-based) models work fine. The memory problem only comes up with SD + TensorRT builds.
RTX 2050... ouch, maybe not possible? [For context, I barely ran this on 6 GB of VRAM (a 1660).] AFAIK ONNX/TRT needs something like 4 GB (?) of free VRAM at minimum, so depending on your setup it might literally be impossible.
[copy-pasted from touhouAI]:
For 4 GB users: lmao, it won't work.
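If it helps anyone gauge where they stand against that rough 4 GB figure (which is an estimate from the comment above, not a documented requirement), PyTorch can report free VRAM directly. A minimal check, assuming a standard PyTorch + CUDA install:

```python
import torch

# Quick check of free vs. total VRAM on the default CUDA device.
if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free_bytes / 1024**3:.2f} GiB of {total_bytes / 1024**3:.2f} GiB total")
else:
    print("No CUDA device visible to PyTorch.")
```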
I have a build with everything needed for SD + TRT installed, but when I run SD with TensorRT enabled, I encounter memory issues.
I believe these issues can be worked around by using an ONNX converter (per @yoinked-h on GitHub).
You need to temporarily remove the CUDA imports from trt.py in /scripts. In export_onnx.py, you need to replace devices.device with "cpu" and devices.dtype with torch.float, and remove "with devices.autocast():".
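For anyone trying that route, here is a rough sketch of what the change amounts to: forcing the ONNX export to run on the CPU in float32 so the export step itself needs no free VRAM. This is not the extension's actual export_onnx.py; the function name, the sample_inputs tuple, and the output path are placeholders I'm assuming for illustration.

```python
# Hedged sketch, not the TensorRT extension's real code: shows the effect of
# pinning the export to CPU + float32 and dropping the autocast context.
import torch

def export_unet_to_onnx_on_cpu(unet: torch.nn.Module, sample_inputs: tuple, onnx_path: str):
    # Instead of the webui's devices.device / devices.dtype, hard-code CPU and fp32.
    device = torch.device("cpu")
    dtype = torch.float32

    unet = unet.to(device=device, dtype=dtype).eval()
    sample_inputs = tuple(t.to(device=device, dtype=dtype) for t in sample_inputs)

    # Plain no_grad tracing in place of "with devices.autocast():".
    with torch.no_grad():
        torch.onnx.export(
            unet,
            sample_inputs,
            onnx_path,
            opset_version=17,
            input_names=["sample", "timesteps", "encoder_hidden_states"],
            output_names=["latent"],
        )
```

As far as I understand, this only moves the ONNX export off the card; the TensorRT engine build afterwards still happens on the GPU and still needs VRAM there.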
But at this point, with a vanilla build of SD + TensorRT (10.0), I get the following error:
If anyone has gotten further and can share how they got it working, it would be greatly appreciated. I'd like to be able to use my pre-production laptop to write TouchDesigner code without being tethered to my production machine. Even though I don't need super high frame rates in those cases, it would be nice to get past 2 fps (my frame rate without TensorRT) so I can tell which noise map and step scheduling settings will work best when I ship my project files.