I'm not sure what the problem is; VRAM usage looks fine.
Logs:
Disabling attention optimization
Exporting Anything-V3.0-pruned-fp32 to TensorRT using - Batch Size: 1-1-4
Height: 512-512-768
Width: 512-512-768
Token Count: 75-75-150
Disabling attention optimization
Building TensorRT engine... This can take a while, please check the progress in the terminal.
Building TensorRT engine for E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\models\Unet-onnx\Anything-V3.0-pruned-fp32.onnx: E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\models\Unet-trt\Anything-V3.0-pruned-fp32_1a7df6b8_cc86_sample=1x4x64x64+2x4x64x64+8x4x96x96-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt
[libprotobuf WARNING **************************************************************************\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING **************************************************************************\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:81] The total number of bytes read was 1721606472
[libprotobuf WARNING **************************************************************************\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:604] Reading dangerously large protocol message. If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons. To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING **************************************************************************\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:81] The total number of bytes read was 1721606472
Building engine: 50%|█████████████████████████████████▌ | 3/6 [00:00<00:00, 9.42it/s][W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored | 0/5 [00:00<?, ?it/s]
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)4%|██▏ | 17/473 [00:03<01:25, 5.32it/s]
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 83892733-byte buffer
[E] 1: [defaultAllocator.cpp::nvinfer1::internal::DefaultAllocator::allocate::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 83892733-byte buffer
Building engine: 50%|█████████████████████████████████▌ | 3/6 [00:11<00:11, 3.69s/it][E] 10: Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_10455 + ONNXTRT_Broadcast_742.../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.
Building engine: 100%|███████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00, 1.85s/it]
[E] 1: [cudaResources.cpp::nvinfer1::ScopedCudaStream::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 10: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::4040] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_10455 + ONNXTRT_Broadcast_742.../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.)
[!] Invalid Engine. Please ensure the engine was built correctly
ERROR:root:Failed to build engine: Invalid Engine. Please ensure the engine was built correctly
Traceback (most recent call last):
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
output = await app.get_blocks().process_api(
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
result = await self.call_function(
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
prediction = await anyio.to_thread.run_sync(
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_asynclib().run_sync_in_worker_thread(
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
return await future
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
result = context.run(func, *args)
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
response = f(*args, **kwargs)
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 126, in export_unet_to_trt
ret = export_trt(
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 243, in export_trt
shared.sd_model = model.cuda()
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 73, in cuda
return super().cuda(device=device)
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
return self._apply(lambda t: t.cuda(device))
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
param_applied = fn(param)
File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
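Since illegal-memory-access errors are reported asynchronously, the real failure point may be earlier than the traceback shows. As the error message itself suggests, a minimal sketch for getting a more accurate stack trace is to set `CUDA_LAUNCH_BLOCKING=1` before launching the webui (POSIX shell shown; on Windows cmd the equivalent is `set CUDA_LAUNCH_BLOCKING=1` before running `webui-user.bat`):

```shell
# Enable synchronous CUDA kernel launches so errors surface at the
# API call that actually caused them (per the traceback's suggestion).
export CUDA_LAUNCH_BLOCKING=1
echo "CUDA_LAUNCH_BLOCKING=$CUDA_LAUNCH_BLOCKING"
# Then relaunch the webui from this same shell so it inherits the variable.
```

This only changes error reporting, not behavior, so it is safe to leave on while reproducing the crash.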