NVIDIA / Stable-Diffusion-WebUI-TensorRT

TensorRT Extension for Stable Diffusion Web UI

Failed to build engine #284

Open CamiloMM opened 4 months ago

CamiloMM commented 4 months ago

Not sure what the problem is; VRAM seems to be fine:

[screenshot: GPU VRAM usage]

Logs:

Disabling attention optimization
Exporting Anything-V3.0-pruned-fp32 to TensorRT using - Batch Size: 1-1-4
Height: 512-512-768
Width: 512-512-768
Token Count: 75-75-150
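For reference, the UI ranges above appear to map onto the min/opt/max dynamic-shape profile encoded in the `.trt` filename further down. A minimal sketch of that mapping, assuming (not confirmed from the extension's source) that latents are H/8 x W/8 with 4 channels, that the opt and max batch sizes are doubled for classifier-free guidance, and that prompts are chunked into 77-token CLIP windows (75 tokens plus BOS/EOS):

```python
# Sketch: map the UI export ranges (Batch 1-1-4, Height/Width 512-512-768,
# Tokens 75-75-150) to the dynamic-shape profile seen in the engine filename.
# Assumptions (hypothetical, not taken from the extension's code): 4 latent
# channels, spatial downscale of 8, CFG doubling on opt/max batch, 77-token
# CLIP chunks.

def profile(batches, heights, widths, tokens):
    def sample(b, h, w, cfg):
        return (b * cfg, 4, h // 8, w // 8)

    def seq_len(t):
        return 77 * ((t + 74) // 75)  # number of 77-token chunks

    (b_min, b_opt, b_max) = batches
    return {
        "sample": (
            sample(b_min, heights[0], widths[0], cfg=1),
            sample(b_opt, heights[1], widths[1], cfg=2),
            sample(b_max, heights[2], widths[2], cfg=2),
        ),
        "encoder_hidden_states": (
            (b_min, seq_len(tokens[0]), 768),
            (b_opt * 2, seq_len(tokens[1]), 768),
            (b_max * 2, seq_len(tokens[2]), 768),
        ),
    }

p = profile((1, 1, 4), (512, 512, 768), (512, 512, 768), (75, 75, 150))
print(p["sample"])                 # ((1, 4, 64, 64), (2, 4, 64, 64), (8, 4, 96, 96))
print(p["encoder_hidden_states"])  # ((1, 77, 768), (2, 77, 768), (8, 154, 768))
```

The computed shapes match `sample=1x4x64x64+2x4x64x64+8x4x96x96` and `encoder_hidden_states=1x77x768+2x77x768+8x154x768` in the log below; the 8x4x96x96 max profile is what the builder must be able to fit in VRAM during autotuning.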
Disabling attention optimization
Building TensorRT engine... This can take a while, please check the progress in the terminal.
Building TensorRT engine for E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\models\Unet-onnx\Anything-V3.0-pruned-fp32.onnx: E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\models\Unet-trt\Anything-V3.0-pruned-fp32_1a7df6b8_cc86_sample=1x4x64x64+2x4x64x64+8x4x96x96-timesteps=1+2+8-encoder_hidden_states=1x77x768+2x77x768+8x154x768.trt
[libprotobuf WARNING **************************************************************************\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING **************************************************************************\externals\protobuf\3.0.0\src\google\protobuf\io\coded_stream.cc:81] The total number of bytes read was 1721606472
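Quick arithmetic on that warning: the ONNX read is about 1.60 GiB against protobuf's hard 2 GiB (2^31 - 1 bytes) message limit, which is expected for an fp32 UNet export and is a warning only, not the failure here:

```python
# The protobuf warning in context: bytes read vs. the hard message-size limit.
bytes_read = 1721606472   # from the log line above
limit = 2147483647        # protobuf's maximum message size (2**31 - 1)
print(f"{bytes_read / 2**30:.2f} GiB read, {bytes_read / limit:.0%} of the limit")
# → 1.60 GiB read, 80% of the limit
```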
Building engine:  50%|█████████████████████████████████▌                                 | 3/6 [00:00<00:00,  9.42it/s]
[W] 'colored' module is not installed, will not use colors when logging. To enable colors, please install the 'colored' module: python3 -m pip install colored
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[E] 2: [virtualMemoryBuffer.cpp::nvinfer1::StdVirtualMemoryBufferImpl::resizePhysical::140] Error Code 2: OutOfMemory (no further information)
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 83892733-byte buffer
[E] 1: [defaultAllocator.cpp::nvinfer1::internal::DefaultAllocator::allocate::20] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 9: Skipping tactic0x0000000000000000 due to exception [::0] autotuning: User allocator error allocating 83892733-byte buffer
Building engine:  50%|█████████████████████████████████▌                                 | 3/6 [00:11<00:11,  3.69s/it]
[E] 10: Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_10455 + ONNXTRT_Broadcast_742.../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.
Building engine: 100%|███████████████████████████████████████████████████████████████████| 6/6 [00:11<00:00,  1.85s/it]
[E] 1: [cudaResources.cpp::nvinfer1::ScopedCudaStream::~ScopedCudaStream::47] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] 10: [optimizer.cpp::nvinfer1::builder::cgraph::LeafCNode::computeCosts::4040] Error Code 10: Internal Error (Could not find any implementation for node {ForeignNode[onnx::LayerNormalization_10455 + ONNXTRT_Broadcast_742.../input_blocks.1/input_blocks.1.1/Reshape_2 + /input_blocks.1/input_blocks.1.1/Transpose_1 + /input_blocks.1/input_blocks.1.1/Reshape_3]}.)
[!] Invalid Engine. Please ensure the engine was built correctly
ERROR:root:Failed to build engine: Invalid Engine. Please ensure the engine was built correctly
Traceback (most recent call last):
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 488, in run_predict
    output = await app.get_blocks().process_api(
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1431, in process_api
    result = await self.call_function(
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1103, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\utils.py", line 707, in wrapper
    response = f(*args, **kwargs)
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\ui_trt.py", line 126, in export_unet_to_trt
    ret = export_trt(
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\extensions\Stable-Diffusion-WebUI-TensorRT\exporter.py", line 243, in export_trt
    shared.sd_model = model.cuda()
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 73, in cuda
    return super().cuda(device=device)
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "E:\MachineLearning\Stable Diffusion\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 905, in <lambda>
    return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
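For anyone following PyTorch's hint: a minimal sketch of enabling synchronous kernel launches so the traceback points at the actual failing CUDA call. The variable must be set before any CUDA context exists (i.e. before torch initializes CUDA), for example at the very top of a launch wrapper:

```python
# Sketch: make CUDA kernel launches synchronous so errors are reported at the
# call that caused them. Must run before torch touches the GPU.
import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# ...launch the webui / import torch only after this point...
print(os.environ["CUDA_LAUNCH_BLOCKING"])  # → 1
```

On Windows cmd, `set CUDA_LAUNCH_BLOCKING=1` before running `webui-user.bat` achieves the same. Note that `TORCH_USE_CUDA_DSA` is a compile-time option for PyTorch itself, so it only helps with a build compiled that way.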