chaiNNer-org / chaiNNer

A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and powerful programmatic image processing application.
https://chaiNNer.app
GNU General Public License v3.0

ONNX with TensorRT fails to run #2715

Open Andryshik345 opened 7 months ago

Andryshik345 commented 7 months ago

Information:

Description: I'm getting TypeError: Failed to fetch when trying to run upscaling with ONNX and TensorRT. I'm aware that the bundled onnxruntime-gpu 1.15.1 doesn't support CUDA 12.x, so I manually updated it to 1.17 (which does support CUDA 12.x) with pip, using chaiNNer's Python distribution, and installed TensorRT's wheels the same way. The CUDA runner works fine (although it is much slower than even CPU processing).
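
For context, ONNX Runtime picks its execution provider from the list passed at session creation and silently falls back down that list when one fails to load, which can make a broken TensorRT setup look like it "works" (just slowly). A minimal sketch of such a session, with a placeholder model path rather than chaiNNer's actual code:

    # Minimal sketch of ONNX Runtime provider selection; "model.onnx" is a placeholder.
    import onnxruntime as ort

    providers = [
        "TensorrtExecutionProvider",  # needs the TensorRT libraries on PATH
        "CUDAExecutionProvider",      # needs CUDA/cuDNN matching the onnxruntime-gpu build
        "CPUExecutionProvider",
    ]

    print("available providers:", ort.get_available_providers())
    session = ort.InferenceSession("model.onnx", providers=providers)
    print("providers actually in use:", session.get_providers())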

Logs logs.zip

UffernKur commented 7 months ago

From the 3/29/2024 nightly. NVIDIA RTX 3060. Fresh install of chaiNNer; deleted and redownloaded the Python, ONNX, and ncnn environments.

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1280x1024
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-03-29\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1
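
For anyone trying to narrow this down outside chaiNNer: below is a standalone sketch that feeds a single tile through the model with graph optimizations disabled, which suppresses the MatMul+scale fusion that produces the failing FusedMatMul node. The model path, input layout (NCHW float32), and tile size are assumptions, and this is a diagnostic, not a confirmed fix.

    # Standalone repro sketch; "model.onnx" and the NCHW float32 input are assumptions.
    import numpy as np
    import onnxruntime as ort

    so = ort.SessionOptions()
    # Disable graph optimizations so the MatMul + scale -> FusedMatMul fusion never happens;
    # if inference then succeeds, the fusion pass (not the model itself) is the suspect.
    so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL

    session = ort.InferenceSession(
        "model.onnx",
        sess_options=so,
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )

    inp = session.get_inputs()[0]
    out = session.get_outputs()[0]
    tile = np.random.rand(1, 3, 256, 256).astype(np.float32)  # one 256x256 RGB tile

    result = session.run([out.name], {inp.name: tile})[0]
    print("output shape:", result.shape)
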
joeyballentine commented 7 months ago

Does this happen with every model?

Andryshik345 commented 7 months ago

@joeyballentine I tried two different models just now and I get the same error. I also recorded logs for the whole process (converting a PyTorch model to ONNX and then trying to upscale an image with that model), so maybe the problem arises somewhere in there.

logs.zip

UffernKur commented 6 months ago

The GPU fails to run multiple ONNX models; those same models do complete on CPU, just much more slowly.

joeyballentine commented 6 months ago

Make sure to try the nightly; we fixed an ONNX issue there. And when you do, make sure to update ONNX as well.
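
In case it helps, a quick way to confirm from chaiNNer's bundled Python that the ONNX packages really were updated (the package names below are the usual PyPI ones, a guess rather than chaiNNer's exact dependency list):

    # Hypothetical version check, run with chaiNNer's bundled Python interpreter.
    from importlib.metadata import PackageNotFoundError, version

    for pkg in ("onnx", "onnxruntime-gpu", "onnxruntime"):
        try:
            print(pkg, version(pkg))
        except PackageNotFoundError:
            print(pkg, "not installed")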

Andryshik345 commented 6 months ago

@joeyballentine Tried the nightly build from 2024-04-07, extracted it to a separate folder, installed all dependencies, and now I get Error: An unexpected error occurred: Error: The application encountered an unexpected error and could not continue. Tried with several models (including ones I have already used successfully in AnimeJaNaiConverterGui with TensorRT); same error with all of them.

Captured just after the first error: logs.zip

Captured after some time (it starts to spam other errors regarding localhost): logs_1.zip

joeyballentine commented 6 months ago

Sorry, but could you try this again on tonight's nightly when it comes out? We had an issue where some things aren't logging properly, so the important logs that would tell us what's going wrong are currently being missed. Thanks.

Andryshik345 commented 6 months ago

Installed the new nightly build on top of yesterday's; still the same error. And just like yesterday, captured just after the first error: logs.zip

Captured after some time (it starts to spam other errors regarding localhost): logs_11.zip

I also tried deleting everything and installing it again, but nothing changed.

joeyballentine commented 6 months ago

Damn, whatever's going wrong still isn't logging. Are you sure you have TensorRT set up properly and added to your PATH environment variable?

And just to be sure, CUDA works fine for you?

UffernKur commented 6 months ago

Here is the CUDA failure:

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1074x1515
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Here is the TensorRT failure:

Error

An error occurred in a onnx Upscale Image node:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Input values:
• Image: RGB Image 1074x1515
• Model: Value of type 'nodes.impl.onnx.model.OnnxGeneric'
• Tile Size: 256
• Separate Alpha: No

Stack Trace:
Traceback (most recent call last):
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\process.py", line 155, in run_node
    raw_output = node.run(context, *enforced_inputs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 124, in upscale_image_node
    return convenient_upscale(
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\convenient_upscale.py", line 58, in convenient_upscale
    return upscale(img)
           ^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\image_op.py", line 18, in <lambda>
    return lambda i: np.clip(op(i), 0, 1)
                             ^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 128, in <lambda>
    lambda i: upscale(i, session, tile_size, change_shape, exact_size),
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\packages\chaiNNer_onnx\onnx\processing\upscale_image.py", line 51, in upscale
    return onnx_auto_split(img, session, change_shape=change_shape, tiler=tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 103, in onnx_auto_split
    return auto_split(img, upscale, tiler)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 45, in auto_split
    return split(
           ^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\upscale\auto_split.py", line 174, in _max_split
    upscale_result = upscale(padded_tile.read_from(img), padded_tile)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Local\chaiNNer\app-0.22.3-nightly2024-04-07\resources\src\nodes\impl\onnx\auto_split.py", line 84, in upscale
    output: np.ndarray = session.run([output_name], {input_name: lr_img})[0]
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\USER\AppData\Roaming\chaiNNer\python\python\Lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running FusedMatMul node. Name:'/layers.0/blocks.2/attn/attns.0/MatMul_FusedMatMulAndScale' Status Message: matmul_helper.h:142 onnxruntime::MatMulComputeHelper::Compute left operand cannot broadcast on dim 1

Andryshik345 commented 6 months ago

Are you sure you have tensorrt set up properly and added to your path env var?

Well, I copied TensorRT's libraries to CUDA's /bin folder, which is in PATH. Though I can't check any samples like https://github.com/NVIDIA/TensorRT/tree/main/samples/sampleOnnxMNIST (I don't have Visual Studio installed at the moment). I also initially forgot to install TensorRT's Python wheels, but installing them didn't change anything.

>python -m pip show tensorrt
Name: tensorrt
Version: 10.0.0b6
Summary: A high performance deep learning inference library
Home-page: https://developer.nvidia.com/tensorrt
Author: NVIDIA Corporation
Author-email:
License: Proprietary
Location: D:\upscale_software\chainner-nightly\python\python\Lib\site-packages
Requires:
Required-by:

And just to be sure, CUDA works fine for you?

CUDA works.

Screenshot ![image](https://github.com/chaiNNer-org/chaiNNer/assets/24612088/2724505b-ef10-4f26-9ac3-7589493bf101)
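
As a quick sanity check on the PATH setup described above, a small sketch that simply lists any TensorRT runtime DLLs discoverable via PATH (Windows-only; the nvinfer* DLL names vary between TensorRT 8.x and 10.x, so treat this as a rough check):

    # Rough PATH check for the TensorRT runtime DLLs (nvinfer*.dll); Windows-specific sketch.
    import os
    from pathlib import Path

    hits = []
    for entry in os.environ.get("PATH", "").split(os.pathsep):
        folder = Path(entry)
        if folder.is_dir():
            hits.extend(folder.glob("nvinfer*.dll"))

    if hits:
        for dll in hits:
            print("found:", dll)
    else:
        print("no nvinfer*.dll found on PATH")
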
joeyballentine commented 6 months ago

TensorRT's Python wheels aren't used for ONNX Runtime's TensorRT support. Anyway, I'll look into this more. Thanks for the updates.

zelenooki87 commented 6 months ago

Does the image dimension matter in chaiNNer when creating a TensorRT model from ONNX?

Specifically, if I have, for example, 500 images whose dimensions vary slightly from one image to the next, will chaiNNer create a cached file for each resolution? That would take a considerable amount of time. If it doesn't create separate caches, that would be great!

Andryshik345 commented 6 months ago

AFAIK TensorRT should first create an engine file for your model and then use it for all input data.

joeyballentine commented 6 months ago

If the image size varies, it might build a new engine for each size, as it is not set up to use dynamic shapes. I tried to set that up in the past and was unable to get it to work.
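
For what it's worth, ONNX Runtime's TensorRT provider does expose engine-caching options, so an engine built for a given input shape can at least be reused on later runs. A hedged sketch with placeholder paths (option names as documented for recent onnxruntime releases, not something chaiNNer currently exposes):

    # Sketch of enabling the TensorRT engine cache in ONNX Runtime; paths are placeholders.
    import onnxruntime as ort

    trt_options = {
        "device_id": 0,
        "trt_engine_cache_enable": True,
        "trt_engine_cache_path": r"D:\trt_engine_cache",  # engines are reused per input shape
        "trt_fp16_enable": True,
    }

    session = ort.InferenceSession(
        "model.onnx",
        providers=[
            ("TensorrtExecutionProvider", trt_options),
            "CUDAExecutionProvider",
            "CPUExecutionProvider",
        ],
    )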

zelenooki87 commented 6 months ago

Yes, I assumed that. It works the same way in Selur's Hybrid program. But for video files the waiting isn't really a problem, since they're mostly standard sizes; if the engines stay cached on the hard drive, there's no need to wait again when the input file is the same resolution.

Anyway, we're eagerly awaiting the video super-resolution updates to chaiNNer in one of the next major builds, and keeping our fingers crossed that everything works out as planned. Thanks for everything you've done so far.