NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions) #2251

Closed 980202006 closed 1 year ago

980202006 commented 2 years ago

Description

I found that the kernel weight count in the ONNX model is twice the count TensorRT expects, but it's not clear how to fix this error.

[08/17/2022-17:06:41] [TRT] [E] Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[08/17/2022-17:06:42] [TRT] [E] [convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
[08/17/2022-17:06:42] [TRT] [V] Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 512
[08/17/2022-17:06:42] [TRT] [V] Convolution output dimensions: ()
ERROR: Failed to parse the ONNX file: end2end.onnx
ERROR: Failed to parse the ONNX file.
got 1 errors: 
In node 1840 (parseGraph): INVALID_NODE: Invalid Node - Conv_1840
Conv_1840:kernel weights has count 1474560 but 737280 was expected
Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
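The arithmetic in the parser message can be reproduced by hand. The sketch below assumes the standard ONNX Conv weight layout (out_channels, in_channels / groups, kH, kW); the function name is hypothetical. It shows that the reported count matches a dense (groups=1) 512x320x3x3 kernel, which is also what a 2-group convolution over 640 input channels would need. Which of these applies to Conv_1840 is not established in this thread.

```python
def expected_conv_weight_count(in_channels, out_channels, kernel_shape, groups):
    """Number of kernel weights a grouped convolution needs (bias excluded)."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    kernel_volume = 1
    for k in kernel_shape:
        kernel_volume *= k
    # Assumed ONNX Conv weight layout: (out_channels, in_channels / groups, kH, kW)
    return out_channels * (in_channels // groups) * kernel_volume


actual = 1474560  # count reported by the parser for Conv_1840
expected = expected_conv_weight_count(320, 512, (3, 3), groups=2)
dense = expected_conv_weight_count(320, 512, (3, 3), groups=1)
wider_input = expected_conv_weight_count(640, 512, (3, 3), groups=2)

print(expected)               # 737280, the value TensorRT expects
print(actual // expected)     # 2
print(actual == dense)        # True: matches groups=1 with 320 input channels
print(actual == wider_input)  # True: also matches groups=2 with 640 input channels
```

Comparing the weight initializer's shape and the node's group attribute in Netron against the input channel count that TensorRT inferred should show which side of the mismatch is wrong.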

Environment

TensorRT Version:
NVIDIA GPU: cu113
CUDA Version: 8.2
Operating System: ubuntu20.04
Python Version (if applicable): 3.7.13
PyTorch Version (if applicable): 1.12+cu113

Relevant Files

https://drive.google.com/drive/folders/1X2_wBV4DOykZ5eQovbYGARDUsg0ZPxwX?usp=sharing

Steps To Reproduce

zerollzeng commented 2 years ago

Maybe a bug. What do your input dimensions look like? [image]

zerollzeng commented 2 years ago

Error node: [image]

980202006 commented 2 years ago

Where can I get this visualizer? The input shape is like [1,6,3,720,1296].

zerollzeng commented 2 years ago

https://netron.app/

980202006 commented 2 years ago

@zerollzeng I observed that there is a parameter max_workspace_size, which may be the largest batch size when exporting the model. What determines max_workspace_size? Will fp16 cause max_workspace_size to become smaller?

980202006 commented 2 years ago

@zerollzeng Is there a way to map the problematic operator in onnx to the torch model code?

zerollzeng commented 2 years ago

@zerollzeng Is there a way to map the problematic operator in onnx to the torch model code?

I tried to find an answer before but failed :-( so I don't think it's possible, and AFAIK the exported node names change across PyTorch versions.

zerollzeng commented 2 years ago

@zerollzeng I observed that there is a parameter max_workspace_size, which may be the largest batch size when exporting the model. What determines max_workspace_size? Will fp16 cause max_workspace_size to become smaller?

Yes, but since TensorRT 8.4 you don't need to worry about the workspace size; we set it to the maximum by default.

980202006 commented 2 years ago

Thank you. I ran into another problem; do you have any ideas about it?

Process Process-3:
Traceback (most recent call last):
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
    ret = func(*args, **kwargs)
  File "/home/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 88, in onnx2tensorrt
    device_id=device_id)
  File "/home/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 113, in from_onnx
    raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 4622 (addScatterLayer): UNSUPPORTED_NODE: Assertion failed: indicesDims.d[i] <= dataDims.d[i] && "Indices dimensions must be less than data dimensions!"

980202006 commented 2 years ago

@zerollzeng

zerollzeng commented 2 years ago

The error is raised at https://github.com/onnx/onnx-tensorrt/blob/1da7332349d5b1196ccfa6dc719b839876f1e83e/onnx2trt_utils.cpp#L2265 while parsing the ONNX file. You can check node 4622 in your ONNX model, or share the model here so that I can take a look.
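The assertion quoted in the traceback compares the indices shape with the data shape axis by axis. A minimal sketch of the same comparison in Python follows; the function name and the example shapes are hypothetical, and the equal-rank requirement is an assumption taken from the ONNX ScatterElements definition rather than from the parser source.

```python
def scatter_dims_ok(data_shape, indices_shape):
    """Return True if every indices dimension is <= the matching data dimension."""
    # Assumption: ScatterElements requires indices to have the same rank as data.
    if len(data_shape) != len(indices_shape):
        return False
    # Mirrors the asserted condition: indicesDims.d[i] <= dataDims.d[i]
    return all(i <= d for i, d in zip(indices_shape, data_shape))


# Hypothetical shapes, for illustration only.
print(scatter_dims_ok((1, 6, 720, 1296), (1, 6, 720, 1296)))  # True
print(scatter_dims_ok((1, 6, 720, 1296), (2, 6, 720, 1296)))  # False: axis 0 is larger
```

Reading the data and indices shapes of node 4622 in Netron and running them through a check like this shows which axis trips the assertion.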

980202006 commented 2 years ago

https://drive.google.com/file/d/1XJ86EWnUmdHEOMgYCsQHs9ESJmdlgbIW/view?usp=sharing

[08/22/2022-10:44:21] [TRT] [V] Graph construction and optimization completed in 28.9074 seconds.
[08/22/2022-10:44:22] [TRT] [V] Using cublasLt as a tactic source
[08/22/2022-10:44:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +485, GPU +206, now: CPU 1411, GPU 514 (MiB)
[08/22/2022-10:44:22] [TRT] [V] Using cuDNN as a tactic source
[08/22/2022-10:44:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +468, GPU +204, now: CPU 1879, GPU 718 (MiB)
[08/22/2022-10:44:23] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.4
[08/22/2022-10:44:23] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/22/2022-10:44:23] [TRT] [V] Constructing optimization profile number 0 [1/1].
[08/22/2022-10:44:23] [TRT] [E] 4: [shapeCompiler.cpp::evaluateShapeChecks::911] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_4296: reshaping failed for tensor: onnx::Reshape_5167)
Traceback (most recent call last):
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/miniconda3/envs/mmdeploy/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "/home/mmdeploy/to_fp16.py", line 274, in <module>
    build_engine_onnx(onnx_model_file)
  File "/home/mmdeploy/to_fp16.py", line 198, in build_engine_onnx
    with builder.build_engine(network, config) as engine, open(args.engine_file, "wb") as f:
AttributeError: __enter__
(base) root@ecs-0:/home/mmdeploy# conda activate mmdeploy
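The AttributeError: __enter__ at the end of this traceback is probably a secondary symptom rather than a separate bug. Assuming build_engine returns None when the build fails (here, after the kOPT shape error logged above), the with statement is then applied to None, which has no __enter__. A minimal sketch of checking the result first; FakeBuilder and build_or_raise are hypothetical stand-ins so the example runs without TensorRT installed.

```python
class FakeBuilder:
    """Stand-in for a builder whose build fails; hypothetical, for illustration only."""

    def build_engine(self, network, config):
        # Assumption: a failed build returns None instead of raising.
        return None


def build_or_raise(builder, network, config):
    """Build an engine and fail loudly if the builder returned nothing."""
    engine = builder.build_engine(network, config)
    if engine is None:
        raise RuntimeError("engine build failed; check the [TRT] [E] lines in the log")
    return engine


try:
    build_or_raise(FakeBuilder(), network=None, config=None)
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

With a check like this in to_fp16.py, the script would stop at the real cause (the reshape volume error) instead of the misleading AttributeError.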

980202006 commented 2 years ago

@zerollzeng Thanks, here is my onnx file.

zerollzeng commented 2 years ago

Do you use dynamic shapes? It looks like your model doesn't support dynamic shapes, or your input dimensions are invalid:

[E] 4: [shapeCompiler.cpp::evaluateShapeChecks::911] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_4296: reshaping failed for tensor: onnx::Reshape_5167)
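"Reshape would change volume" means that, for the kOPT shapes of profile 0, the target shape of Reshape_4296 does not hold the same number of elements as its input. A minimal sketch of that check, assuming standard ONNX Reshape semantics (0 copies the input dimension at the same position, -1 is inferred). The function names are hypothetical, and the shapes are the two input shapes mentioned in this thread, used only as an illustration; the actual shapes at Reshape_4296 are not known here.

```python
def volume(shape):
    """Product of all dimensions."""
    result = 1
    for dim in shape:
        result *= dim
    return result


def reshape_keeps_volume(input_shape, target_shape):
    """True if target_shape can hold exactly the elements of input_shape."""
    # 0 means "copy the input dimension at this position" (ONNX default behaviour).
    resolved = [input_shape[i] if d == 0 else d for i, d in enumerate(target_shape)]
    if resolved.count(-1) > 1:
        raise ValueError("at most one -1 is allowed in a reshape target")
    known = volume(d for d in resolved if d != -1)
    total = volume(input_shape)
    if -1 in resolved:
        # The inferred dimension must be a whole number.
        return known != 0 and total % known == 0
    return known == total


print(reshape_keeps_volume((1, 6, 3, 720, 1296), (6, 3, 720, 1296)))  # True
print(reshape_keeps_volume((1, 3, 720, 1296), (6, 3, 720, 1296)))     # False
```

A reshape target that was baked in as constants at export time passes this check only for the shape used during export, which is one common way a model ends up not supporting dynamic shapes.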

zerollzeng commented 2 years ago

I can't reproduce your error on my side because your model contains your own plugin:

[08/22/2022-15:37:14] [I] [TRT] No importer registered for op: grid_sampler. Attempting to import as plugin.
[08/22/2022-15:37:14] [I] [TRT] Searching for plugin: grid_sampler, plugin_version: 1, plugin_namespace:
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:773: While parsing node number 292 [grid_sampler -> "onnx::Concat_561"]:
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:774: --- Begin node ---
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:775: input: "x.19"
input: "grid_flow"
output: "onnx::Concat_561"
name: "grid_sampler_292"
op_type: "grid_sampler"
attribute {
  name: "align_corners"
  i: 1
  type: INT
}
attribute {
  name: "interpolation_mode"
  i: 0
  type: INT
}
attribute {
  name: "padding_mode"
  i: 1
  type: INT
}
domain: "mmdeploy"

[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:776: --- End node ---
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:778: ERROR: parsers/onnx/builtin_op_importers.cpp:4890 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
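"Plugin not found" means no creator for grid_sampler was registered in the process that parsed the model. One common approach, sketched below, is to load the custom-op shared library before parsing, on the assumption that the library registers its plugin creators when it is loaded. The helper name and the library path are hypothetical; substitute the .so built for your model.

```python
import ctypes
import os


def load_plugin_libraries(paths):
    """Load each existing shared library and return the paths that were loaded."""
    loaded = []
    for path in paths:
        if not os.path.isfile(path):
            print(f"skipping missing plugin library: {path}")
            continue
        # RTLD_GLOBAL makes the library's symbols visible process-wide.
        ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
        loaded.append(path)
    return loaded


# Hypothetical path, for illustration only.
loaded = load_plugin_libraries(["/path/to/libmmdeploy_tensorrt_ops.so"])
print(loaded)
```

The call has to happen before the ONNX parser runs; if the returned list is empty, the parser will fail on grid_sampler exactly as in the log above.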

zerollzeng commented 2 years ago

My command using trtexec:

&&&& FAILED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=end2end_new.onnx --optShapes=input:1x3x720x1296

980202006 commented 2 years ago

https://drive.google.com/drive/folders/1X2_wBV4DOykZ5eQovbYGARDUsg0ZPxwX?usp=sharing @zerollzeng The custom operator .so files required by my model and the Python code used for exporting are here.

ttyio commented 1 year ago

@980202006 Does the error still exist in the latest 8.6? Thanks!

980202006 commented 1 year ago

I don't remember. I bypassed this problem by rewriting the torch forward inference code.

zerollzeng commented 1 year ago

Okay, I'm closing this now. Feel free to reopen it if you have any further questions.

Liupei1101 commented 7 months ago

Can you tell me how to fix this issue?

Liupei1101 commented 6 months ago

Have you fixed the issue?

Liupei1101 commented 6 months ago

Error node: [image]

I have the same problem (Conv_1840: kernel weights has count 1474560 but 737280 was expected), and I do not know why.

Liupei1101 commented 6 months ago

Failed to parse the ONNX file: end2end.onnx ERROR: Failed to parse the ONNX file.

I have the same problem. How can I solve it?