Closed 980202006 closed 1 year ago
Maybe a bug. What do your input dimensions look like?
error node:
Where can I get this visualizer? The input looks like [1,6,3,720,1296].
@zerollzeng I observed that there is a parameter max_workspace_size, which may be the largest batch size when exporting the model. What determines max_workspace_size? Will fp16 cause max_workspace_size to become smaller?
@zerollzeng Is there a way to map the problematic operator in onnx to the torch model code?
I tried to find the answer before but failed in the end :-( so I don't think it's possible, and the exported node names change across PyTorch versions, AFAIK.
@zerollzeng I observed that there is a parameter max_workspace_size, which may be the largest batch size when exporting the model. What determines max_workspace_size? Will fp16 cause max_workspace_size to become smaller?
Yes, but since 8.4 you don't need to worry about the workspace size. We set it to the maximum by default.
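For anyone who still wants to cap the workspace explicitly, here is a minimal sketch. As far as I know, the workspace is an upper bound on the scratch memory that layer implementations may use during the build, which is separate from the batch size. The helper name `set_workspace_limit` is made up for illustration; `set_memory_pool_limit`, `MemoryPoolType.WORKSPACE` and the older `max_workspace_size` attribute are the TensorRT Python names I am relying on.

```python
def set_workspace_limit(config, num_bytes):
    """Cap the builder scratch memory on a TensorRT IBuilderConfig.

    `set_workspace_limit` is a hypothetical helper, not part of TensorRT.
    """
    # Imported lazily so the helper can be defined on machines without TensorRT.
    import tensorrt as trt

    if hasattr(config, "set_memory_pool_limit"):
        # TensorRT 8.4 and later: the workspace is one of several memory pools.
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, num_bytes)
    else:
        # Older releases expose the limit as a plain attribute.
        config.max_workspace_size = num_bytes
    return config
```

Typical use would be `set_workspace_limit(builder.create_builder_config(), 1 << 30)` for a 1 GiB cap; leaving it unset keeps the default described above.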
Thank you. I encountered another problem here. Do you have any ideas about it?
Process Process-3:
Traceback (most recent call last):
File "/root/miniconda3/envs/mmdeploy/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/root/miniconda3/envs/mmdeploy/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/home/mmdeploy/mmdeploy/apis/core/pipeline_manager.py", line 107, in __call__
ret = func(*args, **kwargs)
File "/home/mmdeploy/mmdeploy/backend/tensorrt/onnx2tensorrt.py", line 88, in onnx2tensorrt
device_id=device_id)
File "/home/mmdeploy/mmdeploy/backend/tensorrt/utils.py", line 113, in from_onnx
raise RuntimeError(f'Failed to parse onnx, {error_msgs}')
RuntimeError: Failed to parse onnx, In node 4622 (addScatterLayer): UNSUPPORTED_NODE: Assertion failed: indicesDims.d[i] <= dataDims.d[i] && "Indices dimensions must be less than data dimensions!"
@zerollzeng
The error is raised here: https://github.com/onnx/onnx-tensorrt/blob/1da7332349d5b1196ccfa6dc719b839876f1e83e/onnx2trt_utils.cpp#L2265 It happens while parsing the ONNX model. You can check node 4622 in your ONNX model, or share it here so that I can take a look.
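A rough sketch of how to look at that node from Python, assuming the node number reported by the parser matches the index into `graph.node` (I have not verified that for every parser version). The function names are made up for illustration; `onnx.load` and the `name`, `op_type`, `input` and `output` fields are standard ONNX.

```python
def describe_node(graph, index):
    """Summarize one node of an ONNX graph as a plain dict."""
    node = graph.node[index]
    return {
        "index": index,
        "name": node.name,
        "op_type": node.op_type,
        "inputs": list(node.input),
        "outputs": list(node.output),
    }


def describe_node_in_file(onnx_path, index):
    """Load an ONNX file and summarize the node at `index`."""
    # Imported lazily so describe_node stays usable without the onnx package.
    import onnx

    model = onnx.load(onnx_path)
    return describe_node(model.graph, index)
```

For the model in this thread that would be something like `describe_node_in_file("end2end.onnx", 4622)`; the input names it returns can then be searched for in a graph viewer.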
https://drive.google.com/file/d/1XJ86EWnUmdHEOMgYCsQHs9ESJmdlgbIW/view?usp=sharing
[08/22/2022-10:44:21] [TRT] [V] Graph construction and optimization completed in 28.9074 seconds.
[08/22/2022-10:44:22] [TRT] [V] Using cublasLt as a tactic source
[08/22/2022-10:44:22] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +485, GPU +206, now: CPU 1411, GPU 514 (MiB)
[08/22/2022-10:44:22] [TRT] [V] Using cuDNN as a tactic source
[08/22/2022-10:44:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +468, GPU +204, now: CPU 1879, GPU 718 (MiB)
[08/22/2022-10:44:23] [TRT] [W] TensorRT was linked against cuDNN 8.4.1 but loaded cuDNN 8.2.4
[08/22/2022-10:44:23] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.
[08/22/2022-10:44:23] [TRT] [V] Constructing optimization profile number 0 [1/1].
[08/22/2022-10:44:23] [TRT] [E] 4: [shapeCompiler.cpp::evaluateShapeChecks::911] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_4296: reshaping failed for tensor: onnx::Reshape_5167)
Traceback (most recent call last):
File "/root/miniconda3/envs/mmdeploy/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/miniconda3/envs/mmdeploy/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 322, in run_path
pkg_name=pkg_name, script_name=fname)
File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 136, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/root/.vscode-server/extensions/ms-python.python-2022.12.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "/home/mmdeploy/to_fp16.py", line 274, in <module>
build_engine_onnx(onnx_model_file)
File "/home/mmdeploy/to_fp16.py", line 198, in build_engine_onnx
with builder.build_engine(network, config) as engine, open(args.engine_file, "wb") as f:
AttributeError: __enter__
(base) root@ecs-0:/home/mmdeploy# conda activate mmdeploy
@zerollzeng Thanks, here is my onnx file.
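A side note on the `AttributeError: __enter__` at the end of the traceback above: it is a follow-on symptom, not the root cause. When the build fails, `builder.build_engine` returns `None`, and `with None as engine` raises exactly that error on Python 3.7. A small guard makes the failure easier to read. The helper `require_engine` is hypothetical, not a TensorRT function.

```python
def require_engine(engine, what="engine"):
    """Fail with a readable message when a TensorRT build call returned None."""
    if engine is None:
        raise RuntimeError(
            f"TensorRT returned no {what}; check the [E] lines in the build log"
        )
    return engine


# Sketch of the call site in to_fp16.py, assuming builder, network, config
# and args are already set up as in the traceback:
#
#   engine = require_engine(builder.build_engine(network, config))
#   with open(args.engine_file, "wb") as f:
#       f.write(engine.serialize())
```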
Do you use dynamic shapes? It looks like your model doesn't support dynamic shapes, or your input dimensions are invalid:
[E] 4: [shapeCompiler.cpp::evaluateShapeChecks::911] Error Code 4: Internal Error (kOPT values for profile 0 violate shape constraints: reshape would change volume. IShuffleLayer Reshape_4296: reshaping failed for tensor: onnx::Reshape_5167)
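To illustrate what "reshape would change volume" means, here is a small pure-Python sketch of the check, under the assumption that the Reshape node carries a target shape that was fixed at export time. It follows the usual ONNX conventions (`-1` is inferred, `0` copies the input dimension). The shapes in the usage line are illustrative, not read from the model.

```python
def volume(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


def resolve_reshape(input_shape, target_shape):
    """Return the concrete output shape, or raise if the volume would change."""
    # A 0 in the target copies the corresponding input dimension.
    out = [input_shape[i] if d == 0 else d for i, d in enumerate(target_shape)]
    if out.count(-1) > 1:
        raise ValueError("at most one dimension may be -1")
    total = volume(input_shape)
    known = volume([d for d in out if d != -1])
    if -1 in out:
        if known == 0 or total % known != 0:
            raise ValueError("reshape would change volume")
        out[out.index(-1)] = total // known
    elif known != total:
        raise ValueError("reshape would change volume")
    return tuple(out)
```

For example, `resolve_reshape((1, 6, 3, 720, 1296), (6, 3, 720, 1296))` succeeds, while feeding a `(1, 3, 720, 1296)` input into the same hard-coded target raises, which is the kind of mismatch the kOPT check reports.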
I can't reproduce your error on my side because your model contains your own plugin:
[08/22/2022-15:37:14] [I] [TRT] No importer registered for op: grid_sampler. Attempting to import as plugin.
[08/22/2022-15:37:14] [I] [TRT] Searching for plugin: grid_sampler, plugin_version: 1, plugin_namespace:
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:773: While parsing node number 292 [grid_sampler -> "onnx::Concat_561"]:
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:774: --- Begin node ---
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:775: input: "x.19"
input: "grid_flow"
output: "onnx::Concat_561"
name: "grid_sampler_292"
op_type: "grid_sampler"
attribute {
name: "align_corners"
i: 1
type: INT
}
attribute {
name: "interpolation_mode"
i: 0
type: INT
}
attribute {
name: "padding_mode"
i: 1
type: INT
}
domain: "mmdeploy"
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:776: --- End node ---
[08/22/2022-15:37:14] [E] [TRT] parsers/onnx/ModelImporter.cpp:778: ERROR: parsers/onnx/builtin_op_importers.cpp:4890 In function importFallbackPluginImporter:
[8] Assertion failed: creator && "Plugin not found, are the plugin name, version, and namespace correct?"
My command using trtexec:
&&&& FAILED TensorRT.trtexec [TensorRT v8401] # trtexec --onnx=end2end_new.onnx --optShapes=input:1x3x720x1296
https://drive.google.com/drive/folders/1X2_wBV4DOykZ5eQovbYGARDUsg0ZPxwX?usp=sharing @zerollzeng The custom operator .so files required for my model and the Python code used for exporting are here.
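For reference, a minimal sketch of loading such plugin libraries before parsing. It assumes the plugin creators register themselves when the shared library is loaded, which is the common pattern but is an assumption about these particular files. `load_plugin_libraries` and the path in the usage note are hypothetical.

```python
import ctypes
import os


def load_plugin_libraries(paths):
    """Load each shared library so its TensorRT plugin creators get registered."""
    handles = []
    for path in paths:
        if not os.path.isfile(path):
            # Fail early with the offending path instead of a vague loader error.
            raise FileNotFoundError(path)
        handles.append(ctypes.CDLL(path))
    return handles
```

Calling `load_plugin_libraries(["path/to/custom_ops.so"])` before creating the ONNX parser should let it find `grid_sampler`. As far as I know, trtexec has a `--plugins` option for the same purpose.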
@980202006 Does the error still exist in the latest 8.6? Thanks!
I don't remember. I bypassed this problem by rewriting the torch forward inference code.
Okay, I'm closing this now. Feel free to reopen it if you have any further questions.
Description
I found that the weight count in the ONNX model is twice what TensorRT expects, but it's not clear how to fix this error.
[08/17/2022-17:06:41] [TRT] [E] Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[08/17/2022-17:06:42] [TRT] [E] [convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
[08/17/2022-17:06:42] [TRT] [V] Using kernel: (3, 3), strides: (1, 1), prepadding: (1, 1), postpadding: (1, 1), dilations: (1, 1), numOutputs: 512
[08/17/2022-17:06:42] [TRT] [V] Convolution output dimensions: ()
ERROR: Failed to parse the ONNX file: end2end.onnx
ERROR: Failed to parse the ONNX file. got 1 errors:
In node 1840 (parseGraph): INVALID_NODE: Invalid Node - Conv_1840
Conv_1840:kernel weights has count 1474560 but 737280 was expected
Conv_1840: count of 1474560 weights in kernel, but kernel dimensions (3,3) with 320 input channels, 512 output channels and 2 groups were specified. Expected Weights count is 320 * 3*3 * 512 / 2 = 737280
[convolutionNode.cpp::computeOutputExtents::43] Error Code 4: Internal Error (Conv_1840: number of kernel weights does not match tensor dimensions)
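The arithmetic in the log can be reproduced with a short sketch. The function name is made up; the formula is the standard one for a grouped convolution, out_channels * (in_channels / groups) * kernel height * kernel width.

```python
def expected_conv_weight_count(in_channels, out_channels, kernel_shape, groups=1):
    """Number of weights a grouped convolution kernel should contain."""
    if in_channels % groups or out_channels % groups:
        raise ValueError("channel counts must be divisible by groups")
    count = out_channels * (in_channels // groups)
    for k in kernel_shape:
        count *= k
    return count


# Values taken from the Conv_1840 log lines.
with_two_groups = expected_conv_weight_count(320, 512, (3, 3), groups=2)
with_one_group = expected_conv_weight_count(320, 512, (3, 3), groups=1)
print(with_two_groups, with_one_group)
```

The first value is the 737280 that TensorRT expects, and the second equals the 1474560 weights actually present. That is consistent with a weight tensor laid out for groups=1, or equivalently for 640 input channels with 2 groups. Which of the two applies is not clear from the log alone.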
Environment
TensorRT Version:
NVIDIA GPU: cu113
CUDA Version: 8.2
Operating System: ubuntu20.04
Python Version (if applicable): 3.7.13
PyTorch Version (if applicable): 1.12+cu113
Relevant Files
https://drive.google.com/drive/folders/1X2_wBV4DOykZ5eQovbYGARDUsg0ZPxwX?usp=sharing
Steps To Reproduce
Can you tell me how to fix this issue?
Have you fixed the issue?
I have the same problem with the same Conv_1840 error node, and I do not know why.
Failed to parse the ONNX file: end2end.onnx ERROR: Failed to parse the ONNX file.
I have the same problem. How can I solve it?