NVIDIA / TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
https://developer.nvidia.com/tensorrt
Apache License 2.0

TensorRT 8.5.2.2 GPU AGX Xavier Jetson 5.1 - Error Code 10: Internal Error (Could not find any implementation for node /model.0/conv/Conv.) #3545

Closed: HichTala closed this issue 6 months ago

HichTala commented 11 months ago

Description

Hi everyone,

I am currently working on converting a YOLOv5 ONNX model to TensorRT on my AGX Xavier running JetPack 5.1. Unfortunately, I've encountered an issue that I'm struggling to resolve.

Here’s the error message I’m encountering:

[12/12/2023-09:55:41] [TRT] [E] 10: [optimizer.cpp::computeCosts::3728] Error Code 10: Internal Error (Could not find any implementation for node /model.0/conv/Conv.)

Environment

I am using TensorRT version 8.5.2.2 with CUDA 11.4. If necessary, I can share the ONNX model.

Any insights or guidance on how to resolve this issue would be greatly appreciated. Thank you in advance for your time and assistance.
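
A minimal way to isolate the failure outside any wrapper script (an editorial sketch, assuming the trtexec binary that ships with JetPack's TensorRT; last.onnx is the reporter's model attached further down):

# plain FP32 build first, to see whether the node fails at all precisions
/usr/src/tensorrt/bin/trtexec --onnx=last.onnx --verbose
# then mirror the INT8 path, since the error shows up during calibration
/usr/src/tensorrt/bin/trtexec --onnx=last.onnx --int8 --verbose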

RajUpadhyay commented 10 months ago

Hi, if you are the same person as in this post, follow the method I suggested over there. If you are not, follow the method suggested on the forum anyway.

https://forums.developer.nvidia.com/t/tensorrt-trying-to-convert-an-onnx-model-to-tensorrt/275818

zerollzeng commented 10 months ago
  1. Do other models work?
  2. Can you try increasing the workspace size? (A sketch follows this comment.)
  3. If that still doesn't fix it, please share the ONNX here.

Thanks!
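
For point 2 above, a minimal sketch of raising the workspace limit, assuming trtexec on TensorRT 8.5, where the pool-based flag supersedes the deprecated --workspace option (sizes are in MiB by default):

# give the builder a 4 GiB workspace so more tactics become viable
/usr/src/tensorrt/bin/trtexec --onnx=last.onnx --memPoolSize=workspace:4096 --verbose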

HichTala commented 10 months ago

Hi, sorry for the late reply,

@RajUpadhyay I am indeed the same person as in the other post. The method you shared is the same one I used in the first place, except that I am using an older script because my YOLOv5 was trained on an older version (here is the old version of YOLO that I am using). A colleague trained the model before I arrived and has since left, and I don't have the training data for the moment; the only thing I have is the trained model, attached to this message. Update: I tried with a newer version of YOLOv5 and one of their pre-trained models and I still get the error. Maybe it's related to the GPU being an AGX Xavier and not a "traditional" one?

@zerollzeng Other models give the same error; I haven't tried models without convolutions, though. I tried increasing the workspace but it still gives the same error message...

I didn't mention it before, but I am trying to calibrate my model using this script; here is the command I run:

python ./trt_quant/convert_tqt_quant.py --img-dir val2017/ --img-size 512 --batch-size 1 --batch 50 --onnx-model last.onnx

Thank you for your help, I really appreciate it.

last.zip
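
As a cross-check on the custom calibration script, a hedged equivalent using polygraphy (the tool zerollzeng uses below); --data-loader-script requires a reasonably recent polygraphy, and load_val2017.py is a hypothetical loader that yields preprocessed val2017 batches:

# build an INT8 engine, calibrating with real images instead of the script's pipeline
polygraphy convert last.onnx --int8 \
    --data-loader-script ./load_val2017.py \
    --calibration-cache calib.cache \
    -o last_int8.plan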

zerollzeng commented 10 months ago

I feel like it may be an environment issue. Could you please flash the latest JetPack 6.0 and try again? We won't fix bugs on TRT 8.5 now.

zerollzeng commented 10 months ago

I couldn't reproduce the issue with polygraphy:

nvidia@tegra-ubuntu:~/scratch.zeroz_sw/github_bug/3545$ polygraphy convert last.onnx --int8 -o out.plan
[I] Finished engine building in 434.928 seconds
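
(Note: without a calibration cache or data loader, polygraphy convert --int8 calibrates with its default synthetic input data, so a successful build here mainly shows that tactics exist for the node, not that the calibration is meaningful.)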
RajUpadhyay commented 10 months ago

I tried to convert your ONNX file and was able to generate the TRT engine without any failure, although I did it on my x86 PC (Ubuntu 22.04, DeepStream 6.4) using the trtexec tool.

Unfortunately I won't be able to try it on my Jetson, which has JetPack 5.1.2 (TensorRT 8.5.2), since it's the holidays, sorry. So I am unsure whether TensorRT 8.6 is what solves this error, but since you have an AGX Xavier, I don't think you can upgrade to JP 6.0 anyway.

Can you try one thing, though? Why don't you use a Docker image on your Jetson to check whether it is in fact an environment issue? You can go to the jetson-containers repo by dusty-nv and run a Docker image for the DeepStream SDK, then run this command: ./trtexec --onnx=last.onnx --saveEngine=engine_fp16.engine --fp16 --useCudaGraph --verbose

Here is the link: https://github.com/dusty-nv/jetson-containers
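
A hedged sketch of that container workflow; the image tag below is hypothetical, so pick the one matching your JetPack release from the repo's package list:

git clone https://github.com/dusty-nv/jetson-containers
cd jetson-containers
./run.sh dustynv/l4t-tensorrt:r8.5.2-runtime   # hypothetical tag, verify against the repo
# inside the container, run the suggested command:
/usr/src/tensorrt/bin/trtexec --onnx=last.onnx --saveEngine=engine_fp16.engine --fp16 --useCudaGraph --verbose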

ttyio commented 6 months ago

Closing since there has been no activity for more than 3 weeks, per our policy. Thanks all!