NVIDIA-AI-IOT / cuDLA-samples

YOLOv5 on Orin DLA

Error during Model Conversion Process - Impact Inquiry #3

Open liuweixue001 opened 1 year ago

liuweixue001 commented 1 year ago

Hello,

I hope this message finds you well. I followed the tutorial to successfully convert the model; however, an error occurred during the model conversion process. I am seeking clarification on the potential impact of this error.

The specific error message I encountered is as follows:

[08/23/2023-10:06:30] [V] [TRT] Engine Layer Information: Layer(DLA): {ForeignNode[/model.0/conv/Conv.../model.24/m.2/Conv]}, Tactic: 0x0000000000000003, images (Half[1,3:16,672,672]) -> s8 (Half[1,255:16,84,84]), s16 (Half[1,255:16,42,42]), s32 (Half[1,255:16,21,21])
[08/23/2023-10:06:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +0, now: CPU 14, GPU 0 (MiB)
[08/23/2023-10:06:30] [I] Engine built in 13.8529 sec.
[08/23/2023-10:06:30] [I] [TRT] Loaded engine size: 14 MiB
[08/23/2023-10:06:30] [E] Error[9]: Cannot deserialize serialized engine built with EngineCapability::kDLA_STANDALONE, use cuDLA APIs instead.
[08/23/2023-10:06:30] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::65] Error Code 4: Internal Error (Engine deserialization failed.)
[08/23/2023-10:06:30] [E] Engine deserialization failed
[08/23/2023-10:06:30] [I] Skipped inference phase since --buildOnly is added.
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/yolov5s_trimmed_reshape_tranpose.onnx --verbose --fp16 --saveEngine=data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --buildDLAStandalone --useDLACore=0

I would appreciate it if you could kindly provide information about the potential consequences of this error. Does it affect the converted model's functionality or performance?

Thank you very much for your assistance.
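
For reference on what the Error[9] message means: a file built with --buildDLAStandalone is a DLA loadable rather than a regular TensorRT engine, so trtexec's own follow-up attempt to deserialize it fails even though the build itself ends with PASSED. Such a loadable is opened through the cuDLA runtime instead. Below is a minimal, illustrative sketch of that loading path, not the repo's actual loader; it assumes the loadable path from the trtexec command above and only goes as far as loading the module.

```cpp
#include <cudla.h>
#include <cuda_runtime.h>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main() {
    // Initialize a CUDA context first; the hybrid (CUDA+DLA) mode below expects one.
    cudaFree(0);

    // Read the standalone loadable produced by --buildDLAStandalone into host memory.
    std::ifstream f("data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin",
                    std::ios::binary);
    std::vector<uint8_t> loadable((std::istreambuf_iterator<char>(f)),
                                  std::istreambuf_iterator<char>());

    // Create a handle for DLA core 0 in hybrid mode (CUDLA_STANDALONE is the other option).
    cudlaDevHandle dev = nullptr;
    if (cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA) != cudlaSuccess) {
        std::fprintf(stderr, "cudlaCreateDevice failed\n");
        return 1;
    }

    // Load the DLA loadable through cuDLA; this replaces TensorRT engine
    // deserialization for engines built with EngineCapability::kDLA_STANDALONE.
    cudlaModule module = nullptr;
    if (cudlaModuleLoadFromMemory(dev, loadable.data(), loadable.size(),
                                  &module, 0) != cudlaSuccess) {
        std::fprintf(stderr, "cudlaModuleLoadFromMemory failed\n");
        cudlaDestroyDevice(dev);
        return 1;
    }
    std::printf("cuDLA accepted the loadable (%zu bytes)\n", loadable.size());

    // Tensor registration (cudlaMemRegister) and cudlaSubmitTask would follow here;
    // the sample application in this repo implements that full pipeline.
    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```

Building this on the device typically links against libcudla and the CUDA runtime. The point is only that the deserialization error comes from opening the file with the wrong runtime, which is consistent with the later comments in this thread reporting that the loadable itself runs.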

mrfsc commented 1 year ago

I encountered the same problem, but the code ran successfully. However, the inference time of the DLA model is much longer than the 4ms in the tutorial, and the actual inference time is about 30ms:

././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8

DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg

DC-Zhou commented 1 year ago

I get the same error on Jetson Orin AGX. I think TensorRT 8.6.0 may be needed, but JetPack only ships TensorRT 8.5.3, so when trtexec handles the ONNX model the DLA-standalone feature is not supported. Maybe the repo needs to provide Docker images?
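
Since the question of which TensorRT version a given JetPack actually ships comes up several times in this thread (the trtexec banner above prints v8502), one quick way to check is to read the version macros from the TensorRT headers. A minimal sketch, assuming the standard JetPack TensorRT development package is installed:

```cpp
// Prints the TensorRT version that the installed headers belong to.
// NV_TENSORRT_* come from NvInferVersion.h (TensorRT development package).
#include <NvInferVersion.h>
#include <cstdio>

int main() {
    std::printf("TensorRT %d.%d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR,
                NV_TENSORRT_PATCH, NV_TENSORRT_BUILD);
    return 0;
}
```

This only reports the headers' version; `dpkg -l | grep nvinfer` on the device lists the installed runtime packages.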

zerollzeng commented 1 year ago

I encountered the same problem, but the code ran successfully. However, the inference time of the DLA model is much longer than the 4ms in the tutorial, and the actual inference time is about 30ms:

././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8

DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg

Which DOS/JetPack are you using? You need DOS 6080+ or JP 6.0+ to get the perf in our README.

mrfsc commented 1 year ago

I encountered the same problem, but the code ran successfully. However, the inference time of the DLA model is much longer than the 4ms in the tutorial, and the actual inference time is about 30ms:

././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8

DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg

Which DOS/JetPack are you using? You need DOS 6080+ or JP 6.0+ to get the perf in our README.

Thanks for the reply. My JetPack version is 5.1.2, which is the latest version available in the JetPack archive (https://developer.nvidia.com/embedded/jetpack-archive). How can I get JetPack 6.0+? Or is there a Docker image to verify the performance?

zerollzeng commented 1 year ago

Unfortunately no, you have to wait for its release :-(

jinzhongxiao commented 2 months ago

I encountered the same problem, but the code ran successfully. However, the inference time of the DLA model is much longer than the 4ms in the tutorial, and the actual inference time is about 30ms:

././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8

DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg

Which DOS/JetPack are you using? You need DOS 6080+ or JP 6.0+ to get the perf in our README.

Thanks for the reply. My JetPack version is 5.1.2, which is the latest version available in the JetPack archive (https://developer.nvidia.com/embedded/jetpack-archive). How can I get JetPack 6.0+? Or is there a Docker image to verify the performance?

I don't think JetPack 6.0+ works. I have tried JetPack 6.0, and I ran into some other issues when running bash loadle.sh.