Open · liuweixue001 opened this issue 1 year ago
I encountered the same problem, but the code ran successfully. However, the inference time of the DLA model is much longer than the 4 ms in the tutorial; it actually takes about 30 ms:
```
././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8
DLA CTX INIT !!!
ALL MEMORY REGISTERED SUCCESSFULLY
Run Yolov5 DLA pipeline for ./data/images/image.jpg
SUBMIT CUDLA TASK
Input Tensor Num: 1
Output Tensor Num: 3
SUBMIT IS DONE !!!
Inference time: 30.567 ms
Num object detect: 919
detect result has been write to result.jpg
```
I get the same error on Jetson AGX Orin. I think TensorRT 8.6.0 may be required, but JetPack only ships TensorRT 8.5.3, so when trtexec processes the ONNX model, the DLA standalone feature is not supported. Maybe the repo needs to provide Docker images?
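As an aside, it may help to confirm which TensorRT version the board actually has before drawing conclusions. Below is a minimal sketch that prints the version the code compiles against; the `NV_TENSORRT_*` macros come from TensorRT's `NvInferVersion.h`, which on Jetson typically lives under `/usr/include/aarch64-linux-gnu`:

```cpp
// Print the TensorRT version this program was compiled against.
// NV_TENSORRT_* macros are defined in NvInferVersion.h, shipped with TensorRT.
#include <NvInferVersion.h>
#include <cstdio>

int main()
{
    // JetPack 5.x ships TensorRT 8.5.x; the discussion above suggests the
    // DLA standalone flow may need 8.6.0+.
    std::printf("TensorRT %d.%d.%d\n",
                NV_TENSORRT_MAJOR, NV_TENSORRT_MINOR, NV_TENSORRT_PATCH);
    return 0;
}
```

`dpkg -l | grep nvinfer` gives the same answer without compiling anything.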
> I encountered the same problem, but the code ran successfully. However, the inference time of the DLA model is much longer than the 4 ms in the tutorial [...]
Which DOS/JetPack version are you using? You need DOS 6.0.8.0+ or JetPack 6.0+ to get the performance in our README.
Thanks for the reply. My JetPack version is 5.1.2, which is the latest version available in the JetPack archive (https://developer.nvidia.com/embedded/jetpack-archive). How can I get JetPack 6.0+? Or is there a Docker image to verify the performance?
Unfortunately no, you have to wait for its release :-(
I don't think JetPack 6.0+ works. I have tried JetPack 6.0, and it had some other issues when running `bash loadle.sh`.
result.jpg has no result boxes!
Hello,
I hope this message finds you well. I followed the tutorial and converted the model; the conversion completed, but an error was reported during the process. I am seeking clarification on the potential impact of this error.
The specific error message I encountered is as follows:
```
[08/23/2023-10:06:30] [V] [TRT] Engine Layer Information:
Layer(DLA): {ForeignNode[/model.0/conv/Conv.../model.24/m.2/Conv]}, Tactic: 0x0000000000000003, images (Half[1,3:16,672,672]) -> s8 (Half[1,255:16,84,84]), s16 (Half[1,255:16,42,42]), s32 (Half[1,255:16,21,21])
[08/23/2023-10:06:30] [I] [TRT] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +14, GPU +0, now: CPU 14, GPU 0 (MiB)
[08/23/2023-10:06:30] [I] Engine built in 13.8529 sec.
[08/23/2023-10:06:30] [I] [TRT] Loaded engine size: 14 MiB
[08/23/2023-10:06:30] [E] Error[9]: Cannot deserialize serialized engine built with EngineCapability::kDLA_STANDALONE, use cuDLA APIs instead.
[08/23/2023-10:06:30] [E] Error[4]: [runtime.cpp::deserializeCudaEngine::65] Error Code 4: Internal Error (Engine deserialization failed.)
[08/23/2023-10:06:30] [E] Engine deserialization failed
[08/23/2023-10:06:30] [I] Skipped inference phase since --buildOnly is added.
&&&& PASSED TensorRT.trtexec [TensorRT v8502] # /usr/src/tensorrt/bin/trtexec --onnx=data/model/yolov5s_trimmed_reshape_tranpose.onnx --verbose --fp16 --saveEngine=data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin --inputIOFormats=fp16:chw16 --outputIOFormats=fp16:chw16 --buildDLAStandalone --useDLACore=0
```
I would appreciate it if you could kindly provide information about the potential consequences of this error. Does it affect the converted model's functionality or performance?
Thank you very much for your assistance.
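For context, the `Error[9]` above is expected: an engine built with `--buildDLAStandalone` is a DLA loadable, not a regular TensorRT engine, so the TensorRT runtime refuses to deserialize it, and the message itself says to use the cuDLA APIs instead (which is what `cudla_yolov5_app` does). Below is a minimal sketch of that load path, assuming JetPack's `cudla.h` and linking against `libcudla`; buffer registration and task submission are omitted:

```cpp
// Minimal sketch: load a DLA standalone loadable with the cuDLA API
// instead of deserializing it with the TensorRT runtime.
#include <cudla.h>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <iterator>
#include <vector>

int main()
{
    // Read the loadable produced by trtexec --buildDLAStandalone.
    std::ifstream f("data/loadable/yolov5.fp16.fp16chw16in.fp16chw16out.standalone.bin",
                    std::ios::binary);
    std::vector<uint8_t> loadable((std::istreambuf_iterator<char>(f)),
                                  std::istreambuf_iterator<char>());

    // Create a device handle on DLA core 0 in hybrid (CUDA + DLA) mode.
    cudlaDevHandle dev = nullptr;
    if (cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA) != cudlaSuccess) {
        std::fprintf(stderr, "cudlaCreateDevice failed\n");
        return 1;
    }

    // This is the step trtexec cannot perform: the standalone engine must be
    // loaded through cuDLA, not deserialized by the TensorRT runtime.
    cudlaModule module = nullptr;
    if (cudlaModuleLoadFromMemory(dev, loadable.data(), loadable.size(),
                                  &module, 0) != cudlaSuccess) {
        std::fprintf(stderr, "cudlaModuleLoadFromMemory failed\n");
        return 1;
    }
    std::printf("Loadable accepted by cuDLA\n");

    cudlaModuleUnload(module, 0);
    cudlaDestroyDevice(dev);
    return 0;
}
```

If `cudlaModuleLoadFromMemory` accepts the file, the loadable itself is fine; the deserialization error printed by trtexec should have no effect on the converted model's functionality or performance.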