NVIDIA-AI-IOT / cuDLA-samples

YOLOv5 on Orin DLA

cudla import external semaphore FAILED #15

Open WangFengtu1996 opened 6 months ago

WangFengtu1996 commented 6 months ago
2yjia commented 6 months ago

I can run both modes, but the inference time for each image is 20 ms, which differs from what the experiment reports. Could you tell me the latency of your hybrid mode? @WangFengtu1996
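
One variable worth ruling out when comparing latency numbers is the board's power mode and clock state; the two setups reported later in this thread use different NV power modes (MAXN vs MODE_30W). A minimal sketch with the standard JetPack tools for pinning clocks before benchmarking (run on the Jetson itself; the mode numbering is assumed to match the AGX Orin defaults):

sudo nvpmodel -m 0   # select MAXN
sudo jetson_clocks   # lock clocks to their maximum for the current power mode
sudo nvpmodel -q     # confirm the active power mode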

WangFengtu1996 commented 6 months ago

@2yjia I cannot understand why I cannot run standalone mode successfully. The inference time is about 17-20 ms, and it shortens once warmup is finished. My platform is an NVIDIA Jetson AGX Orin Developer Kit. Could you give me some guidance on running inference in standalone mode? Thanks.
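
For reference, the sample builds the same targets in either mode through make flags; a minimal sketch of the two invocations, using only the target and flag names that appear later in this thread (not re-checked against the current README):

# hybrid mode (default): DLA tasks submitted through cuDLA on a CUDA stream
make validate_cudla_int8 -j

# standalone mode, with the deterministic-semaphore option discussed in issue #7
make validate_cudla_int8 USE_DLA_STANDALONE_MODE=1 USE_DETERMINISTIC_SEMAPHORE=1 -j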

WangFengtu1996 commented 6 months ago

@2yjia I followed this issue https://github.com/NVIDIA-AI-IOT/cuDLA-samples/issues/7, but now I am hitting a new problem on my side:

(py310) orin@orin-root:~/workspace/cuDLA-samples$ make validate_cudla_int8 USE_DLA_STANDALONE_MODE=1 USE_DETERMINISTIC_SEMAPHORE=1 -j
/usr/local/cuda/bin/nvcc -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -gencode arch=compute_87,code=sm_87 -c -o build/decode_nms.o src/decode_nms.cu
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/yolov5.o src/yolov5.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/cudla_context_hybrid.o src/cudla_context_hybrid.cpp
g++ -I /usr/local/cuda/include -I ./src/matx_reformat/ -I /usr/local/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -DUSE_DLA_STANDALONE_MODE -DUSE_DETERMINISTIC_SEMAPHORE -O2 -c -o build/cudla_context_standalone.o src/cudla_context_standalone.cpp
src/cudla_context_standalone.cpp: In member function ‘void cuDLAContextStandalone::initialize()’:
src/cudla_context_standalone.cpp:324:19: error: ‘NvSciSyncFenceUpdateFence’ was not declared in this scope; did you mean ‘NvSciSyncObjGenerateFence’?
  324 |     m_nvsci_err = NvSciSyncFenceUpdateFence(m_WaitEventContext.sync_obj, m_WaiterID, m_WaiterValue, m_WaitEventContext.nvsci_fence_ptr);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncObjGenerateFence
src/cudla_context_standalone.cpp:326:19: error: ‘NvSciSyncFenceExtractFence’ was not declared in this scope; did you mean ‘NvSciSyncIpcExportFence’?
  326 |     m_nvsci_err = NvSciSyncFenceExtractFence(m_WaitEventContext.nvsci_fence_ptr,&m_WaiterID,&m_WaiterValue);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncIpcExportFence
src/cudla_context_standalone.cpp: In member function ‘int cuDLAContextStandalone::submitDLATask(cudaStream_t)’:
src/cudla_context_standalone.cpp:443:19: error: ‘NvSciSyncFenceExtractFence’ was not declared in this scope; did you mean ‘NvSciSyncIpcExportFence’?
  443 |     m_nvsci_err = NvSciSyncFenceExtractFence(m_WaitEventContext.nvsci_fence_ptr ,&m_WaiterID, &m_WaiterValue);
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncIpcExportFence
src/cudla_context_standalone.cpp:445:19: error: ‘NvSciSyncFenceUpdateFence’ was not declared in this scope; did you mean ‘NvSciSyncObjGenerateFence’?
  445 |     m_nvsci_err = NvSciSyncFenceUpdateFence(m_WaitEventContext.sync_obj,
      |                   ^~~~~~~~~~~~~~~~~~~~~~~~~
      |                   NvSciSyncObjGenerateFence
make: *** [Makefile:69: build/cudla_context_standalone.o] Error 1
make: *** Waiting for unfinished jobs....
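
The "was not declared in this scope" errors suggest the NvSciSync headers being compiled against may be older than what cudla_context_standalone.cpp expects: the compiler can see NvSciSyncObjGenerateFence and NvSciSyncIpcExportFence, but not NvSciSyncFenceUpdateFence or NvSciSyncFenceExtractFence. A quick check, as a sketch (the header file name and library path are assumptions based on a stock L4T layout and on the nvsci_headers.tbz2 archive mentioned later in this thread):

# from the directory where the NvSci headers were extracted
grep -nE "NvSciSyncFenceUpdateFence|NvSciSyncFenceExtractFence" nvscisync.h

# check which NvSciSync runtime the board actually ships
dpkg -l | grep -i nvsci
ls -l /usr/lib/aarch64-linux-gnu/tegra/libnvscisync.so*
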
WangFengtu1996 commented 6 months ago

@2yjia I tried to follow the repo's README to fine-tune the model and export a new one. Did you get that whole workflow to work? I hit a problem at the QAT -> PTQ step: the scale information for the output tensors is missing.

(py310) orin@orin-root:~/workspace/cuDLA-samples$ python export/qdq_translator/qdq_translator.py --input_onnx_models=yolov5_trimmed_qat.onnx --output_dir=data/model/ --infer_concat_scales --infer_mul_scales 
INFO:root:Parsing yolov5_trimmed_qat.onnx...
INFO:root:No tensor scales for /model.24/m.0/Conv's output tensor s8
INFO:root:No tensor scales for /model.24/m.1/Conv's output tensor s16
INFO:root:No tensor scales for /model.24/m.2/Conv's output tensor s32
WangFengtu1996 commented 6 months ago

@2yjia Here is my device info. Does yours match?

(base) orin@orin-root:/usr/lib/aarch64-linux-gnu/tegra$ jetson_release
Software part of jetson-stats 4.2.4 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin Developer Kit - Jetpack 5.1.2 [L4T 35.4.1]
NV Power Mode[0]: MAXN
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:
 - P-Number: p3701-0005
 - Module: NVIDIA Jetson AGX Orin (64GB ram)
Platform:
 - Distribution: Ubuntu 20.04 focal
 - Release: 5.10.120-tegra
jtop:
 - Version: 4.2.4
 - Service: Active
Libraries:
 - CUDA: 11.4.315
 - cuDNN: 8.6.0.166
 - TensorRT: 5.1.2
 - VPI: 2.3.9
 - Vulkan: 1.3.204
 - OpenCV: 4.6.0 - with CUDA: YES
2yjia commented 6 months ago

Software part of jetson-stats 4.2.4 - (c) 2024, Raffaello Bonghi
Model: Jetson AGX Orin - Jetpack 5.1 [L4T 35.2.1]
NV Power Mode[2]: MODE_30W
Serial Number: [XXX Show with: jetson_release -s XXX]
Hardware:

2yjia commented 6 months ago

@2yjia I tried to follow the repo's README to fine-tune the model and export a new one. Did you get that whole workflow to work? I hit a problem at the QAT -> PTQ step: the scale information for the output tensors is missing.

I hit the same problem. Running the script produces the noqdq.onnx, but I run into some issues when deploying inference with that onnx. I don't know how the author generated the fp16 and int8 onnx files.

WangFengtu1996 commented 6 months ago

@2yjia Regarding my problem above: could you go into the directory where you extracted the nvsci* archive, grep for these two functions, and share the results? Thanks a lot!

# run inside the directory where nvsci_headers.tbz2 was extracted
grep -nr "NvSciSyncFenceUpdateFence"

grep -nr "NvSciSyncObjGenerateFence"
ou525 commented 6 months ago

I encountered the same problem (see attached screenshots). Has it been solved?

mchi-zg commented 2 months ago

Hi all, could you try this on JetPack 6.0 DP or later? Thanks!
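
For anyone checking which release a board is currently on before trying the upgrade, a minimal sketch using standard Jetson commands (assuming the usual nvidia-jetpack meta-package is installed):

cat /etc/nv_tegra_release                         # installed L4T release
apt-cache show nvidia-jetpack | grep -i version   # JetPack version of the meta-package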