Open mayulin0206 opened 7 months ago
For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.
You should do QAT2PTQ: extract the scales from the QAT ONNX model, save them as a calibration table, and use that table to run INT8 on the DLA.
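For reference, a minimal, unverified sketch of that "save the QAT scales as a calib table" step, assuming the QAT ONNX uses per-tensor QuantizeLinear activation scales. The file names, the cache header string, and the exact scale semantics are assumptions; compare the result against a cache produced by trtexec on your setup (the cuDLA-samples repo ships its own conversion tooling for this).

```python
# Hypothetical sketch: collect per-tensor activation scales from a Q/DQ (QAT)
# ONNX model and dump them in TensorRT calibration-cache format.
# Header string and scale semantics are assumptions, not confirmed behavior.
import struct
import onnx
from onnx import numpy_helper

model = onnx.load("yolov5_trimmed_qat.onnx")   # placeholder file name
inits = {i.name: numpy_helper.to_array(i) for i in model.graph.initializer}

scales = {}
for node in model.graph.node:
    if node.op_type == "QuantizeLinear":
        tensor_name, scale_name = node.input[0], node.input[1]
        scale = inits.get(scale_name)
        if scale is not None and scale.size == 1:  # per-tensor (activation) scales only
            scales[tensor_name] = float(scale)

with open("qat2ptq.cache", "w") as f:
    f.write("TRT-8401-EntropyCalibration2\n")   # header must match your TensorRT version
    for name, s in scales.items():
        # each entry is "<tensor name>: <float32 scale as big-endian hex>"
        f.write(f"{name}: {struct.pack('>f', s).hex()}\n")
```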
Yes, I did this, but the result is still completely wrong. The inference results are correct under Orin GPU INT8, but completely incorrect under Orin DLA INT8.
- Could you please try the latest DriveOS/JetPack release?
- We have a YOLOv5 DLA sample that may be helpful: https://github.com/NVIDIA-AI-IOT/cuDLA-samples
- Please provide a minimal reproduction if the latest release still fails.
@zerollzeng Following your advice, I ran the YOLOv5 DLA sample (https://github.com/NVIDIA-AI-IOT/cuDLA-samples) on the Orin DLA, but encountered the issue shown below.
```
make run
/usr/local/cuda//bin/nvcc -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -gencode arch=compute_87,code=sm_87 -c -o build/decode_nms.o src/decode_nms.cu
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/validate_coco.o src/validate_coco.cpp
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/yolov5.o src/yolov5.cpp
g++ -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include --std=c++14 -Wno-deprecated-declarations -Wall -O2 -c -o build/cudla_context_hybrid.o src/cudla_context_hybrid.cpp
g++ --std=c++14 -Wno-deprecated-declarations -Wall -O2 -I /usr/local/cuda//include -I ./src/matx_reformat/ -I /usr/include/opencv4/ -I /usr/include/jsoncpp/ -I /usr/include -o ./build/cudla_yolov5_app build/decode_nms.o build/validate_coco.o build/yolov5.o build/cudla_context_hybrid.o -l cudla -L/usr/local/cuda//lib64 -l cuda -l cudart -l nvinfer -L /usr/lib/aarch64-linux-gnu/ -l opencv_objdetect -l opencv_highgui -l opencv_imgproc -l opencv_core -l opencv_imgcodecs -L ./src/matx_reformat/build/ -l matx_reformat -l jsoncpp -lnvscibuf -lnvscisync
././build/cudla_yolov5_app --engine ./data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --image ./data/images/image.jpg --backend cudla_int8
[hybrid mode] create cuDLA device SUCCESS
[hybrid mode] load cuDLA module from memory FAILED in src/cudla_context_hybrid.cpp:96, CUDLA ERR: 7
make: *** [Makefile:80: run] Error 1
```
```
bash data/model/build_dla_standalone_loadable.sh
[04/22/2024-19:51:27] [E] Error[3]: [builderConfig.cpp::setFlag::65] Error Code 3: API Usage Error (Parameter check failed at: optimizer/api/builderConfig.cpp::setFlag::65, condition: builderFlag != BuilderFlag::kPREFER_PRECISION_CONSTRAINTS || !flags[BuilderFlag::kOBEY_PRECISION_CONSTRAINTS]. kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS is set. )
[04/22/2024-19:51:27] [E] Error[2]: [nvmRegionOptimizer.cpp::forceToUseNvmIO::175] Error Code 2: Internal Error (Assertion std::all_of(a->consumers.begin(), a->consumers.end(), [](Node* n) { return isDLA(n->backend); }) failed. )
[04/22/2024-19:51:27] [E] Error[2]: [builder.cpp::buildSerializedNetwork::636] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
[04/22/2024-19:51:27] [E] Engine could not be created from network
[04/22/2024-19:51:27] [E] Building engine failed
[04/22/2024-19:51:27] [E] Failed to create engine from model or file.
[04/22/2024-19:51:27] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec [TensorRT v8401] # /usr/src/tensorrt/bin/trtexec --minShapes=images:1x3x672x672 --maxShapes=images:1x3x672x672 --optShapes=images:1x3x672x672 --shapes=images:1x3x672x672 --onnx=data/model/yolov5_trimmed_qat.onnx --useDLACore=0 --buildDLAStandalone --saveEngine=data/loadable/yolov5.int8.int8hwc4in.fp16chw16out.standalone.bin --inputIOFormats=int8:dla_hwc4 --outputIOFormats=fp16:chw16 --int8 --fp16 --calib=data/model/qat2ptq.cache --precisionConstraints=obey --layerPrecisions=/model.24/m.0/Conv:fp16,/model.24/m.1/Conv:fp16,/model.24/m.2/Conv:fp16
```
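For what it's worth, the first error above says kPREFER_PRECISION_CONSTRAINTS and kOBEY_PRECISION_CONSTRAINTS are mutually exclusive. The trtexec command only passes --precisionConstraints=obey, so the "prefer" flag is presumably being set somewhere else (the build script or a mismatched trtexec/TensorRT version). A minimal sketch of how that constraint looks in the TensorRT Python builder API, purely as an illustration and not taken from the sample:

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Only one of these two flags may be set. Setting both triggers the same
# "kPREFER_PRECISION_CONSTRAINTS cannot be set if kOBEY_PRECISION_CONSTRAINTS
# is set" API usage error shown in the log above.
config.set_flag(trt.BuilderFlag.OBEY_PRECISION_CONSTRAINTS)
# config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)  # not together with OBEY
```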
@zerollzeng @lix19937 I also have another question about DLA. Under DLA INT8 mode,
@mayulin0206 Did you ever solve this? I'm facing the exact same pattern of issues: things work on GPU INT8 and DLA FP16, but produce nonsense for DLA INT8. I'm also on an Orin with JetPack 5.1, running TensorRT 8.5.2.2 through Python. Upgrading JetPack isn't an option for me, though I could try an updated TensorRT.
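In case it helps others building through Python: below is a minimal, unverified sketch of an INT8 DLA engine build that reuses an existing calibration table via a cache-only calibrator. The file names, the cache path, and the assumption that the ONNX has no Q/DQ nodes (i.e. the QAT scales were already converted into the cache, as suggested above) are placeholders, not the sample's actual code.

```python
import tensorrt as trt

# Cache-only calibrator: supplies no calibration batches and lets TensorRT
# take all scales from an existing calibration table.
class CacheCalibrator(trt.IInt8EntropyCalibrator2):
    def __init__(self, cache_path):
        super().__init__()
        self.cache_path = cache_path

    def get_batch_size(self):
        return 1

    def get_batch(self, names):
        return None  # no calibration data; rely entirely on the cache

    def read_calibration_cache(self):
        with open(self.cache_path, "rb") as f:
            return f.read()

    def write_calibration_cache(self, cache):
        pass  # nothing to write back

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model_ptq.onnx", "rb") as f:          # placeholder: ONNX without Q/DQ nodes
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)
config.set_flag(trt.BuilderFlag.FP16)            # allow FP16 fallback precision on DLA
config.set_flag(trt.BuilderFlag.GPU_FALLBACK)    # layers DLA can't run fall back to GPU
config.default_device_type = trt.DeviceType.DLA
config.DLA_core = 0
config.int8_calibrator = CacheCalibrator("qat2ptq.cache")  # placeholder path

engine_bytes = builder.build_serialized_network(network, config)
```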
Description
For the quantized INT8 model, the inference results are correct under Orin DLA FP16, and the results are also correct under Orin GPU INT8, but the results are completely incorrect under Orin DLA INT8.
Environment
TensorRT Version: 8.4.12
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):