Linzaer / Ultra-Light-Fast-Generic-Face-Detector-1MB

💎1MB lightweight face detection model (1MB轻量级人脸检测模型)
MIT License

Problem about running onnx model on TensorRT lib #252

Open pango99 opened 3 years ago

pango99 commented 3 years ago

Hi, I am trying to run the ONNX model with the NVIDIA TensorRT library. When I load the version-RFB-320.onnx model, TensorRT reports the warnings below, and the detection results are wrong:

onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped
onnx2trt_utils.cpp:246: One or more weights outside the range of INT32 was clamped

I also tested version-RFB-640.onnx, and it has the same problem. Is the INT64 -> INT32 conversion the cause of the wrong results? For running on TensorRT, should I use the "simplified" or the "without_postprocessing" version of the model?
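To make the warning concrete, here is a minimal sketch of what a saturating INT64 -> INT32 cast does. It is an illustration only, not the actual onnx2trt code, and the names `clamp_to_int32` and `clamp_examples` are hypothetical. Values that fit in 32 bits pass through unchanged; only values outside the INT32 range are altered. Whether the clamped values in this model matter for the output is not something the log alone confirms.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Illustration only (hypothetical helper, not TensorRT's implementation):
// cast a 64-bit integer down to 32 bits, saturating at the INT32 limits.
inline std::int32_t clamp_to_int32(std::int64_t v) {
    constexpr std::int64_t lo = std::numeric_limits<std::int32_t>::min();
    constexpr std::int64_t hi = std::numeric_limits<std::int32_t>::max();
    if (v < lo) return static_cast<std::int32_t>(lo);  // too small: clamp up
    if (v > hi) return static_cast<std::int32_t>(hi);  // too large: clamp down
    return static_cast<std::int32_t>(v);               // in range: unchanged
}

// Small self-check showing the behaviour on in-range and out-of-range values.
inline void clamp_examples() {
    assert(clamp_to_int32(42) == 42);
    assert(clamp_to_int32(std::numeric_limits<std::int64_t>::max()) ==
           std::numeric_limits<std::int32_t>::max());
}
```

Calling `clamp_examples()` from a test or from `main` exercises both cases.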

k-sokolov commented 3 years ago

Hi, this conversion is not the source of the wrong detections. Rather, you need to preprocess the input in a specific way. Suppose input_data is your image as read by cv2.imread. Then you need to do:

input_prep = np.expand_dims(np.transpose(input_data, (2, 0, 1)), axis=0).astype(np.float32) / 255.
input_prep = np.array(input_prep, dtype=input_prep.dtype, order='C')

and feed this to your engine
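For readers working in C++, the same preprocessing can be sketched without NumPy. This is a minimal sketch, assuming the image is already resized to the network input size and stored as interleaved 8-bit HWC data (the layout cv2.imread returns). The function name `hwc_to_nchw` is hypothetical. It reorders the pixels to a contiguous 1xCxHxW float buffer and scales them to [0, 1], which is what the transpose, expand_dims and division by 255 do above. Like the NumPy snippet, it does not swap channels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert an interleaved HWC 8-bit image into a
// contiguous 1xCxHxW float buffer with values scaled to [0, 1].
// The batch dimension is 1, so the NCHW and CHW memory layouts coincide.
std::vector<float> hwc_to_nchw(const std::vector<std::uint8_t>& hwc,
                               std::size_t height, std::size_t width,
                               std::size_t channels) {
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                // Source index: pixels are interleaved (channel varies fastest).
                const std::size_t src = (y * width + x) * channels + c;
                // Destination index: one full plane per channel.
                const std::size_t dst = (c * height + y) * width + x;
                chw[dst] = static_cast<float>(hwc[src]) / 255.0f;
            }
        }
    }
    return chw;
}
```

The returned vector's `data()` pointer and `size() * sizeof(float)` byte count are what would be copied to the engine's input buffer.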

pango99 commented 3 years ago

Hi k-sokolov, thanks for your reply. My program is written in C++, not Python, and I am not skilled in Python, so I rewrote my preprocessing code as follows:

cv::Mat inputDetImage = cv::dnn::blobFromImage(*detImage, 1.0 / 255.0, gCnnInputSize, cv::Scalar(0, 0, 0), true);

cudaMemcpy(gEngInputBuff_CUDA, inputDetImage.data, gCnnInputSize.width * gCnnInputSize.height * 3 * sizeof(float), cudaMemcpyHostToDevice);

I think cv::dnn::blobFromImage() should produce the same data as your code, but the detection results are still wrong. Where is my code going wrong?
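One way to narrow this down is to check the copy size and compare the blob element by element against a reference buffer, for example one dumped from the Python preprocessing. This is a debugging sketch, not a confirmed diagnosis, and the helper names `nchw_float_bytes` and `max_abs_diff` are hypothetical. One visible difference worth checking: the blobFromImage call passes true for swapRB, while the NumPy snippet does not swap channels. Which channel order the model expects is not established in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: byte count of a 1xCxHxW float blob, i.e. the size
// argument the cudaMemcpy call is expected to receive.
std::size_t nchw_float_bytes(std::size_t width, std::size_t height,
                             std::size_t channels) {
    return width * height * channels * sizeof(float);
}

// Hypothetical helper: largest element-wise absolute difference between two
// equally sized buffers. A value near zero means the two preprocessing
// paths agree; a large value points at layout, scale or channel order.
float max_abs_diff(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        worst = std::max(worst, std::fabs(a[i] - b[i]));
    }
    return worst;
}
```

If the difference is large only in the first and last channel planes, the channel order is the likely mismatch; if it is large everywhere, the scale or layout is the more likely suspect.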