enazoe / yolo-tensorrt

TensorRT 8 support. Yolov5n,s,m,l,x; darknet -> tensorrt. Yolov4 and Yolov3 use raw darknet *.weights and *.cfg files. If the wrapper is useful to you, please star it.
MIT License
1.19k stars 316 forks

YOLOV4_TINY with FP32 takes 20ms per detection; with INT8 it is also 20ms. #107

Open xu998 opened 3 years ago

xu998 commented 3 years ago

Hi, I got the demo working yesterday, on Win10, TensorRT 7.0.0.11, CUDA 10.2, cuDNN 7.6.5.32, GTX 1050 Ti 4G, yolov4-tiny.

With FP32:

pre elasped time:7.0705ms inference elasped time:6.262ms post elasped time:4.7838ms detect elasped time:21.3155ms

With INT8:

pre elasped time:10.9021ms inference elasped time:4.9992ms post elasped time:2.9783ms detect elasped time:20.4411ms

    Config config_v4_tiny;
    config_v4_tiny.net_type = YOLOV4_TINY;
    config_v4_tiny.detect_thresh = 0.5;
    config_v4_tiny.file_model_cfg = "d:/yolov4-tiny.cfg";
    config_v4_tiny.file_model_weights = "d:/yolov4-tiny.weights";
    config_v4_tiny.calibration_image_list_file_txt = "d:/calibration_images.txt";
    config_v4_tiny.inference_precison = INT8;  // or FP32

I've seen many articles online reporting 2ms. Is there anything I need to configure? Don't I need to set the input size? Mine is 416*416. There is also a warning: "WARNING: Current optimization profile is: 0. Please ensure there are no enqueued operations pending in this context prior to switching profiles" — does that need fixing?

enazoe commented 3 years ago

The input size comes from your cfg. Different implementations can have different runtimes, and batch size matters too — some published numbers are for multi-batch runs. There are also platform differences.
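For reference, the input resolution and batch settings that the wrapper reads live in the `[net]` section at the top of the darknet cfg; a typical yolov4-tiny header looks like this (values illustrative):

```ini
[net]
batch=1
subdivisions=1
width=416
height=416
channels=3
```

Changing `width`/`height` here changes the engine's input size the next time the engine is built.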

xu998 commented 3 years ago

Switching to a Release build brought it down to 10ms, so there was nothing else to change. Earlier I tried calling the original yolo_cpp_dll.dll directly, which took 20ms per detection, and OpenCV with CUDA took 11ms. Everyone online says TensorRT is the fastest, by a lot... Is this the batch setting?

    YoloPluginCtx* ctx = new YoloPluginCtx;
    ctx->initParams = initParams;
    ctx->batchSize = batchSize;
    ctx->networkInfo;  // = getYoloNetworkInfo();
    ctx->inferParams;  // = getYoloInferParams();
    uint32_t configBatchSize = 1;  // = getBatchSize();

xu998 commented 3 years ago

Hi, it's changed in the .cfg, right? With INT8, going from batch 1 to batch 8 to batch 32 all gives roughly 10ms. I'll try upgrading the various components and test again.