PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

infer.py single-image inference: why is it slower with GPU than without? #9090

Open JianyuZhan opened 1 month ago

JianyuZhan commented 1 month ago

Search before asking

Bug Component

No response

Describe the Bug

I am running inference with PaddleDetection/deploy/python/infer.py on a V100 machine, and found that running with --device GPU is actually slower than running without it.

Test procedure:

Output:

-----------  Running Arguments -----------
action_file: None
batch_size: 1
camera_id: -1
collect_trt_shape_info: False
combine_method: nms
cpu_threads: 1
device: GPU
enable_mkldnn: False
enable_mkldnn_bfloat16: False
image_dir: None
image_file: test_data//mask_0e765753e645a104c0bbea1f4e739317.jpeg
match_metric: ios
match_threshold: 0.6
model_dir: /opt/ml/model/layout
output_dir: output
overlap_ratio: [0.25, 0.25]
random_pad: False
reid_batch_size: 50
reid_model_dir: None
run_benchmark: False
run_mode: paddle
save_images: True
save_mot_txt_per_img: False
save_mot_txts: False
save_results: False
scaled: False
slice_infer: False
slice_size: [640, 640]
threshold: 0.5
tracker_config: None
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
tuned_trt_shape_file: shape_range_info.pbtxt
use_coco_category: False
use_dark: True
use_fd_format: False
use_gpu: False
video_file: None
window_size: 50
------------------------------------------
-----------  Model Configuration -----------
Model Arch: GFL
Transform Order: 
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--transform op: PadStride
--------------------------------------------

loaded detector cost 2.731602668762207s
class_id:3, confidence:0.6529, left_top:[31.77,336.25],right_bottom:[732.56,1074.07]
class_id:4, confidence:0.6105, left_top:[520.25,751.66],right_bottom:[748.16,896.60]
save result to: output/mask_0e765753e645a104c0bbea1f4e739317.jpeg
Test iter 0
predict  cost 0.9343917369842529s
------------------ Inference Time Info ----------------------
total_time(ms): 916.1, img_num: 1
average latency time(ms): 916.10, QPS: 1.091584
preprocess_time(ms): 56.60, inference_time(ms): 859.50, postprocess_time(ms): 0.00

The output above shows only a single call of each, but I actually ran each configuration 10 times, and the result is very stable: CPU is faster than GPU.

I also added timing prints in infer.py around the Detector loading code and around the inference call detector.predict_image(); the two corresponding log lines are:

loaded detector cost 2.731602668762207s
predict  cost 0.9343917369842529s

Comparing the logs above: both model loading and inference are slower with GPU than with CPU?? Is this because, for single-image inference, the bottleneck is actually loading the model and data onto the GPU, which makes it slower overall?
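That warm-up explanation can be checked directly: the first GPU call pays one-time costs (CUDA context creation, cuDNN algorithm selection, copying weights to the device), so a single-call measurement mostly times that setup rather than steady-state inference. A minimal timing sketch follows; `benchmark`, the lambda predictor, and the iteration counts are hypothetical illustrations, not PaddleDetection API.

```python
import time

def benchmark(predict, image, warmup=5, iters=20):
    """Measure steady-state per-call latency in seconds.

    Warm-up runs absorb one-time costs (CUDA context creation,
    kernel selection, host-to-device copies) and are not timed.
    """
    for _ in range(warmup):
        predict(image)
    start = time.perf_counter()
    for _ in range(iters):
        predict(image)
    return (time.perf_counter() - start) / iters  # seconds per call

# Hypothetical stand-in for detector.predict_image(), for illustration only.
latency = benchmark(lambda img: sum(x * x for x in img), list(range(1000)))
print(f"avg latency: {latency * 1e3:.4f} ms")
```

Timing only the post-warm-up iterations with time.perf_counter() yields the steady-state latency; if GPU is still slower after warm-up, the bottleneck is likely elsewhere (e.g. per-call host-to-device transfer for a small model).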

Environment

OS: Ubuntu 22.04; PaddleDetection: release/2.7; Paddle Python library: 2.6.0

Bug description confirmation

Are you willing to submit a PR?

cuicheng01 commented 1 month ago

Hello, which model are you using?

JianyuZhan commented 1 month ago

Hello, which model are you using?

Hello, the model I am using was trained and exported following this documentation; it is based on picodet_lcnet_x1_0_layout.

cuicheng01 commented 1 month ago

We suggest testing with more loop iterations.

TingquanGao commented 3 weeks ago

The issue has had no response for a long time and will be closed. You can reopen it or open a new issue if you are still confused.


From Bot

JianyuZhan commented 3 weeks ago

At least in my case, this reproduces consistently. So I have now turned off GPU inference, and it is noticeably faster.

cuicheng01 commented 3 weeks ago

Actually, we do not recommend running inference this way. If you really want fast inference, we suggest an acceleration solution such as TensorRT.
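As a sketch of that suggestion: the log above already shows TensorRT-related arguments (run_mode, trt_min_shape, trt_max_shape), and in release/2.7 deploy/python/infer.py accepts run_mode values such as trt_fp16 in addition to the default paddle. A TensorRT FP16 run over the same model directory from the log might look like the following; verify the exact flags against `python deploy/python/infer.py --help` for your checkout.

```shell
# Sketch only: flags assumed from PaddleDetection release/2.7;
# confirm them with `python deploy/python/infer.py --help`.
python deploy/python/infer.py \
    --model_dir=/opt/ml/model/layout \
    --image_file=test_data/mask_0e765753e645a104c0bbea1f4e739317.jpeg \
    --device=GPU \
    --run_mode=trt_fp16 \
    --run_benchmark=True
```

Note that the first TensorRT run also pays an engine-build cost, so benchmark numbers are only meaningful after warm-up.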