ApolloAuto / apollo

An open autonomous driving platform
Apache License 2.0

Camera Perception with SMOKE detector has high CPU usage even in GPU mode. #15343

Closed (rodrigoqueiroz closed 6 months ago)

rodrigoqueiroz commented 7 months ago

Describe the bug

I am running Apollo V8.0 with Perception in GPU mode, built in gpu_opt mode. When camera perception runs with the SMOKE detector in the pipeline, CPU usage becomes very high and the entire stack slows down noticeably. The obstacles published at the end of the Multi Sensor Fusion job show a visible lag.

Tested with a 13th Gen Intel Core i9-13900K × 32 CPU and an NVIDIA 2080 Ti GPU. When starting the modules one by one, camera perception appears to be the only component that drives all CPU cores to the maximum.
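One way to put a number on "all CPU cores at the maximum" is to sample the CPU time of the perception process from `/proc`. The following is a minimal sketch, assuming Linux; the helper names `cpu_ticks_from_stat` and `cores_used` are hypothetical and not part of Apollo.

```python
import os
import time


def cpu_ticks_from_stat(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before splitting on whitespace.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return int(rest[11]) + int(rest[12])


def cores_used(pid: int, interval_s: float = 1.0) -> float:
    """Average number of cores kept busy by `pid` over `interval_s` (Linux only)."""
    ticks_per_s = os.sysconf("SC_CLK_TCK")
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks_from_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks_from_stat(f.read())
    # CPU seconds consumed divided by wall seconds elapsed.
    return (after - before) / ticks_per_s / interval_s
```

Calling `cores_used(pid)` with the PID of the `mainboard` process that runs camera perception gives the average number of busy cores. On a machine with 32 logical CPUs, a value approaching 32 would match the saturation described here, while a GPU-bound detector would be expected to stay much lower.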

Pipeline:

stage_type: OMT_OBSTACLE_TRACKER
stage_type: SMOKE_OBSTACLE_DETECTION
stage_type: OMT_OBSTACLE_TRACKER
stage_type: MULTI_CUE_OBSTACLE_TRANSFORMER
stage_type: LOCATION_REFINER_OBSTACLE_POSTPROCESSOR
stage_type: OMT_OBSTACLE_TRACKER
stage_type: OMT_OBSTACLE_TRACKER
(...)

stage_config: {
  stage_type: SMOKE_OBSTACLE_DETECTION
  enabled: true
  type: "camera_detector"

  camera_detector_config {
    gpu_id: 0
    camera_name: "front_6mm"
    root_dir: "/apollo/modules/perception/production/data/perception/camera/models/yolo_obstacle_detector"
    conf_file: "smoke-config.pt"
  }
}

In smoke_obstacle_detector.cc, it seems the detection is not running CUDA code:

https://github.com/ApolloAuto/apollo/blob/3ecbf3006d9a1c3d832c3d925dc838d2d762c571/modules/perception/camera/lib/obstacle/detector/smoke/smoke_obstacle_detector.cc#L346-L360

The inference time also seems too high, at more than 200 ms per frame. Does this comment apply only to the preprocessing, or does the entire SMOKE detection run on the CPU?

Pipeline 2:

The same does not happen if the YOLO detector is used in the pipeline instead of SMOKE.

stage_type: OMT_OBSTACLE_TRACKER
stage_type: YOLO_OBSTACLE_DETECTOR
stage_type: OMT_OBSTACLE_TRACKER
stage_type: MULTI_CUE_OBSTACLE_TRANSFORMER
stage_type: LOCATION_REFINER_OBSTACLE_POSTPROCESSOR
stage_type: OMT_OBSTACLE_TRACKER
stage_type: OMT_OBSTACLE_TRACKER
(...)
stage_config: {
  stage_type: YOLO_OBSTACLE_DETECTOR
  enabled: true
  type: "camera_detector"

  camera_detector_config {
    gpu_id: 0
    camera_name: "front_6mm,front_12mm"
    root_dir: "/apollo/modules/perception/production/data/perception/camera/models/yolo_obstacle_detector"
    conf_file: "config.pt"
  }
}
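To rule out a configuration difference as the cause, the two `camera_detector_config` blocks can be compared field by field. This is a small sketch using only the values quoted in the two configs above; the helpers `parse_fields` and `diff_fields` are hypothetical and handle only flat `key: value` lines, not the full text-proto format.

```python
import re

# Field lines copied from the two camera_detector_config blocks above.
SMOKE_FIELDS = '''
gpu_id: 0
camera_name: "front_6mm"
root_dir: "/apollo/modules/perception/production/data/perception/camera/models/yolo_obstacle_detector"
conf_file: "smoke-config.pt"
'''

YOLO_FIELDS = '''
gpu_id: 0
camera_name: "front_6mm,front_12mm"
root_dir: "/apollo/modules/perception/production/data/perception/camera/models/yolo_obstacle_detector"
conf_file: "config.pt"
'''


def parse_fields(block: str) -> dict:
    """Collect flat `key: value` lines into a dict, dropping surrounding quotes."""
    fields = {}
    pattern = r'^[ \t]*(\w+):[ \t]*(.+?)[ \t]*$'
    for match in re.finditer(pattern, block, re.MULTILINE):
        fields[match.group(1)] = match.group(2).strip('"')
    return fields


def diff_fields(a: dict, b: dict) -> dict:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for key, (smoke, yolo) in diff_fields(parse_fields(SMOKE_FIELDS),
                                          parse_fields(YOLO_FIELDS)).items():
        print(f"{key}: SMOKE={smoke!r} YOLO={yolo!r}")
```

Only `camera_name` and `conf_file` differ. Both configs request `gpu_id: 0` and share the same `root_dir`, so the difference in CPU usage does not appear to come from the GPU selection in these blocks.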

Inference time seems reasonable, and CPU usage does not increase substantially.

To Reproduce

Steps to reproduce the behavior:

  1. Change the camera perception pipeline to SMOKE
  2. mainboard -d modules/perception/production/dag/dag_streaming_obstacle_detection.dag

Expected behavior

CPU usage similar to other camera-based components such as YOLO detection or camera lane detection.

Summary

Poor performance and unreasonably high CPU usage in the SMOKE detection component. Is this pipeline using the CPU only? If not, what could be causing this massive difference in performance and CPU usage compared to the YOLO detector?

LordonCN commented 7 months ago

Hi rodrigoqueiroz, thanks for your attention. SMOKE postprocessing is implemented on the CPU, and inference uses libtorch. It will not be optimized for GPU and TensorRT (trt) in the future. In 9.0 we provide a two-stage YOLO with training code for mono detection; you can try it.

rodrigoqueiroz commented 6 months ago

Thank you. I switched the pipeline to use YOLO Detection from v8 as I have not upgraded to v9 yet.