MVIG-SJTU / AlphaPose

Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
http://mvig.org/research/alphapose.html

How to accelerate demo_inference.py? #1097

Closed · beegerous closed 1 year ago

beegerous commented 1 year ago

I'm using demo_inference.py to annotate a video:

python3 $AP/scripts/demo_inference.py \
    --video ./long720.mp4 \
    --detector yolox-m \
    --cfg $AP/configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
    --detbatch 64 \
    --posebatch 64 \
    --checkpoint $AP/pretrained_models/fast_res50_256x192.pth

The video is 60s long, and demo_inference.py took 70s on a single RTX 3090.

Loading YOLOX-M model..
100%|███████████████████████████████████████████████████████████| 1433/1433 [00:57<00:00, 24.90it/s]
===========================> Finish Model Running.
===========================> Rendering remaining 49 images in the queue...
Results have been written to json.
/code/run/alphapose

real    1m10.033s
user    5m53.485s
sys     0m23.900s

It looks like roughly 10s goes to loading YOLO, 50s to running the model, and 10s to rendering.

But according to MODEL_ZOO.md, FastPose runs at 3.54 iter/s with batch 64, so running the model should take about 6.3s on a TITAN XP, and less than 3s on an RTX 3090. Apparently my test is much slower than that.
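
For reference, the arithmetic behind that 6.3s estimate, using the 1433 frames from the log above (this assumes the MODEL_ZOO.md batch-64 throughput transfers directly to video inference):

# 1433 frames / (3.54 iter/s * 64 images per iter, per MODEL_ZOO.md)
python3 -c "print(1433 / (3.54 * 64))"   # ~6.3 s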

Is there any configuration I can change to make it faster, or is my calculation wrong?

beegerous commented 1 year ago

The video has a dozen people in it. I tried another 60s video with only two people, and it took 10+36+8=54s, still slower than expected.

Fang-Haoshu commented 1 year ago

Hi, the 3.54 iter/s speed is measured on offline single-person datasets like COCO or MPII, which benchmark the pose estimation model by itself. The overall speed of AlphaPose also includes video processing, detection, multiprocessing, and so on. Besides, you don't need to set --detbatch to 64; 4 or 6 is fine.

The model loading time cannot be avoided. Turning off rendering can speed up the process.
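
For instance, a sketch of the original command with the detection batch lowered as suggested, leaving any rendering flags (e.g. --save_video, --vis) off:

python3 $AP/scripts/demo_inference.py \
    --video ./long720.mp4 \
    --detector yolox-m \
    --cfg $AP/configs/coco/resnet/256x192_res50_lr1e-3_1x.yaml \
    --detbatch 4 \
    --posebatch 64 \
    --checkpoint $AP/pretrained_models/fast_res50_256x192.pth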

beegerous commented 1 year ago

Thanks for the answer. I want to hook up a camera and do real-time annotation; I've tried several open-source libraries, and AlphaPose's results are by far the best. But if the speed I measured is basically right, wouldn't it take at least a 3080 Ti to run a real-time demo? Or does AlphaPose have some other optimizations for webcam input?