DeepLabCut / DeepLabCut

Official implementation of DeepLabCut: Markerless pose estimation of user-defined features with deep learning for all animals incl. humans
http://deeplabcut.org
GNU Lesser General Public License v3.0

Suggestion: Use `ffmpegcv` to replace some functions in `utils/auxfun_videos.py` #2686

Open chenxinfeng4 opened 3 months ago

chenxinfeng4 commented 3 months ago

I'm Xinfeng Chen from Ying Li's lab, a neuroscience group in Beijing, China.

I created ffmpegcv, an alternative to OpenCV for reading and writing video. I used ffmpegcv to speed up my 3D skeleton reconstruction from multi-view videos. I noticed that you also use ffmpeg as the backend for reading and writing video, so you may be interested in my project.

https://github.com/chenxinfeng4/ffmpegcv

Basic usage.

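# ffmpegcv
import ffmpegcv
import numpy as np

file = 'input.mp4'                  # path to any video file
cap = ffmpegcv.VideoCapture(file)   # OpenCV-compatible reader
while True:
    ret, frame = cap.read()         # frame: HxWx3 uint8 array (BGR by default)
    if not ret:
        break                       # end of the video
    pass                            # process the frame here
cap.release()

frame1 = np.zeros((480, 640, 3), dtype=np.uint8)      # dummy black frame
frame2 = np.full((480, 640, 3), 255, dtype=np.uint8)  # dummy white frame
out = ffmpegcv.VideoWriter('outpy.mp4', 'x264', fps=10)
out.write(frame1)
out.write(frame2)
out.release()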
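Because `ffmpegcv.VideoCapture` keeps OpenCV's `read()` contract (a `(ret, frame)` tuple), code in `auxfun_videos.py` could depend only on that contract and swap backends freely. The sketch below wraps any such object in a generator; `iter_frames` and `FakeCapture` are hypothetical names for illustration, not part of either library.

```python
from typing import Any, Iterator, Protocol


class FrameSource(Protocol):
    """Anything with an OpenCV-style read()/release() pair."""

    def read(self) -> tuple[bool, Any]: ...

    def release(self) -> None: ...


def iter_frames(cap: FrameSource) -> Iterator[Any]:
    """Yield frames until read() reports failure, then release the source."""
    try:
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            yield frame
    finally:
        cap.release()  # runs when the loop ends or the generator is closed


class FakeCapture:
    """Hypothetical stand-in that returns frames 0..n-1, for demonstration only."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.i = 0
        self.released = False

    def read(self) -> tuple[bool, Any]:
        if self.i >= self.n:
            return False, None
        self.i += 1
        return True, self.i - 1

    def release(self) -> None:
        self.released = True


cap = FakeCapture(3)
print(list(iter_frames(cap)), cap.released)  # [0, 1, 2] True
```

With this shape, `iter_frames(ffmpegcv.VideoCapture(file))` and `iter_frames(cv2.VideoCapture(file))` would be interchangeable for sequential reading.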

It also supports NVIDIA GPUs for decoding and encoding video, and can use CUDA to resize frames on the fly while reading. This makes it fast and light on the CPU.

"""
          ——————————  NVIDIA GPU accelerating ⤴⤴ ———————
          |                                              |
          V                                              V
video -> decode -> crop -> resize -> RGB -> CUDA:CHW float32 -> model
"""
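# file, W, H, mean, std and model are placeholders: input path, target frame
# size, per-channel normalization constants, and a model living on the GPU
cap = ffmpegcv.toCUDA(
    ffmpegcv.VideoCaptureNV(file, pix_fmt='nv12', resize=(W, H)),  # decode + resize on the GPU
    tensor_format='chw')

for frame_CHW_cuda in cap:
    frame_CHW_cuda = (frame_CHW_cuda - mean) / std  # normalize on the GPU
    result = model(frame_CHW_cuda)

# alternatively, read single frames to the CUDA device (pick one of these)
ret, frame_CHW_pycuda = cap.read()             # 380 fps, 200% CPU load; returns a pycuda array
ret, frame_CHW_pycudamem = cap.read_cudamem()  # returns a pycuda mem_alloc
ret, frame_CHW_CUDA = cap.read_torch()         # returns a PyTorch tensor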
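For comparison with a CPU pipeline, the NumPy sketch below performs the same post-decode steps as the diagram (crop, resize, BGR to RGB, HWC to CHW float32, normalize) on one frame. It assumes OpenCV-style BGR uint8 input and uses nearest-neighbour resizing to stay dependency-free (the GPU path likely uses a better interpolation filter); `preprocess` and the `mean`/`std` values are illustrative placeholders.

```python
import numpy as np


def preprocess(frame_bgr, crop, size, mean, std):
    """Crop, resize, convert BGR->RGB, reorder HWC->CHW and normalize one frame.

    crop = (x, y, w, h) in pixels, size = (W, H) of the output,
    mean/std = per-channel constants in RGB order. Returns float32 (3, H, W).
    """
    x, y, w, h = crop
    roi = frame_bgr[y:y + h, x:x + w]                # crop
    W, H = size
    rows = np.arange(H) * h // H                     # nearest-neighbour row indices
    cols = np.arange(W) * w // W                     # nearest-neighbour column indices
    resized = roi[rows[:, None], cols[None, :]]      # shape (H, W, 3)
    rgb = resized[..., ::-1]                         # BGR -> RGB
    chw = rgb.transpose(2, 0, 1).astype(np.float32)  # HWC -> CHW, float32
    mean = np.asarray(mean, dtype=np.float32)[:, None, None]
    std = np.asarray(std, dtype=np.float32)[:, None, None]
    return (chw - mean) / std


frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame[..., 0], frame[..., 1], frame[..., 2] = 10, 20, 30  # B, G, R
out = preprocess(frame, crop=(0, 0, 320, 240), size=(160, 120),
                 mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])
print(out.shape, out.dtype, out[0, 0, 0])  # (3, 120, 160) float32 30.0
```

Every one of these steps touches each pixel on the CPU, which is why moving them onto the GPU, as `toCUDA` does, can pay off.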
jeylau commented 3 months ago

Hi @chenxinfeng4, that's a cool project! Would you have some numbers on speedup vs opencv for sequential read and accurate random seek?

chenxinfeng4 commented 3 months ago

ffmpegcv supports sequential reading but not random seeking. For random seeking, I would prefer mmcv.VideoReader, a wrapper class from the OpenMMLab team. Besides, random seeking is unreliable, especially for HEVC files.
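Part of the difficulty is how compressed video is stored: most frames depend on earlier ones, so an accurate seek has to start decoding at the nearest preceding keyframe and discard frames up to the target. A reader that jumps straight to a keyframe, or trusts an inaccurate index, can return the wrong frame. The sketch below computes that decode plan from a sorted list of keyframe indices; `seek_plan` is a hypothetical helper, and it assumes frames are numbered in presentation order, ignoring B-frame reordering, which makes real seeking harder still.

```python
from bisect import bisect_right


def seek_plan(keyframes: list[int], target: int) -> tuple[int, int]:
    """Return (start_keyframe, frames_to_discard) for an exact seek to `target`.

    `keyframes` must be sorted in ascending order.
    """
    pos = bisect_right(keyframes, target) - 1  # index of last keyframe <= target
    if pos < 0:
        raise ValueError("target precedes the first keyframe")
    start = keyframes[pos]
    return start, target - start               # decode from start, drop this many frames


keyframes = [0, 250, 500, 750]
print(seek_plan(keyframes, 612))  # (500, 112)
```

With long keyframe intervals, an exact seek can cost hundreds of decoded-and-discarded frames, so sequential reading is usually the better fit for whole-video processing.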

As for speed, it depends on how ffmpegcv is used. In the worst case, ffmpegcv is a bit slower than OpenCV. But ffmpegcv supports GPU decoding and encoding, resizing and cropping on the GPU, reading frames directly in the RGB24 pixel format, and returning CHW PyTorch tensors directly. With any one of these features it is most likely >2x faster; the exact speedup depends on how you arrange the preprocessing stages.

Not only is it faster, it also needs less code and uses less CPU.

CPU-only version: [screenshot]

GPU-accelerated version (>2x faster): [screenshot]
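To reproduce numbers like these for a specific pipeline, one option is to time each reader on the same file with a small harness like the one below. It measures frames per second for any frame iterator, optionally including a preprocessing step so the comparison is end to end. `measure_fps` is a hypothetical helper, not part of ffmpegcv or OpenCV; feeding it frames from `cv2.VideoCapture` or `ffmpegcv.VideoCapture` is left to the reader.

```python
import time
from typing import Any, Callable, Iterable


def measure_fps(frames: Iterable[Any],
                process: Callable[[Any], Any] = lambda f: f) -> tuple[int, float]:
    """Consume `frames`, applying `process` to each; return (count, frames per second)."""
    count = 0
    t0 = time.perf_counter()
    for frame in frames:
        process(frame)  # include preprocessing so the timing is end to end
        count += 1
    elapsed = time.perf_counter() - t0
    return count, (count / elapsed if elapsed > 0 else float("inf"))


n, fps = measure_fps(range(1000))
print(n)  # 1000
```

Wall-clock time alone does not capture CPU load: ffmpeg-based readers decode in a separate process, so CPU usage is better read from an external monitor such as `top`.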

chenxinfeng4 commented 3 months ago

ffmpegcv also supports high-level multiprocessing to speed up video reading and writing.
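A common way to overlap decoding with downstream work is a background producer that reads ahead into a bounded queue. The sketch below shows this general technique with a thread and `queue.Queue` around any OpenCV-style capture; it is not ffmpegcv's own implementation, and `prefetch` and `CountingCapture` are hypothetical names. A process-based version would follow the same shape with `multiprocessing.Queue`.

```python
import queue
import threading
from typing import Any, Iterator

_SENTINEL = object()  # marks end of stream


def prefetch(cap: Any, maxsize: int = 32) -> Iterator[Any]:
    """Yield frames from `cap.read()` while a background thread reads ahead."""
    q: queue.Queue = queue.Queue(maxsize=maxsize)  # bounded, so memory use is capped

    def producer() -> None:
        try:
            while True:
                ret, frame = cap.read()
                if not ret:
                    break
                q.put(frame)  # blocks while the consumer is behind
        finally:
            q.put(_SENTINEL)  # always signal end of stream

    # daemon thread: if the consumer stops early, the blocked producer won't keep the
    # interpreter alive (a production version would add an explicit stop event)
    threading.Thread(target=producer, daemon=True).start()
    while True:
        frame = q.get()
        if frame is _SENTINEL:
            break
        yield frame


class CountingCapture:
    """Hypothetical stand-in capture returning frames 0..n-1."""

    def __init__(self, n: int) -> None:
        self.n, self.i = n, 0

    def read(self):
        if self.i >= self.n:
            return False, None
        self.i += 1
        return True, self.i - 1


print(sum(prefetch(CountingCapture(100), maxsize=4)))  # 4950
```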

chenxinfeng4 commented 3 months ago

Here is a PR in which I used ffmpegcv in open-mmlab/mmdetection: https://github.com/open-mmlab/mmdetection/pull/7832

chenxinfeng4 commented 3 months ago

Using a queue and shared memory in deeplabcut/dlclivegui to stage the most recent live camera frame is a fantastic idea! ffmpegcv does it the same way, and it provides some out-of-the-box decorator functions for this.
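The sketch below shows the shared-memory half of that pattern: a producer overwrites a fixed-size `multiprocessing.shared_memory` block with the newest frame, and a consumer copies out whatever is current. It is a single-process demonstration with a hypothetical class name (`LatestFrameSlot`), not code taken from dlclivegui or ffmpegcv. A real multi-process version would attach by name from the other process and also need a lock or sequence counter to avoid reading a half-written frame.

```python
import numpy as np
from multiprocessing import shared_memory


class LatestFrameSlot:
    """Fixed-size shared buffer that only ever holds the most recent frame."""

    def __init__(self, shape, dtype=np.uint8, name=None, create=True):
        nbytes = int(np.prod(shape)) * np.dtype(dtype).itemsize
        # create=True allocates a new block; create=False attaches to `name`
        self.shm = shared_memory.SharedMemory(name=name, create=create,
                                              size=nbytes if create else 0)
        self.frame = np.ndarray(shape, dtype=dtype, buffer=self.shm.buf)

    def write(self, frame):
        self.frame[...] = frame  # overwrite in place; older frames are dropped

    def read(self):
        return self.frame.copy()  # copy out so later writes don't change the result

    def close(self, unlink=False):
        del self.frame  # drop the view on the buffer before closing it
        self.shm.close()
        if unlink:
            self.shm.unlink()  # free the block (do this once, in the owner)


slot = LatestFrameSlot((2, 2, 3))
slot.write(np.full((2, 2, 3), 7, np.uint8))
slot.write(np.full((2, 2, 3), 9, np.uint8))
latest = slot.read()
slot.close(unlink=True)
print(int(latest[0, 0, 0]))  # 9
```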

For USB cameras or RTSP (IP) cameras, ffmpegcv can either buffer recent frames (the default) or, in strict real-time applications, read only the latest frame (ReadLiveLast).
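The two modes amount to two buffering policies. The sketch below contrasts them with `collections.deque`: a bounded FIFO keeps the most recent N frames for in-order processing, while a latest-only slot (the idea behind ReadLiveLast as described above) always hands back the newest frame and drops the rest. The class names are hypothetical, and this is not ffmpegcv's code.

```python
from collections import deque
from typing import Any, Optional


class RecentFramesBuffer:
    """Keep up to `maxlen` recent frames; the oldest are dropped when full."""

    def __init__(self, maxlen: int) -> None:
        self.buf: deque = deque(maxlen=maxlen)

    def push(self, frame: Any) -> None:
        self.buf.append(frame)

    def read(self) -> Optional[Any]:
        return self.buf.popleft() if self.buf else None  # oldest buffered frame first


class LatestOnly:
    """Hold just the newest frame, trading completeness for minimal latency."""

    def __init__(self) -> None:
        self.frame: Optional[Any] = None

    def push(self, frame: Any) -> None:
        self.frame = frame

    def read(self) -> Optional[Any]:
        frame, self.frame = self.frame, None  # consume it so it isn't returned twice
        return frame


fifo, last = RecentFramesBuffer(maxlen=3), LatestOnly()
for i in range(5):  # the camera delivers frames 0..4 before the consumer reads
    fifo.push(i)
    last.push(i)
print(fifo.read(), last.read())  # 2 4
```

The FIFO suits recording or offline analysis where every buffered frame matters; the latest-only slot suits closed-loop experiments where a stale frame is worse than a skipped one.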