video_reader-rs
A Python module to decode videos, based on the Rust `ffmpeg-next` crate, with a focus on ML use cases.

When training ML models on videos, it is useful to load small sub-clips of videos, so decoding the entire video is not necessary.

The great decord library seems to be unmaintained while having a few open issues. The main one (for us) is poor memory management, which makes it crash on large videos: it allocates memory for the whole video when a `VideoReader` object is instantiated, even though you might only want a few frames from that video.
So we took great inspiration from this library to rewrite the `get_batch` function using the `ffmpeg-next` Rust bindings. We also added the `decode` function, which is useful for decoding an entire video or temporally reducing it with a `compression_factor`. An option to resize the video while decoding is also available.

NOTE: other functionalities of `decord` are not implemented (yet?).
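A `compression_factor` of `f` keeps roughly a fraction `f` of the frames. The exact selection scheme is internal to the library; a plausible sketch of evenly spaced temporal subsampling (a hypothetical helper, not the library's actual code) looks like this:

```python
def subsample_indices(n_frames: int, compression_factor: float) -> list[int]:
    """Pick round(n_frames * compression_factor) evenly spaced frame indices.

    Illustrative only; the actual selection logic inside video_reader-rs
    may differ.
    """
    n_keep = max(1, round(n_frames * compression_factor))
    step = n_frames / n_keep
    return [min(n_frames - 1, int(i * step)) for i in range(n_keep)]

# Keeping a quarter of a 2004-frame video yields 501 frames, which matches
# the (501, ...) output shape seen in the benchmark section below.
print(len(subsample_indices(2004, 0.25)))  # 501
```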
Benchmarks indicate that `video_reader-rs` performs as well as or better than `decord`, while using less memory, at least for the intended ML use cases where video resolution remains reasonable (e.g. not 4K videos).
```bash
pip install video-reader-rs
```
Should work with Python >= 3.8 on recent Linux x86_64, macOS and Windows. You need to have ffmpeg installed on your system.

To build from source, install maturin:

```bash
pip install maturin
```
Activate a virtual environment where you want to use the video_reader library and build it as follows:

```bash
maturin develop --release
```

`maturin develop` builds the crate and installs it as a Python module directly in the current virtualenv. The `--release` flag ensures the Rust part of the code is compiled in release mode, which enables compiler optimizations.
:warning: If you are using a version of ffmpeg >= 6.0, you need to enable the `ffmpeg_6_0` feature:

```bash
maturin develop --release --features ffmpeg_6_0
```
Decoding a video is as simple as:

```python
import video_reader as vr

frames = vr.decode(filename, resize, compression_factor, threads, start_frame, end_frame)
```

This returns a numpy array of shape (N, H, W, C). We can do the same thing for grayscale frames, and it will return an array of shape (N, H, W):

```python
frames = vr.decode_gray(filename, resize, compression_factor, threads, start_frame, end_frame)
```
If we only need a sub-clip of the video, we can use the `get_batch` function:

```python
frames = vr.get_batch(filename, indices, threads=0, resize_shorter_side=None, with_fallback=False)
```

`get_batch` does not support multithreading (NOTE: it is still as fast as decord in our benchmarks). We can also get the shape of the raw video:

```python
(n, h, w) = vr.get_shape(filename)
```
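For example, to grab a fixed-length clip of evenly spaced frames for a model input, the `indices` list can be built like this (the clip bounds and length here are made up for illustration):

```python
def clip_indices(start: int, end: int, n_frames: int) -> list[int]:
    """Evenly spaced frame indices in [start, end).

    Illustrative helper; any list of frame indices works with get_batch.
    """
    span = end - start
    return [start + (i * span) // n_frames for i in range(n_frames)]

indices = clip_indices(120, 280, 16)
print(indices[:4])  # [120, 130, 140, 150]
# frames = vr.get_batch("sample.mp4", indices)  # -> (16, H, W, 3) array
```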
Or get a dict with information about the video, returned as `Dict[str, str]`:

```python
info_dict = vr.get_info(filename)
print(info_dict["fps"])
```
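Since the values are strings, numeric fields need converting on the Python side. Assuming the fps value may be either a plain decimal or a rational string like "30000/1001" (the exact format is not specified here), `Fraction` handles both:

```python
from fractions import Fraction

def parse_fps(fps_str: str) -> float:
    """Convert an fps string to a float.

    Fraction accepts both rational strings ("30000/1001") and decimal
    strings ("29.97"), so it covers either possible format.
    """
    return float(Fraction(fps_str))

print(parse_fps("30000/1001"))  # ~29.97 (NTSC frame rate)
print(parse_fps("25"))          # 25.0
```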
We can encode the video with the h264 codec:

```python
vr.save_video(frames, "video.mp4", fps=15, codec="h264")
```
Decoding a video with shape (2004, 1472, 1472, 3). Tested on a laptop (12-core Intel i7-9750H CPU @ 2.60GHz, 15 GB of RAM) with Ubuntu 22.04.
Options (`f`: compression factor, `r`: resize shorter side, `g`: grayscale):

| Options | OpenCV | decord* | video_reader |
|---|---|---|---|
| f 0.5 | 33.96s | 14.6s | 26.76s |
| f 0.25 | 7.16s | 14.03s | 6.73s |
| f 0.25, r 512 | 6.49s | 13.33s | 3.92s |
| f 0.25, g | 20.24s | 25.67s | 14.11s |
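As a sanity check on the table, the relative speedups of video_reader over the other decoders can be computed directly from those timings (video_reader wins every row except against decord at f 0.5):

```python
# Timings (seconds) copied from the benchmark table above.
timings = {
    "f 0.5":         {"opencv": 33.96, "decord": 14.60, "video_reader": 26.76},
    "f 0.25":        {"opencv": 7.16,  "decord": 14.03, "video_reader": 6.73},
    "f 0.25, r 512": {"opencv": 6.49,  "decord": 13.33, "video_reader": 3.92},
    "f 0.25, g":     {"opencv": 20.24, "decord": 25.67, "video_reader": 14.11},
}

for opts, t in timings.items():
    vs_cv = t["opencv"] / t["video_reader"]
    vs_dc = t["decord"] / t["video_reader"]
    print(f"{opts}: {vs_cv:.2f}x vs OpenCV, {vs_dc:.2f}x vs decord")
```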
\* decord was tested on a machine with more RAM and CPU cores because it crashed on the laptop with only 15 GB. See below.
Tested on a laptop with 15 GB of RAM, with Ubuntu 22.04 and Python 3.10. Run this script:

```python
import video_reader as vr
from time import time

def bench_video_decode(filename, compress_factor, resize=None):
    start = time()
    vid = vr.decode(filename, resize_shorter_side=resize, compression_factor=compress_factor, threads=0)
    duration = time() - start
    print(f"Duration {duration:.2f}sec")
    return vid

vid = bench_video_decode("sample.mp4", 0.25)
print("video shape:", vid.shape)

# Terminal output:
# Duration 4.81sec
# video shape: (501, 1472, 1472, 3)
```
And then run this script:

```python
from decord import VideoReader

vr = VideoReader("sample.mp4")

# Terminal output:
# terminate called after throwing an instance of 'std::bad_alloc'
#   what():  std::bad_alloc
# [1]    9636 IOT instruction (core dumped)
```
decord crashes because it allocates memory for the entire video up front, while `video_reader-rs` avoids this by implementing `get_batch` efficiently.
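The crash is consistent with a back-of-the-envelope memory estimate: holding every frame of the benchmark video in RAM as uint8 RGB needs about 13 GB, close to the machine's total 15 GB before the OS takes its share:

```python
# Full-video allocation for the benchmark clip: 2004 frames of
# 1472x1472 RGB, one byte per channel (uint8).
n_frames, height, width, channels = 2004, 1472, 1472, 3
total_bytes = n_frames * height * width * channels
print(f"{total_bytes / 1e9:.1f} GB")  # 13.0 GB
```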