Gui-Yom / turbo-metrics

Hardware acceleration for your daily video tasks
GNU Affero General Public License v3.0
amf cuda npp nvdec ssimulacra2 video

TurboMetrics

A collection of video-related libraries and tools oriented toward performance and hardware acceleration.


Goal

This project started when I noticed my GPU sitting at 0% usage while my CPU was overloaded during video processing.

The strategy is to offload as much work as possible onto the GPU :

  1. Demux a video file on the CPU
  2. Decode the bitstream on hardware and keep the frame in CUDA memory
  3. Do any costly processing on the frames (IQA, postprocessing ...) using the GPU
  4. Get the results back to the CPU

In some cases the frame cannot be decoded on the GPU (e.g. image formats), so decoded frames have to be streamed from the CPU. This reduces performance, but it is still faster than full CPU processing if the frames can stay in GPU memory long enough.
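The data flow above can be modelled with a small host-only sketch. Everything in it is hypothetical: `DeviceFrame`, `Source`, `Stats`, `acquire` and `compare_on_device` are illustrative names, not types or functions from this repository, and the "device" buffer is just a `Vec<f32>`. It only illustrates the two ways a frame can reach GPU memory and the fact that a single scalar comes back to the CPU.

```rust
// Hypothetical model of the data flow; none of these types exist in turbo-metrics.

/// A decoded frame resident in (simulated) GPU memory.
struct DeviceFrame {
    pixels: Vec<f32>, // stand-in for a real device buffer
}

/// How a frame reaches GPU memory.
enum Source {
    /// Bitstream decoded by the hardware decoder: the frame is already on the device.
    HardwareDecode(Vec<f32>),
    /// Frame decoded on the CPU (e.g. an image format): it must be uploaded.
    CpuDecode(Vec<f32>),
}

/// Counts full-frame host-to-device copies so both paths can be compared.
#[derive(Default)]
struct Stats {
    uploads: usize,
}

fn acquire(source: Source, stats: &mut Stats) -> DeviceFrame {
    match source {
        Source::HardwareDecode(pixels) => DeviceFrame { pixels },
        Source::CpuDecode(pixels) => {
            stats.uploads += 1; // the extra transfer the fallback path pays for
            DeviceFrame { pixels }
        }
    }
}

/// Stand-in for a costly GPU computation: mean absolute error between two frames.
fn compare_on_device(a: &DeviceFrame, b: &DeviceFrame) -> f32 {
    let sum: f32 = a
        .pixels
        .iter()
        .zip(&b.pixels)
        .map(|(x, y)| (x - y).abs())
        .sum();
    sum / a.pixels.len() as f32
}

fn main() {
    let mut stats = Stats::default();
    let reference = acquire(Source::HardwareDecode(vec![0.0, 0.5, 1.0, 0.25]), &mut stats);
    let distorted = acquire(Source::CpuDecode(vec![0.0, 0.25, 1.0, 0.5]), &mut stats);
    // Only this scalar travels back to the CPU.
    let score = compare_on_device(&reference, &distorted);
    println!("score = {score}, uploads = {}", stats.uploads);
}
```

In this toy model the CPU-decoded frame costs one upload, after which both frames are treated identically. That is why the fallback path can still pay off when frames stay on the device for several processing steps.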

Subprojects

turbo-metrics

CLI to process a pair of videos or images and compute various metrics and statistics. Available here.

cudarse

Here

codec-bitstream

Transform codec bitstream for feeding into GPU decoders. Also provides parsing for metadata like color information.

nvptx-core

Nightly-only helper library for writing CUDA kernels in Rust. Acts as a kind of libstd for the nvptx64-nvidia-cuda target. Provides a math extension trait, based on bindings to libdevice, that replaces the math functions normally provided by std.
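As a rough illustration of the extension-trait idea, here is a minimal sketch. The trait name `FloatMath` and the helper `length` are hypothetical and are not taken from nvptx-core. The only external names used are `__nv_sqrtf` and `__nv_fabsf`, which are libdevice functions. On any target other than nvptx64 the sketch falls back to std so that it can run on the host, and the `unsafe extern` syntax assumes a reasonably recent compiler (Rust 1.82 or later).

```rust
// Hypothetical sketch: `FloatMath` is an illustrative name, not nvptx-core's API.

// On the GPU target std is unavailable, so the math comes from libdevice.
#[cfg(target_arch = "nvptx64")]
unsafe extern "C" {
    fn __nv_sqrtf(x: f32) -> f32;
    fn __nv_fabsf(x: f32) -> f32;
}

/// Math operations that std normally provides for `f32`.
trait FloatMath {
    fn sqrt(self) -> Self;
    fn abs(self) -> Self;
}

impl FloatMath for f32 {
    #[cfg(target_arch = "nvptx64")]
    fn sqrt(self) -> f32 {
        unsafe { __nv_sqrtf(self) }
    }

    #[cfg(target_arch = "nvptx64")]
    fn abs(self) -> f32 {
        unsafe { __nv_fabsf(self) }
    }

    // Host fallback so the sketch runs anywhere: defer to std's inherent methods.
    #[cfg(not(target_arch = "nvptx64"))]
    fn sqrt(self) -> f32 {
        f32::sqrt(self)
    }

    #[cfg(not(target_arch = "nvptx64"))]
    fn abs(self) -> f32 {
        f32::abs(self)
    }
}

/// Code written against the trait compiles for both host and device.
fn length(x: f32, y: f32) -> f32 {
    FloatMath::sqrt(x * x + y * y)
}

fn main() {
    println!("length = {}", length(3.0, 4.0));
    println!("abs = {}", FloatMath::abs(-2.5f32));
}
```

The point of the pattern is that kernel code can keep calling familiar method names while the implementation is swapped per target.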

nvptx-builder

Allows a crate to define a dependency on a nvptx crate and have it built with a single cargo build.

cuda-colorspace

Colorspace conversion CUDA kernels used in other crates.
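For readers unfamiliar with what such a kernel computes per pixel, here is a CPU reference sketch of one conversion. It assumes BT.709 coefficients and normalized full-range values, chosen only for illustration; the matrices, ranges and kernel signatures that cuda-colorspace actually implements are not described here. The function names are hypothetical.

```rust
// CPU reference sketch of a per-pixel conversion; not the crate's actual kernels.

/// BT.709 luma coefficients (assumed for this example).
const KR: f32 = 0.2126;
const KB: f32 = 0.0722;
const KG: f32 = 1.0 - KR - KB;

/// Convert normalized full-range Y'CbCr to R'G'B'.
/// `y` is in [0, 1]; `cb` and `cr` are in [-0.5, 0.5].
fn ycbcr_to_rgb(y: f32, cb: f32, cr: f32) -> [f32; 3] {
    let r = y + 2.0 * (1.0 - KR) * cr;
    let b = y + 2.0 * (1.0 - KB) * cb;
    // Green follows from the luma definition y = KR*r + KG*g + KB*b.
    let g = (y - KR * r - KB * b) / KG;
    [r, g, b]
}

/// Inverse conversion, used here to check the round trip.
fn rgb_to_ycbcr(r: f32, g: f32, b: f32) -> [f32; 3] {
    let y = KR * r + KG * g + KB * b;
    let cb = (b - y) / (2.0 * (1.0 - KB));
    let cr = (r - y) / (2.0 * (1.0 - KR));
    [y, cb, cr]
}

fn main() {
    // Pure red survives a round trip through Y'CbCr (up to float rounding).
    let [y, cb, cr] = rgb_to_ycbcr(1.0, 0.0, 0.0);
    let rgb = ycbcr_to_rgb(y, cb, cr);
    println!("ycbcr = [{y}, {cb}, {cr}], rgb = {rgb:?}");
}
```

On a GPU the same arithmetic would run once per pixel in parallel, which is what makes this kind of conversion a natural fit for a kernel.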

ssimulacra2-cuda

An attempt at computing the ssimulacra2 metric with GPU acceleration, leveraging NPP and custom kernels written in Rust. Preliminary profiling shows that I'm terrible at writing GPU code that runs fast.

Reference implementation : https://github.com/cloudinary/ssimulacra2

vmaf

Bindings to libvmaf.

Prerequisites

This repository is particularly difficult to set up for a Rust project due to its dependencies on various vendor SDKs. You need to be patient and able to read error messages from builds.

Also, it uses a novel approach, enabled by recent rustc developments, to colocate CUDA kernels written in Rust within the same cargo workspace. This is very much bleeding edge, and the way the crates are linked together prevents publishing to crates.io. The only supported way to build any crate in this repo is by cloning the git repo.

Common

Windows

Linux

Support this project

There are various ways you can support development.

TODO ideas

The base is solid. I plan to implement various tools to help the process of making encodes (except encoding itself) from pre-filtering to validation. In no particular order or priority :

Tools & workflows

Algorithms implementations

Inputs

Outputs

Platform support

Currently, we're locked to Nvidia hardware. However, nothing here explicitly requires CUDA.

About video hardware acceleration

Processing videos efficiently is a two-part problem :

Video decoding

So you want a cross-platform way to decode videos on every possible platform ? Sadge. This is a mess : there are nearly as many different APIs as there are hardware vendors, operating systems and GPU APIs.

Recap table :

API Windows Linux Nvidia Intel AMD AV1 HEVC AVC MPEG2 VC1
NVDEC
VPL
AMF
DXVA
Vulkan Video
VAAPI 🟦vaon12 🟦Nouveau
VDPAU

There is still the option to decode video on the CPU and stream frames to the GPU for computations. This is still faster than doing all processing on the CPU alone.

Compute

Your GPU will blow your CPU away on nearly any image processing task. Keeping frame processing on the GPU is the biggest speed win available.

Recap table :

API Windows Linux Intel AMD Nvidia NVDEC VPL AMF Vulkan Video CPU-side Rust GPU-side Rust
CUDA 🟦ZLUDA ✅llvm ptx
Vulkan ✅Spir-V
OpenCL ✅Spir-V
ROCm/HIP
WGPU ✅Spir-V

From both those tables, it seems Vulkan and Vulkan Video are the way forward but well, it's Vulkan.