How to use a single pipeline function to decode a video file (ex: .mp4) into video AND audio tensors

NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html

How to use a single pipeline function to decode a video file (ex: .mp4) into video AND audio tensors #5597

Open zade-twelvelabs opened 1 month ago

zade-twelvelabs commented 1 month ago

Describe the question.

fn.readers.video must be used to read and decode video files
fn.readers.file must be used to decode audio files, but does not accept video formats

So if I can't use fn.readers.file to read a video's audio, and fn.readers.video does not decode the audio track, how do I decode an .mp4 file's audio?

Check for duplicates

[X] I have searched the open bugs/issues and have found no duplicates for this bug report

JanuszL commented 1 month ago

Hi @zade-twelvelabs,

Thank you for reaching out. Currently, DALI doesn't support decoding audio from mp4 files. The current audio decoding capabilities (and the flow) are described here. What you can do is use the external source operator and utilize FFmpeg to load and decode audio from mp4 containers. As audio decoding is not GPU accelerated in DALI, there shouldn't be a substantial perf overhead due to this.
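
As a rough illustration of the external source suggestion above, here is a minimal sketch, not an official DALI recipe. It assumes the ffmpeg binary is on PATH and that mono float32 audio at 16 kHz is acceptable. The helper names (ffmpeg_audio_cmd, decode_audio, build_pipeline) and the sample rate are hypothetical choices made for this example. NumPy and DALI are imported lazily so the command builder can be inspected without them installed.

```python
import subprocess

SAMPLE_RATE = 16000  # assumed target sample rate in Hz (hypothetical choice)


def ffmpeg_audio_cmd(path, sample_rate=SAMPLE_RATE):
    """Build an ffmpeg command that writes mono float32 PCM to stdout."""
    return [
        "ffmpeg", "-v", "error",
        "-i", path,
        "-vn",                    # ignore the video stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the requested rate
        "-f", "f32le",            # raw little-endian float32 samples
        "pipe:1",                 # write to stdout
    ]


def decode_audio(path, sample_rate=SAMPLE_RATE):
    """Run ffmpeg and return the audio as a (num_samples, 1) float32 array."""
    import numpy as np  # deferred import, see lead-in

    raw = subprocess.run(
        ffmpeg_audio_cmd(path, sample_rate),
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    # Copy so the array handed to DALI is writable and owns its memory.
    return np.frombuffer(raw, dtype="<f4").reshape(-1, 1).copy()


def build_pipeline(paths, batch_size=1):
    """Create a CPU-only DALI pipeline that yields one audio tensor per file."""
    from nvidia.dali import pipeline_def, fn, types  # deferred import

    def samples():
        # One decoded file per sample; external_source is used with batch=False.
        for p in paths:
            yield decode_audio(p)

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None)
    def audio_pipeline():
        return fn.external_source(source=samples, batch=False, dtype=types.FLOAT)

    return audio_pipeline()
```

Usage would look roughly like `pipe = build_pipeline(["clip.mp4"]); pipe.build(); (audio,) = pipe.run()`, where "clip.mp4" is a placeholder path. The sketch covers only the audio track, so the video frames would still come from fn.readers.video.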