NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Will 'fn.readers.video' support reading visual content and audio content at the same time? #4810

Open auzxb opened 1 year ago

auzxb commented 1 year ago

Is this a new feature, an improvement, or a change to existing functionality?

New Feature

How would you describe the priority of this feature request

Must have (e.g. DALI adoption is impossible due to lack in functionality).

Please provide a clear description of problem this feature solves

As a researcher in audio-visual cross-modal learning, I would like DALI to support loading audio and video frames at the same time.

Feature Description

As a researcher in audio-visual cross-modal learning, I would like DALI to support loading audio and video frames at the same time.

Describe your ideal solution

# Pseudocode for the requested behavior: one operator returning all three outputs
audio, images, label = nvidia.dali.fn.decoders.video(...)
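
To make the request more concrete, here is a rough sketch of the pairing such an operator would have to do internally: given a window of video frames, find the matching slice of audio samples. This is plain Python, not DALI code. The function name `audio_span_for_frames` and its parameters are hypothetical, and the sketch assumes a constant frame rate and that the audio and video tracks start at the same timestamp.

```python
from typing import Tuple


def audio_span_for_frames(
    first_frame: int,
    num_frames: int,
    fps: float,
    sample_rate: int,
) -> Tuple[int, int]:
    """Return [start, end) audio sample indices covering a window of frames.

    Hypothetical helper for illustration only. Assumes a constant frame
    rate and that both tracks begin at timestamp zero.
    """
    if fps <= 0 or sample_rate <= 0:
        raise ValueError("fps and sample_rate must be positive")
    if first_frame < 0 or num_frames <= 0:
        raise ValueError("first_frame must be >= 0 and num_frames > 0")

    # Frame k starts at k / fps seconds; convert that time to a sample index.
    start = round(first_frame * sample_rate / fps)
    end = round((first_frame + num_frames) * sample_rate / fps)
    return start, end


if __name__ == "__main__":
    # A 16-frame clip at 25 fps with 16 kHz audio.
    print(audio_span_for_frames(0, 16, 25.0, 16000))
```

Real files may use variable frame rates or have a nonzero offset between tracks, in which case the container timestamps would be needed instead of this arithmetic.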

Describe any alternatives you have considered

No response

Additional context

No response

JanuszL commented 1 year ago

Hi @auzxb,

Thank you for reaching out. Can you tell us more about the models and dataset you are using? Do you need additional metadata to be processed together with the audio and video? What is your current approach? Do you use any particular library or libraries in your workflow?