NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Questions regarding design choices #5626

Open treasan opened 2 months ago

treasan commented 2 months ago

Describe the question.

Hello everyone,

I have a question regarding some design choices when building a video dataset with DALI. My pipeline consists of several steps, some of which run inside DALI pipelines and some in plain Python code. Specifically, I have a webdataset made up of tar files containing videos, so my first step is to invoke DALI's webdataset reader within a pipeline. Next, I filter out unwanted video files based on their metadata, before decoding. I then invoke a second DALI pipeline to decode the video files. After that, I process the decoded videos in Python (e.g., cutting them up into smaller snippets) and finally forward those to another DALI processing pipeline (e.g., for resizing). Dummy code looks something like this:

@pipeline_def()
def wds_extraction(paths):
    raw_video_bytes = fn.readers.webdataset(paths=paths, ...)
    return raw_video_bytes

def filter(source):
    for video_bytes in source:
        duration, fps = get_metadata(video_bytes)
        ...
        yield video_bytes, duration, fps

@pipeline_def()
def decoding(source, device):
    inputs = fn.external_source(source, num_outputs=3) # bytes, duration, fps
    video = fn.experimental.decoders.video(inputs[0], device=device)
    return video, *inputs[1:] # simply forward duration and fps unchanged ...

def cutting_snippets(source):
    ...

@pipeline_def()
def resizing(source):
    inputs = fn.external_source(source, ...)
    ...

def iterator(paths):
    source = wds_extraction_iter(paths) # wraps the wds_extraction pipeline in a DALIRaggedIterator
    source = filter(source)
    source = decoding_iter(source) # wraps the decoding pipeline in a DALIRaggedIterator
    source = cutting_snippets(source)
    source = resizing_iter(source) # wraps the resizing pipeline in a DALIRaggedIterator
    yield from source

I wanted to ask whether this design is efficient despite the context switches between pure Python and DALI pipelines. Are there performance disadvantages? Another thing that bothers me is that I have to route every piece of data through each DALI pipeline even when it is no longer modified. For example, I extract the duration and fps of each video in the filter step and want to forward them all the way to the user, so I must feed them into every DALI pipeline only to output them again unchanged.

Is there a better way to achieve a pipeline like this?
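One way to avoid routing the metadata through every DALI pipeline is to keep it in a Python side channel and re-attach it after each stage. This is only a sketch, and it assumes each stage yields exactly one output per input, in order (the names `with_side_channel` and `fake_decode` are hypothetical, not DALI API):

```python
from collections import deque

def with_side_channel(source, process):
    """Carry (payload, metadata) pairs around a processing stage.

    `process` consumes an iterator of payloads and yields results
    one-for-one, in order -- the metadata never enters the stage.
    """
    pending = deque()  # metadata waiting for its processed payload

    def payloads():
        for payload, meta in source:
            pending.append(meta)
            yield payload

    for result in process(payloads()):
        yield result, pending.popleft()

# Stand-in for a DALI pipeline stage; here "decoding" just doubles the value.
def fake_decode(stream):
    for x in stream:
        yield x * 2

items = [(1, {"fps": 30}), (2, {"fps": 24})]
out = list(with_side_channel(iter(items), fake_decode))
# out == [(2, {"fps": 30}), (4, {"fps": 24})]
```

Note the one-for-one assumption breaks at the snippet-cutting stage, where one video becomes several snippets; there the metadata would have to be replicated per snippet before the next stage.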

Check for duplicates

mdabek-nvidia commented 2 months ago

Hi @treasan,

Thank you for reaching out. Overall, your design looks reasonable. One improvement I can think of is using a parallel external source to asynchronously load and filter videos before decoding. As for the metadata you are passing through the pipelines, my impression is that it is lightweight and the overhead will be small.
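For reference, DALI's parallel external source is enabled via `fn.external_source(..., parallel=True)` together with `py_num_workers` on the pipeline, which moves the source callable into worker processes. As a library-free illustration of the same overlap idea, a minimal background-thread prefetcher (all names here are hypothetical, not DALI API) could look like:

```python
import queue
import threading

_SENTINEL = object()

def prefetch(source, depth=4):
    """Run `source` in a background thread so that loading/filtering
    overlaps with downstream work (decoding, resizing, ...)."""
    q = queue.Queue(maxsize=depth)  # bounded buffer limits read-ahead

    def worker():
        try:
            for item in source:
                q.put(item)
        finally:
            q.put(_SENTINEL)  # signal end of stream

    threading.Thread(target=worker, daemon=True).start()
    while (item := q.get()) is not _SENTINEL:
        yield item

# Usage sketch: let the filter stage run ahead of decoding, e.g.
# source = prefetch(filter(wds_extraction_iter(paths)))
```

Note this sketch swallows exceptions raised inside `source`; DALI's built-in parallel external source handles error propagation for you.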