NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0

Loading specific pre-determined frames from video #5540

Open JosselinSomervilleRoberts opened 5 days ago

JosselinSomervilleRoberts commented 5 days ago

Describe the question.

Hi, I just learned about DALI and wanted to ask if it was the correct tool for my use case. I have a dataset of videos and I want to load them in a Dataloader in PyTorch. I work on multiple GPUs.

My pipeline goes like this:

What I want now is the batched version of this in a distributed setting: a pipeline that gives me frames of shape (B, T, C, H, W).

I am currently using a custom DataLoader, and in __getitem__(index: int) -> torch.Tensor I call a load_video(fname: str, rel_indices: np.ndarray) -> torch.Tensor that can be implemented with different engines (Decord, torchvision.io, ...), which are all too slow. If I understand correctly, the setup of DALI is different, as it processes batches directly?

Do you think DALI could be useful in my use case, and if so, how could I implement this? Keep in mind that I am working in a distributed setup with multiple GPUs (and potentially multiple nodes later on) and that the number of frames extracted, T, is significantly smaller than the number of frames available.

Thanks!


JanuszL commented 5 days ago

Hi @JosselinSomervilleRoberts,

Thank you for reaching out. I'm afraid DALI doesn't support the sampling pattern you are asking for. What it can do is sample video with a constant step and stride, whereas in your case you are looking for an equal distribution of a fixed number of samples.
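To make the distinction concrete, here is a minimal sketch in plain Python. It is not DALI code. The function names `strided_indices` and `evenly_spaced_indices` are hypothetical, and the sketch assumes that "stride" means the gap between consecutive frames of a sequence and that "equal distribution" means spreading a fixed number of frames from the first to the last frame of a video.

```python
def strided_indices(start: int, sequence_length: int, stride: int) -> list[int]:
    """Constant-stride sampling: the gap between frames is fixed,
    so the covered span does not depend on the video length."""
    return [start + k * stride for k in range(sequence_length)]


def evenly_spaced_indices(num_frames: int, num_samples: int) -> list[int]:
    """Evenly distributed sampling (hypothetical helper): pick `num_samples`
    frames spread from the first to the last frame of the video, so the
    effective gap depends on the video length."""
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples == 1:
        return [0]
    last = num_frames - 1
    # Integer arithmetic keeps the first index at 0 and the last at `last`.
    return [(k * last) // (num_samples - 1) for k in range(num_samples)]


if __name__ == "__main__":
    print(strided_indices(0, 4, 10))      # [0, 10, 20, 30]
    print(evenly_spaced_indices(31, 4))   # [0, 10, 20, 30]
    print(evenly_spaced_indices(301, 4))  # [0, 100, 200, 300]
```

Under these assumptions, a stride of 10 happens to match the 31-frame video, but the 301-frame video would need a stride of 100. A single constant stride therefore cannot reproduce the evenly distributed pattern across videos of different lengths.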

JosselinSomervilleRoberts commented 21 hours ago

OK, thank you for letting me know! If this is added in the future, I would love to hear about it!