NVIDIA / DALI

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
Apache License 2.0
5.06k stars, 615 forks

How to read a sequence of images with labels, and apply the same augmentation/transformation on the images in the set #3221

Closed Michelvl92 closed 3 years ago

Michelvl92 commented 3 years ago

Hi all, what is the best (and fastest) way to read image sequences with a label for each sequence, in the way readers.sequence is used in the "Reading Video Frames Stored as Images" example?

My dataset is an action classification dataset in which each sequence of images is labeled with an action, e.g. run, throw, dance, etc. The labels can be stored in any way (I am very flexible on that), e.g. based on the subfolder name, in a .txt file, etc. If necessary, the image sequences could be stored as TFRecord files or NumPy arrays, but I would like to keep, as far as possible, the option to skip frames and/or take a shorter sequence.

I looked into several options, but none of them is a complete solution:

readers.sequence

readers.file

readers.tfrecord

readers.numpy

ExternalSource Operator

On top of this, is it possible to apply transformations or augmentations like nvidia.dali.fn.flip() or nvidia.dali.fn.transforms.shear() to the sequence such that every frame in the sequence undergoes the same augmentation/transformation?

JanuszL commented 3 years ago

Hi @Michelvl92,

In your case you can use multiple readers.file instances, each of which reads one frame from each sequence. To do that, pass a list to the files argument of each reader: reader0 gets files=[seq0_frame0, seq1_frame0, ...], reader1 gets files=[seq0_frame1, seq1_frame1, ...] (do the same for the labels argument), then use the stack/cat operator to create sequences. Alternatively, you can use the external source operator; check its parallel and prefetch_queue_depth arguments to speed things up (examples should soon be available as part of the documentation, and you can preview them in https://github.com/NVIDIA/DALI/pull/3199).
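The multi-reader approach above could be sketched roughly as follows. This is only an illustration under assumptions: the directory layout root/<action>/<sequence>/<frame>, the helper names per_position_file_lists and build_pipeline, and the stride parameter are hypothetical and not part of DALI. Frame file names are assumed to be zero-padded so that lexicographic sorting matches temporal order, and all frames of a sequence are assumed to have the same size so that they can be stacked.

```python
from pathlib import Path


def per_position_file_lists(root, seq_len, stride=1):
    """Build one file list per frame position, plus one label per sequence.

    Assumes a hypothetical layout root/<action>/<sequence>/<frame file>.
    Returns (files_per_pos, labels, class_names), where files_per_pos[k][i]
    is the k-th selected frame of sequence i and labels[i] is its class index.
    """
    root = Path(root)
    class_names = sorted(d.name for d in root.iterdir() if d.is_dir())
    files_per_pos = [[] for _ in range(seq_len)]
    labels = []
    for label, name in enumerate(class_names):
        for seq_dir in sorted(p for p in (root / name).iterdir() if p.is_dir()):
            # Keep every `stride`-th frame; names must sort in temporal order.
            frames = sorted(str(f) for f in seq_dir.iterdir() if f.is_file())[::stride]
            if len(frames) < seq_len:
                continue  # too few frames left after striding, skip this sequence
            for pos in range(seq_len):
                files_per_pos[pos].append(frames[pos])
            labels.append(label)
    return files_per_pos, labels, class_names


def build_pipeline(files_per_pos, labels, batch_size=4):
    """Sketch of a CPU pipeline with one readers.file per frame position."""
    # DALI is imported lazily so the helper above stays usable without it.
    from nvidia.dali import fn, pipeline_def

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None)
    def sequence_pipeline():
        frames, label = [], None
        for pos, files in enumerate(files_per_pos):
            # Readers are left unshuffled so that sample i of every reader
            # belongs to the same sequence i.
            encoded, lbl = fn.readers.file(files=files, labels=labels, name=f"reader{pos}")
            frames.append(fn.decoders.image(encoded, device="cpu"))
            if label is None:
                label = lbl  # every reader yields the same label for sample i
        # Stack the decoded frames into one sequence tensor per sample.
        return fn.stack(*frames), label

    return sequence_pipeline()
```

Because every reader must return frames of the same sequence at the same sample index, one option is to shuffle the sequence order in Python before building the lists rather than letting each reader shuffle on its own. Skipping frames or shortening sequences then only requires changing stride or seq_len.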

On top of this, is it possible to apply transformations or augmentations like nvidia.dali.fn.flip() or nvidia.dali.fn.transforms.shear() to the sequence such that every frame in the sequence undergoes the same augmentation/transformation?

If you have each frame as a separate output, you can do something like:

flip = fn.random.coin_flip()
frame0 = fn.flip(frame0, vertical=flip)
frame1 = fn.flip(frame1, vertical=flip)

Or

flip = fn.random.coin_flip()
frames = fn.flip([frame0, frame1], vertical=flip) 

Or (as most operators support sequences)

flip = fn.random.coin_flip()
sequence = fn.stack(frame0, frame1)
frames = fn.flip(sequence, vertical=flip)

Michelvl92 commented 3 years ago

Thank you for your comment; it gives a clear solution for both questions.

Why is there no option in readers.sequence to include labels? Isn't that something you always want for training?

For the readers.file solution, this means creating as many readers.file instances as there are frames in a sequence, which could be (not sure if you agree) an ugly solution. Will having multiple readers.file instances have an impact on performance (speed) and on memory utilization, and if so, how?

JanuszL commented 3 years ago

Hi @Michelvl92,

The implementation of readers.sequence is rudimentary and we don't plan to develop it further in the near future; however, if you want, you can try extending it to fit your needs.

For the readers.file solution, this means creating as many readers.file instances as there are frames in a sequence, which could be (not sure if you agree) an ugly solution. Will having multiple readers.file instances have an impact on performance (speed) and on memory utilization, and if so, how?

The reading process should not have much impact on performance, and the memory consumption should be similar to that of an equivalent solution that reads whole sequences instead of individual files. However, this approach also requires creating multiple decoder instances later on, and if you use the mixed backend they can consume a lot of memory. Still, I don't think there is any better solution available in DALI for now.

Michelvl92 commented 3 years ago

@JanuszL thank you for your answer.