How to read a sequence of images with labels and apply the same augmentation/transformation to the images in the set

Michelvl92 commented 3 years ago

Hi all, what is the best (and fastest) way to read image sequences, as shown in the readers.sequence example Reading Video Frames Stored as Images, but with a label for each sequence?

My dataset is an action classification dataset in which each sequence of images is labeled with an action, e.g. run, throw, dance, etc. The labels can be stored in any form (I am very flexible about that), e.g. derived from the subfolder name, from a .txt file, etc. If needed, I could also store the image sequences as TFRecord files or NumPy arrays (but as much as possible I would like to keep the option to, e.g., skip frames and/or take a shorter sequence).

I researched several options, but none of them is a perfect fit:

readers.sequence

This is exactly what I am looking for, but it does not seem to output labels. Is there an option or a workaround?

readers.file

This also looks like it could fit my problem: I could save each image sequence as a .npy array in the folder of its class. But as I understand it, this reader is only for images? Or can it read any file, and thus also NumPy arrays?

readers.tfrecord

I could store each sequence, together with its label, as a TFRecord file. But the conversion takes time, and I am not sure how I would then take a shorter sequence or use a step (skip frames in the sequence).

readers.numpy

The same concerns apply as for readers.tfrecord above.

ExternalSource Operator

As shown in the ExternalSource Operator example. However, from what I found in the comments, it does not seem to give a speedup compared with the default (tf.data) solution. Is that right?

On top of this, is it possible to apply transformations or augmentations such as nvidia.dali.fn.flip() or nvidia.dali.fn.transforms.shear() to a sequence so that every frame in the sequence undergoes the same augmentation/transformation?

JanuszL commented 3 years ago

Hi @Michelvl92,

In your case you can use multiple readers.file instances, each reading one frame from every sequence. To do that, pass a list to the files argument of each reader: reader0 gets files=[seq0_frame0, seq1_frame0, ...], reader1 gets files=[seq0_frame1, seq1_frame1, ...] (do the same for the labels argument), then use the stack/cat operator to assemble the sequences. Alternatively, you can use the external source operator; check its parallel and prefetch_queue_depth arguments to speed things up (examples should soon be available as part of the documentation; you can preview them in https://github.com/NVIDIA/DALI/pull/3199).
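A minimal sketch of the multiple-readers approach, assuming each sequence is given as a Python list of encoded frame paths (e.g. JPEGs) in temporal order, with one integer label per sequence. The helper `per_frame_file_lists` and its `seq_len`/`step`/`start` parameters are hypothetical, added only to show how frame skipping and shorter windows fit this scheme. `fn.readers.file`, `fn.decoders.image` and `fn.stack` are DALI operators, but this has not been checked against a specific DALI release, so verify argument names (in particular `axis_name` of `fn.stack`) in the documentation for your version.

```python
"""Sketch: one fn.readers.file per frame position, stacked into sequences.

DALI is imported lazily so the file-list helper can be used on its own.
"""


def per_frame_file_lists(sequences, labels, seq_len, step=1, start=0):
    """Transpose per-sequence frame lists into one file list per frame position.

    Reader k receives frame (start + k * step) of every kept sequence, so all
    readers see the sequences in the same order and their outputs line up
    sample by sample. Sequences too short for the requested window are skipped.
    """
    last = start + (seq_len - 1) * step  # index of the last frame needed
    kept = [(frames, lab) for frames, lab in zip(sequences, labels) if len(frames) > last]
    files = [[frames[start + k * step] for frames, _ in kept] for k in range(seq_len)]
    kept_labels = [lab for _, lab in kept]
    return files, kept_labels


def build_pipeline(sequences, labels, seq_len, step=1, start=0, batch_size=8, device_id=0):
    # Imported here so the helper above works without DALI installed.
    from nvidia.dali import pipeline_def, fn

    files, kept_labels = per_frame_file_lists(sequences, labels, seq_len, step, start)

    @pipeline_def(batch_size=batch_size, num_threads=4, device_id=device_id)
    def pipe():
        frames = []
        label = None
        for k, frame_files in enumerate(files):
            # random_shuffle is left at its default (False) so every reader
            # walks through the sequences in the same order.
            encoded, lab = fn.readers.file(files=frame_files, labels=kept_labels, name=f"reader{k}")
            frames.append(fn.decoders.image(encoded, device="mixed"))
            if label is None:
                label = lab  # every reader carries the same labels; keep one copy
        # Stack decoded frames along a new leading axis named "F" (frames).
        sequence = fn.stack(*frames, axis=0, axis_name="F")
        return sequence, label

    return pipe()
```

With this layout, a different `step` or `seq_len` only changes the file lists passed to the readers; the images on disk stay untouched. Note that each reader gets its own decoder call, which is the memory cost discussed later in this thread.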

On top of this, is it possible to apply transformations or augmentations such as nvidia.dali.fn.flip() or nvidia.dali.fn.transforms.shear() to a sequence so that every frame in the sequence undergoes the same augmentation/transformation?

If you have each frame as a separate output, you can do something like this:

flip = fn.random.coin_flip()  # one coin flip per sample, shared by all frames
frame0 = fn.flip(frame0, vertical=flip)
frame1 = fn.flip(frame1, vertical=flip)

Or

flip = fn.random.coin_flip()
frames = fn.flip([frame0, frame1], vertical=flip)

Or (as most operators support sequences)

flip = fn.random.coin_flip()
sequence = fn.stack(frame0, frame1, axis_name="F")  # name the new axis "F" so the result is a sequence, e.g. FHWC
frames = fn.flip(sequence, vertical=flip)
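The flip examples above carry over to the shear case from the question: draw the random parameters once per sample, build the transform, and pass the same result to every frame. A minimal sketch, assuming the frames are available as a list of separate decoder outputs (as in the multiple-readers approach); the function name `same_shear_for_all_frames` and its `max_angle` parameter are hypothetical. It relies on `fn.random.uniform`, `fn.transforms.shear` and `fn.warp_affine`; verify the argument names against the documentation for your DALI version.

```python
def same_shear_for_all_frames(frames, max_angle=10.0):
    """Shear every frame of a sample with one randomly drawn shear matrix.

    Hypothetical helper: `frames` is a list of DALI DataNodes (one per frame
    position, e.g. the decoder outputs of the per-frame readers), and the
    function must be called inside a DALI pipeline definition.
    """
    # Imported here so this module can be imported without DALI installed.
    from nvidia.dali import fn

    # Draw the two shear angles (in degrees) once per sample.
    angles = fn.random.uniform(range=(-max_angle, max_angle), shape=[2])
    # Turn the angles into a 2x3 affine matrix (one matrix per sample).
    # Pass `center=` as well if the shear should be centred on the image.
    mt = fn.transforms.shear(angles=angles)
    # Reuse the same matrix for every frame so all frames get an identical warp.
    return [fn.warp_affine(frame, matrix=mt, fill_value=0) for frame in frames]
```

Inside a pipeline, `frames = same_shear_for_all_frames([frame0, frame1])` followed by `fn.stack(*frames, axis_name="F")` should give a sequence in which every frame is sheared identically. The same "draw once, reuse" pattern should work for other operators whose parameters can be passed as argument inputs.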

Michelvl92 commented 3 years ago

Thank you for your comment; this gives a clear solution to both questions.

Why is there no option in readers.sequence to include labels? Isn't that something you always want for training?

For the readers.file solution, this means creating as many readers.file instances as the number of frames per sequence I need, which could be (not sure if you agree) an ugly solution. Will having multiple readers.file instances affect performance (speed), and also memory utilization? If so, how?

JanuszL commented 3 years ago

Hi @Michelvl92,

The implementation of readers.sequence is rudimentary and we don't plan to develop it further in the near future; however, if you want, you can try extending it to suit your needs.

For the readers.file solution, this means creating as many readers.file instances as the number of frames per sequence I need, which could be (not sure if you agree) an ugly solution. Will having multiple readers.file instances affect performance (speed), and also memory utilization? If so, how?

The reading itself should not have much impact on performance, and memory consumption should be similar to that of an equivalent solution that reads whole sequences instead of individual files. However, this approach requires multiple decoder instances later in the pipeline, and with the mixed backend they can consume a lot of memory. Still, I don't think there is a better solution available in DALI for now.

Michelvl92 commented 3 years ago

@JanuszL thank you for your answer.

NVIDIA/DALI issue #3221