Closed Michelvl92 closed 3 years ago
Hi @Michelvl92,
In your case you can use multiple readers.file instances, each of which reads one frame from each sequence. To do that, pass a list to the files argument of each reader: reader0 gets files=[seq0_frame0, seq1_frame0, ...], reader1 gets files=[seq0_frame1, seq1_frame1, ...] (do the same for the labels argument), then use the stack/cat operator to create sequences.
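The per-reader lists described above could be built roughly as in the sketch below. It is only an illustration under stated assumptions: the helper names per_frame_file_lists and build_pipeline are hypothetical, every sequence is assumed to have the same number of frames and every frame the same resolution (stacking needs matching shapes), and shuffling is left off so that all readers stay in step. DALI is imported lazily, so the list-building helper can be tried without it.

```python
def per_frame_file_lists(sequences, labels):
    """Regroup frame paths so that list k holds frame k of every sequence.

    sequences: list of sequences, each a list of frame paths (equal lengths).
    labels:    one integer label per sequence.
    """
    if len(sequences) != len(labels):
        raise ValueError("need exactly one label per sequence")
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("all sequences must have the same number of frames")
    # zip(*sequences) transposes [sequence][frame] into [frame][sequence].
    files_per_reader = [list(frames) for frames in zip(*sequences)]
    return files_per_reader, list(labels)


def build_pipeline(files_per_reader, labels, batch_size=4):
    """Hypothetical helper: one readers.file per frame position, then stack."""
    from nvidia.dali import pipeline_def, fn  # imported lazily, needs DALI

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None)
    def sequence_pipeline():
        frames, label = [], None
        for k, files in enumerate(files_per_reader):
            # Every reader gets the same labels list, so all of them emit the
            # same label for a given sample; keeping the last one is enough.
            encoded, label = fn.readers.file(
                files=files, labels=labels, name=f"reader{k}")
            frames.append(fn.decoders.image(encoded, device="cpu"))
        # Stack the decoded frames along a new outermost axis: (F, H, W, C).
        return fn.stack(*frames), label

    return sequence_pipeline()
```

For two sequences of two frames each, per_frame_file_lists returns one list with the first frame of both sequences and one with the second frame of both, which matches the reader0/reader1 layout above.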
You can also use the external source operator; check its parallel and prefetch_queue_depth arguments to speed things up (examples should soon be available as part of the documentation; you can preview them in https://github.com/NVIDIA/DALI/pull/3199).
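A parallel external source could look roughly like the sketch below. It assumes a per-sample callable used with batch=False, which receives a SampleInfo object for each sample. The class SequenceSource, its seq_len and stride parameters, and build_pipeline are hypothetical names, and the worker count and queue depth are placeholder values, not recommendations. NumPy and DALI are imported lazily inside the functions that need them.

```python
class SequenceSource:
    """Per-sample callable returning seq_len encoded frames plus a label."""

    def __init__(self, sequences, labels, seq_len, batch_size, stride=1):
        self.sequences = sequences  # list of lists of frame paths
        self.labels = labels        # one integer label per sequence
        self.seq_len = seq_len      # number of frames returned per sample
        self.stride = stride        # take every stride-th frame
        # Only full batches are produced; a trailing partial batch is dropped.
        self.full_iterations = len(sequences) // batch_size

    def select(self, index):
        """Pick the frame paths and the label for the sequence at `index`."""
        paths = self.sequences[index][:: self.stride][: self.seq_len]
        if len(paths) < self.seq_len:
            raise ValueError(f"sequence {index} is too short")
        return paths, self.labels[index]

    def __call__(self, sample_info):
        import numpy as np  # imported lazily, needs NumPy

        # Signal the end of the epoch after the last full batch.
        if sample_info.iteration >= self.full_iterations:
            raise StopIteration
        paths, label = self.select(sample_info.idx_in_epoch)
        # Return raw encoded bytes; decoding happens inside the pipeline.
        frames = [np.fromfile(p, dtype=np.uint8) for p in paths]
        return (*frames, np.array([label], dtype=np.int64))


def build_pipeline(source, batch_size):
    """Hypothetical helper wiring the source into a CPU-only pipeline."""
    from nvidia.dali import pipeline_def, fn  # imported lazily, needs DALI

    @pipeline_def(batch_size=batch_size, num_threads=2, device_id=None,
                  py_num_workers=4, py_start_method="spawn")
    def sequence_pipeline():
        *encoded, label = fn.external_source(
            source=source, num_outputs=source.seq_len + 1,
            batch=False, parallel=True, prefetch_queue_depth=2)
        frames = [fn.decoders.image(e, device="cpu") for e in encoded]
        return fn.stack(*frames), label

    return sequence_pipeline()
```

Because the source picks the frames itself, skipping frames or taking a shorter sequence only means changing stride or seq_len. The batch_size given to SequenceSource is assumed to match the one given to the pipeline.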
On top of this, is it possible to apply transformations or augmentations like nvidia.dali.fn.flip() or nvidia.dali.fn.transforms.shear() on the sequence such that every frame in the sequence undergoes the same augmentation/transformation?
If you have each frame as a separate output, you can do something like:
flip = fn.random.coin_flip()
frame0 = fn.flip(frame0, vertical=flip)
frame1 = fn.flip(frame1, vertical=flip)
Or
flip = fn.random.coin_flip()
frames = fn.flip([frame0, frame1], vertical=flip)
Or (as most operators support sequences)
flip = fn.random.coin_flip()
sequence = fn.stack(frame0, frame1)
frames = fn.flip(sequence, vertical=flip)
Thank you for your comment, this explains a clear solution for both.
Why is there no option in readers.sequence to include labels? Is this not something you always want for training?
For the readers.file solution, this means creating the same number of readers.file instances as the sequences I will need, which could be (not sure if you agree) an ugly solution. Will having multiple readers.file instances have an impact on performance (speed), and also on memory utilization? If so, how?
Hi @Michelvl92,
The implementation of readers.sequence is rudimentary and we don't have plans to develop it further in the near future; however, if you want, you can try extending it to your needs.
For the readers.file solution, this means creating the same number of readers.file instances as the sequences I will need, which could be (not sure if you agree) an ugly solution. Will having multiple readers.file instances have an impact on performance (speed), and also on memory utilization? If so, how?
The reading itself should not have much impact on performance, and memory consumption should be similar to that of a corresponding solution that reads sequences instead of files. However, it would require creating multiple decoder instances later on, and if you use the mixed backend, they can consume a lot of memory. Still, I don't think there is any better solution available in DALI for now.
@JanuszL thank you for your answer.
Hi all, my question is: what is the best (and fastest) way to read image sequences, as shown for readers.sequence in "Reading Video Frames Stored as Images", but with labels for each sequence?
My dataset is an action classification dataset, where sequences of images are labeled as an action, e.g. run, throw, dance, etc. The labels for the action can be stored in any way (I am very flexible in that), e.g. based on the subfolder name, in a .txt file, etc. Eventually, it would be possible to store the image sequences as TFRecord or NumPy arrays (but I would like to keep, as much as possible, the option to e.g. skip frames and/or take a shorter sequence).
I researched some possibilities, but none of them is a perfect solution:
readers.sequence
readers.file
readers.tfrecord
readers.numpy
ExternalSource Operator
On top of this, is it possible to apply transformations or augmentations like nvidia.dali.fn.flip() or nvidia.dali.fn.transforms.shear() on the sequence such that every frame in the sequence undergoes the same augmentation/transformation?