Closed kiyoon closed 3 years ago
Hi
Is there a way to resize sequence data in DALI using the GPU? If not, is there any plan to make ops.Resize work for sequence data? Also, is there any workaround such as using ops.TorchPythonFunction?
We have making the Resize operator work for sequences on our ToDo list, but we cannot commit to any date yet. ops.TorchPythonFunction is mainly for debugging and prototyping, so it won't be very fast.
What will happen if the video files have different sizes? Is it possible to define a graph that processes the input one file at a time and then combine the outputs later? (Or is it supposed to work like that internally?) Should I just set the batch size to 1, wrap the dataloader, call next() multiple times, and combine the outputs manually?
This is possible, although it won't be very fast.
Is there any better method than pre-defining the ranges of the clips manually in file_list? Or is there any plan to add uniform sampling of a fixed number of clips?
If you want to sample just n clips out of K possible clips where K > n, that is not doable in DALI now. DALI can sample randomly, but it will provide all K possible combinations, not only a subset. Using file_list is one way to overcome this.
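The file_list workaround above can be automated. Below is a hedged sketch of a helper that writes file_list lines pre-defining n evenly spaced clips per video; the "path label start end" line format (paired with file_list_frame_num = True in the reader) follows DALI's documented convention, but the helper name and defaults are ours, so verify against your DALI version.

```python
def make_file_list_lines(path, label, total_frames, n_clips=10, clip_len=32):
    """Return DALI file_list lines for n_clips evenly spaced clips.

    Each line is "path label start_frame end_frame", the format expected
    when ops.VideoReader is created with file_list_frame_num=True.
    """
    lines = []
    # Last possible clip start so the clip still fits in the video.
    last_start = max(total_frames - clip_len, 0)
    for i in range(n_clips):
        # Spread clip starts evenly over [0, last_start].
        start = round(i * last_start / max(n_clips - 1, 1))
        end = start + clip_len
        lines.append(f"{path} {label} {start} {end}")
    return lines

# Example: 3 clips of 32 frames from a 300-frame video.
print(make_file_list_lines("video.mp4", 0, 300, n_clips=3, clip_len=32))
# → ['video.mp4 0 0 32', 'video.mp4 0 134 166', 'video.mp4 0 268 300']
```

Writing these lines to a text file and passing its path as file_list gives the reader exactly the 10 pre-defined test clips per video.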
Thanks. I guess I'll have to wait for the resize operator to support sequence input.
Hi @kiyoon,
Now the resize operator supports sequences. You can also check out video_resize.
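With sequence support in place, the training augmentation described in this thread could be sketched roughly as below. This is an untested sketch using DALI's documented functional API (fn.readers.video, fn.resize with resize_shorter, fn.crop, fn.random.uniform); the pipeline parameters are illustrative assumptions, and the import is guarded so the sketch degrades gracefully where DALI is not installed.

```python
# Hedged sketch: GPU video decoding + per-sample random shorter-side resize
# + random 224x224 crop, assuming a recent DALI with sequence resize support.
try:
    from nvidia.dali import pipeline_def, fn

    @pipeline_def(batch_size=2, num_threads=2, device_id=0)
    def video_train_pipeline(filenames):
        # Decode 16-frame sequences on the GPU.
        video = fn.readers.video(device="gpu", filenames=filenames,
                                 sequence_length=16, random_shuffle=True)
        # Shorter side sampled uniformly in [256, 320) per sample.
        shorter = fn.random.uniform(range=[256.0, 320.0])
        video = fn.resize(video, resize_shorter=shorter)
        # Random spatial 224x224 crop, same position for all frames.
        video = fn.crop(video, crop=(224, 224),
                        crop_pos_x=fn.random.uniform(range=[0.0, 1.0]),
                        crop_pos_y=fn.random.uniform(range=[0.0, 1.0]))
        return video

    DALI_AVAILABLE = True
except ImportError:
    # DALI not installed (e.g. a machine without a GPU).
    DALI_AVAILABLE = False
```

fn.readers.video_resize can fold the read and resize into one step; the separate fn.resize call above just mirrors the two-stage structure discussed earlier in the thread.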
Hi,
I'm trying to reproduce Facebook AI research's Non-local Neural Networks and SlowFast video action recognition training and testing settings using DALI.
(Their code/papers: https://github.com/facebookresearch/SlowFast)
For training, the video's shorter side is randomly sampled in [256, 320] pixels (e.g. an 800x600 input becomes 400x300), and a 224x224 crop is taken at a random position. This gives a multi-scale augmentation effect.
For testing, the shorter side is rescaled to 256 and three 256x256 crops are taken at the left, centre, and right (or top, middle, and bottom). Temporally, they sample 10 clips evenly from the full-length video.
According to them, this is common practice for video action recognition training and testing.
Possible solution for training: DALI has an ops.Resize operator that only works with images, not videos. If the operation supported sequence inputs, we could use it and pass resize_shorter something like ops.Constant(256) + ops.Constant(320 - 256) * ops.Uniform() (of course, these would be predefined in __init__ and the operations called in define_graph).
Possible solution for testing: DALI doesn't support sampling n clips evenly from an entire video. Instead, we could use file_list with file_list_frame_num = True in ops.VideoReader, and manually pre-define the ranges of the 10 clips for testing.
Questions for training: Is there a way to resize sequence data in DALI using the GPU? If not, is there any plan to make ops.Resize work for sequence data? Also, is there any workaround such as using ops.TorchPythonFunction?
What will happen if the video files have different sizes? Is it possible to define a graph that processes the input one file at a time and then combine the outputs later? (Or is it supposed to work like that internally?) Should I just set the batch size to 1, wrap the dataloader, call next() multiple times, and combine the outputs manually?
Questions for testing: Is there any better method than pre-defining the ranges of the clips manually in file_list? Or is there any plan to add uniform sampling of a fixed number of clips?
Thank you.