Train scale jittering from video data.

kiyoon commented 4 years ago

Hi,

I'm trying to reproduce Facebook AI research's Non-local Neural Networks and SlowFast video action recognition training and testing settings using DALI.

(Their code/papers: https://github.com/facebookresearch/SlowFast)

For training, the video is resized so that its shorter side is a random value sampled from [256, 320] pixels (e.g. an 800x600 video may become 400x300), and then a 224x224 crop is taken at a random position. This has a multi-scale augmentation effect.
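To make the arithmetic concrete, here is a minimal plain-Python sketch of the geometry only (no DALI involved). The function names are hypothetical, and it assumes the shorter side is drawn as a uniform integer in [256, 320] and that the aspect ratio is preserved; the reference implementation may differ in those details.

```python
import random


def resize_shorter_side(width, height, short):
    """Scale (width, height) so the shorter side equals `short`, keeping aspect ratio."""
    if width <= height:
        return short, round(height * short / width)
    return round(width * short / height), short


def jitter_resize_and_crop(width, height, min_side=256, max_side=320, crop=224, rng=random):
    """Return (new_w, new_h, x0, y0): the resized frame size and the crop's top-left corner.

    Assumption: the shorter side is sampled as a uniform integer in
    [min_side, max_side]; the same values would be reused for every frame of a clip.
    """
    short = rng.randint(min_side, max_side)  # inclusive on both ends
    new_w, new_h = resize_shorter_side(width, height, short)
    x0 = rng.randint(0, new_w - crop)
    y0 = rng.randint(0, new_h - crop)
    return new_w, new_h, x0, y0


if __name__ == "__main__":
    print(resize_shorter_side(800, 600, 300))
    print(jitter_resize_and_crop(800, 600))
```

With a sampled shorter side of 300, an 800x600 video becomes 400x300, which matches the example above.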

For testing, the shorter side is rescaled to 256 and they take three 256x256 crops: left, centre, and right (or top, middle, and bottom, for portrait videos). Temporally, they sample 10 clips evenly from the full-length video.
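The three spatial crops can be sketched in plain Python as follows. This is only an illustration of the offsets; `three_crop_offsets` is a hypothetical helper, and it assumes the video has already been resized so that its shorter side equals the crop size.

```python
def three_crop_offsets(width, height, crop=256):
    """Return the (x0, y0) top-left corners of three crop x crop windows.

    The windows are placed at the start, middle, and end of the longer side.
    Assumption: the shorter side has already been resized to `crop`.
    """
    if min(width, height) != crop:
        raise ValueError("shorter side must already equal the crop size")
    slack = max(width, height) - crop
    positions = [0, slack // 2, slack]
    if width >= height:
        # Landscape: left, centre, right
        return [(p, 0) for p in positions]
    # Portrait: top, middle, bottom
    return [(0, p) for p in positions]


if __name__ == "__main__":
    print(three_crop_offsets(341, 256))
```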

According to them, this is common practice for video action recognition training and testing.

Possible solution for training: DALI has an ops.Resize operator, but it only works with images, not videos. If it supported sequence inputs, we could use it and pass resize_shorter something like ops.Constant(256) + ops.Constant(320-256) * ops.Uniform() (of course, the operators would be created in __init__ and called in define_graph).

Possible solution for testing: DALI doesn't support sampling n clips evenly from an entire video. Instead, we could use file_list with file_list_frame_num = True in ops.VideoReader, and manually pre-define the frame ranges of the 10 test clips.
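A rough sketch of how those ranges could be pre-computed is below. The helper names are hypothetical, and the line layout `path label start_frame end_frame` plus the treatment of the end frame as exclusive are my assumptions; please check both against the VideoReader documentation. It also assumes the frame count of each video is known in advance (e.g. read with another tool).

```python
def uniform_clip_starts(num_frames, clip_len, num_clips=10):
    """Start frames of `num_clips` clips spread evenly over the whole video."""
    last_start = num_frames - clip_len
    if last_start < 0:
        raise ValueError("video is shorter than one clip")
    if num_clips == 1:
        return [last_start // 2]
    # Integer arithmetic: the first clip starts at 0, the last ends at the final frame.
    return [(i * last_start) // (num_clips - 1) for i in range(num_clips)]


def file_list_lines(path, label, num_frames, clip_len, num_clips=10):
    """One 'path label start end' line per clip (end assumed exclusive)."""
    starts = uniform_clip_starts(num_frames, clip_len, num_clips)
    return [f"{path} {label} {s} {s + clip_len}" for s in starts]


if __name__ == "__main__":
    for line in file_list_lines("video.mp4", 0, num_frames=300, clip_len=32):
        print(line)
```

The resulting lines would be written to a text file and passed as file_list. Clips overlap when the video is shorter than num_clips * clip_len frames.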

Questions for training: Is there a way to resize sequence data in DALI on the GPU? If not, is there any plan to make ops.Resize support sequence data? Also, is there a workaround, such as using ops.TorchPythonFunction?

What will happen if the video files have different sizes? Is it possible to define a graph that processes the input one file at a time and then combines the outputs later? (Or is it supposed to work like that internally?) Should I just set the batch size to 1, wrap the dataloader, call next() multiple times, and combine the outputs manually?

Questions for testing: Is there a better method than pre-defining the clip ranges manually in file_list? Or is there any plan to add uniform sampling of a fixed number of clips?

Thank you.

JanuszL commented 4 years ago

Hi

Is there a way to resize sequence data in DALI on the GPU? If not, is there any plan to make ops.Resize support sequence data? Also, is there a workaround, such as using ops.TorchPythonFunction?

Making the Resize operator work for sequences is on our to-do list, but we cannot commit to any date yet. ops.TorchPythonFunction is mainly for debugging and prototyping, so it won't be very fast.

What will happen if the video files have different sizes? Is it possible to define a graph that processes the input one file at a time and then combines the outputs later? (Or is it supposed to work like that internally?) Should I just set the batch size to 1, wrap the dataloader, call next() multiple times, and combine the outputs manually?

This is possible, although it won't be very fast.

Is there a better method than pre-defining the clip ranges manually in file_list? Or is there any plan to add uniform sampling of a fixed number of clips?

If you want to sample just n clips out of K possible ones, where K > n, that is not doable in DALI right now. DALI can sample randomly, but it will give you all K possible combinations, not only a subset. Using file_list is one way to work around this.

kiyoon commented 4 years ago

Thanks. I guess I'll have to wait for the resize operator to support sequence input.

JanuszL commented 3 years ago

Hi @kiyoon,

Now the resize operator supports sequences. You can also check the video_resize.

NVIDIA / DALI

Train scale jittering from video data. #1800