Closed ShiroKL closed 4 years ago
If you're running quick finetuning experiments, all you need is a new Dataset class that returns a (3, T, 224, 224) tensor. As long as you use the same clip transform there won't be any inconsistencies. There's a utils/train.py to help set up training.
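A minimal sketch of such a Dataset (the class name, file list, and `load_frames` decoder are hypothetical placeholders; swap in your own video reader and the repo's clip transform):

```python
import torch
from torch.utils.data import Dataset

class MyClipDataset(Dataset):
    """Sketch of a clip dataset yielding (3, T, 224, 224) tensors."""

    def __init__(self, video_paths, num_frames=16, transform=None):
        self.video_paths = video_paths
        self.num_frames = num_frames
        self.transform = transform  # e.g. the repo's clip transform

    def __len__(self):
        return len(self.video_paths)

    def __getitem__(self, idx):
        # (T, H, W, 3) uint8 frames from your decoder of choice.
        frames = self.load_frames(self.video_paths[idx])
        # Rearrange to (3, T, H, W) and scale to [0, 1].
        clip = frames.permute(3, 0, 1, 2).float() / 255.0
        if self.transform is not None:
            clip = self.transform(clip)
        return clip

    def load_frames(self, path):
        # Dummy stand-in so the sketch runs; replace with real decoding.
        return torch.randint(0, 256, (self.num_frames, 224, 224, 3),
                             dtype=torch.uint8)
```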
If you're training I3D models from scratch on large datasets, you probably need to be careful about learning rate schedules and initializations. I highly recommend using a dedicated framework for this sort of stuff (e.g. mmaction)
Hi,
Thanks for the answer. I am trying it with UCF101 and it seems to work with a few modifications. In the code, why did you create the gtransforms functions? Are the transform functions from torchvision not good enough?
gtransforms just invokes torchvision transforms but for multiple frames. Some transforms need to be consistent for all frames of the clip (e.g. horizontal flip ALL frames or no frames; random crop the same window in each frame). It is taken almost entirely from here.
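The idea can be illustrated with two toy clip transforms (this is a sketch of the principle, not the repo's gtransforms code): the random decision is sampled once per clip and then applied identically to every frame.

```python
import random
import torch

def clip_random_horizontal_flip(frames, p=0.5):
    """Flip ALL frames of a (T, C, H, W) clip or none of them.

    The coin is tossed once per clip, not once per frame, so the
    frames stay temporally consistent.
    """
    if random.random() < p:
        return torch.flip(frames, dims=[-1])  # flip width axis everywhere
    return frames

def clip_random_crop(frames, size):
    """Crop the SAME window out of every frame of a (T, C, H, W) clip."""
    t, c, h, w = frames.shape
    top = random.randint(0, h - size)
    left = random.randint(0, w - size)
    return frames[:, :, top:top + size, left:left + size]
```

Per-frame torchvision transforms would re-sample the crop window and the flip coin for each frame, which scrambles the clip's temporal coherence.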
I see, thank you for the information. I have one last question. I wanted to remove the "Loopad" transformation, which duplicates frames when a video has fewer frames than the number we want. I am using a batch size of 1, but the behavior is odd: each step takes more time without Loopad than with it. Do you know why? (In both cases I use the same batch size; the forward pass seems slower.)
The issue came from cuDNN benchmark mode.
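This is consistent with how PyTorch's `torch.backends.cudnn.benchmark` flag behaves: when enabled, cuDNN profiles several convolution algorithms the first time it sees a given input shape and caches the fastest one. Removing the padding transform makes T vary from clip to clip, so each new shape triggers a fresh, expensive profiling pass (a configuration note, not code from this repo):

```python
import torch

# Fast when every batch has the same shape: the autotuning cost is
# paid once and amortized over all later steps.
torch.backends.cudnn.benchmark = True

# With variable-length clips, disabling it avoids re-profiling on
# every new (3, T, 224, 224) shape.
torch.backends.cudnn.benchmark = False
```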
Great! Glad you could resolve it.
Hi,
thank you for sharing. I would like to train on a custom dataset and was wondering what modifications are needed. Should I change only the CSV path in kinetics.py, or are there other files to modify?