antoine77340 / MIL-NCE_HowTo100M

PyTorch GPU distributed training code for MIL-NCE HowTo100M
Apache License 2.0

Scale video such that the shorter side is 256 #6

Closed by LuoweiZhou 4 years ago

LuoweiZhou commented 4 years ago

This PR rescales the input so that the shorter side is 256, matching the dimensions of the HowTo100M release regardless of the original video resolution.

antoine77340 commented 4 years ago

Hi Luowei,

Thanks for the suggestion. There is actually an argument in the data loader called crop_only, which is True by default. If you set it to False, you will run this code instead: https://github.com/antoine77340/MIL-NCE_HowTo100M/blob/2be7ba7e19280004f1a86ee4bb9cbe33ce300657/video_loader.py#L76-L82 which ends up being very similar to what you suggest.

The reason we have the two options is this: if you download the provided videos, which are already resized so that min(height, width) = 256, all you need is a crop, which is faster than resizing and cropping. Otherwise, if you are working with higher-resolution videos, I suggest you set crop_only=False. This center crops the video at full resolution and then scales the crop to the required resolution. The difference from what you suggested is the order of operations: I believe it is faster to crop first and scale second, because you perform less computation. (A second difference is that there won't be any vertical jittering, but we believe this is not important given the scale of HowTo100M, since this will already provide temporal and horizontal jittering. If you really want vertical jittering, you will need to change the center crop; the best way is to modify the crop arguments at https://github.com/antoine77340/MIL-NCE_HowTo100M/blob/master/video_loader.py#L76-L79.)
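To make the two modes concrete, here is a minimal sketch of the crop geometry described above. It is not the code from video_loader.py: the function names, the `aw`/`ah` offset fractions, and the pixel-count proxy are assumptions introduced only for illustration.

```python
def crop_only_box(width, height, size, aw=0.5, ah=0.5):
    """Crop-only mode (sketch): take a fixed size x size window.

    Assumes the video is already small, e.g. min(height, width) = 256.
    aw, ah are hypothetical offset fractions in [0, 1]; 0.5 means centered.
    Returns (x, y, crop_width, crop_height).
    """
    if min(width, height) < size:
        raise ValueError("video is smaller than the requested crop")
    x = int((width - size) * aw)
    y = int((height - size) * ah)
    return x, y, size, size


def crop_then_scale_box(width, height, aw=0.5, ah=0.5):
    """Crop-then-scale mode (sketch): take the largest square window.

    The square's side is the shorter video side, so the offset along that
    side is always 0. The crop would then be scaled to the target size.
    Returns (x, y, crop_width, crop_height).
    """
    side = min(width, height)
    x = int((width - side) * aw)
    y = int((height - side) * ah)
    return x, y, side, side


def scale_first_output_pixels(width, height, size):
    """Pixels written per frame when the whole frame is scaled so that its
    shorter side equals `size` (a rough proxy for resize cost, not a benchmark)."""
    short, long_side = min(width, height), max(width, height)
    return size * round(long_side * size / short)


def crop_first_output_pixels(size):
    """Pixels written per frame when only the square crop is scaled."""
    return size * size
```

For a 1280x720 video, `crop_then_scale_box(1280, 720)` gives a 720x720 window starting at x=280, y=0. Changing `ah` has no effect on a landscape video because the square already spans the full height, which is why this mode has horizontal but no vertical jittering. Under the same assumptions, scaling the full frame to a shorter side of 256 writes about 455x256 pixels per frame, versus 256x256 when the crop is scaled.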

Does this make sense to you?

LuoweiZhou commented 4 years ago

Hi Antoine, thanks for the thorough response and insight. Yes, it makes sense. Scaling does add slightly more computation (2-3% in my case, with batch size 1120). I believe video decoding could be the main speed bottleneck, even with fast hard disk access, though I have had no luck improving it so far (e.g., with CUDA codec, DALI, or decord). Please let me know if you have any suggestions.

For now, maybe we can leave a note similar to "The reason why we have the two options is: [...] I suggest you select crop_only=False."? It would serve as a future reference in case people download videos on their own (perhaps for other datasets) or use a video resolution other than min=256. I am closing the PR. Thank you for the awesome repo!