XinyuSun opened 2 years ago
Average GPU utilization is relatively low compared with other video pretraining methods.
Thanks
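One quick way to check whether the input pipeline is what is starving the GPUs is to time data loading against total step time. Below is a minimal sketch assuming a generic PyTorch training loop; `train_loader`, `model`, and `optimizer` are placeholders, not the repo's actual objects:

```python
import time
import torch

def profile_steps(train_loader, model, optimizer, device="cuda", num_steps=50):
    """Measure how much of each step is spent waiting on the dataloader."""
    data_time = total_time = 0.0
    end = time.time()
    for step, (video, *rest) in enumerate(train_loader):
        data_time += time.time() - end            # time blocked on the dataloader
        video = video.to(device, non_blocking=True)
        loss = model(video).mean()                # placeholder forward pass / loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        torch.cuda.synchronize()                  # flush GPU work so timings are real
        total_time += time.time() - end
        end = time.time()
        if step + 1 == num_steps:
            break
    print(f"data: {data_time / num_steps:.3f}s/step, "
          f"total: {total_time / num_steps:.3f}s/step")
```

If data time dominates, raising `num_workers` on the `DataLoader` and enabling `pin_memory=True` and `persistent_workers=True` are the usual first levers.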
Hi, the authors only use the audio model during pretraining; for a fair comparison with other SOTAs, they did not use audio for finetuning.
Hi authors! Thank you for making the paper and code open source; it is very helpful. I am trying to pretrain the GDT model on the Kinetics-400 dataset, but each epoch takes me more than one day. I run on a server with 8 RTX 3090 GPUs and set the per-GPU batch size to 16, for a total batch size of 128, a quarter of the original setting in the paper. According to the paper, the authors spent 3 days on pretraining with a batch size of 512, so under normal circumstances an epoch should not take more than about 3 hours. I changed the video decoding backend from `pyav` to `decord`, which brought a slight improvement in training speed (see the sketch at the end of this post). Was the speed of the provided code tested before release? What should I do to find clues for speeding up training? Some logs below:
Sincerely yours.
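For reference, the `decord` clip-sampling path mentioned above looks roughly like this. This is a minimal sketch; the function name, file path, and clip length are illustrative, not the repo's actual loader code:

```python
import numpy as np
import decord
from decord import VideoReader, cpu

decord.bridge.set_bridge("torch")  # make get_batch return torch tensors directly

def load_clip(path, num_frames=32):
    """Decode a uniformly sampled clip with decord instead of pyav."""
    vr = VideoReader(path, ctx=cpu(0))                            # CPU decoding context
    indices = np.linspace(0, len(vr) - 1, num_frames).astype(int) # uniform frame indices
    return vr.get_batch(indices)                                  # (num_frames, H, W, 3), uint8
```

`decord` seeks directly to the requested frame indices rather than decoding the whole stream, which is where the speedup over `pyav` comes from for sparse clip sampling.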