Hi. Thank you for generously sharing your work. When I trained the model on MSRVTT with a single V100, I found that GPU utilization stayed at around 60% rather than reaching 100%. Do you have any tips? Thank you.

To speed up training, you can increase the batch size until it fills the GPU memory. However, I found that a larger batch size caused more overfitting, so I only recommend it when training on a large dataset (such as HowTo100M). It may be possible to mitigate the overfitting with more regularisation or a different learning-rate decay, but I did not experiment with that, sorry.
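For anyone landing here later, below is a minimal sketch of what this advice could look like in a generic PyTorch training setup. The `dataset` and `model` arguments, the AdamW/cosine-decay choice, and all hyperparameter values are illustrative assumptions, not this repository's actual code:

```python
# Minimal sketch only: `dataset` and `model` stand in for the project's
# own video-text dataset and model; all hyperparameter values are guesses.
import torch
from torch.utils.data import DataLoader


def build_training_objects(dataset, model, batch_size=64, lr=1e-4,
                           weight_decay=1e-2, num_epochs=100):
    # Raise batch_size step by step until GPU memory on the V100 is nearly
    # full; larger batches also tend to push GPU utilization higher.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Stronger regularisation (here, weight decay) and a different
    # learning-rate decay are possible ways to counter the extra
    # overfitting observed with larger batches.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr,
                                  weight_decay=weight_decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                           T_max=num_epochs)
    return loader, optimizer, scheduler
```

On a smaller dataset like MSRVTT, the batch size and the regularisation strength would need to be tuned together, since the overfitting effect described above gets worse as the batch grows.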
Thank you for your quick response.