On my machine with a 1080 Ti, it takes around 7–8 minutes for a training epoch. If you need faster speeds, you can look into average-pooling the 14x14 feature maps into 7x7 ones as another pre-processing step, suggested in https://arxiv.org/abs/1708.02711 (I haven't checked how this affects accuracies with this code), or use the object proposal features from https://github.com/peteanderson80/bottom-up-attention (this will give you better accuracies, even if using only 36 proposals per image).
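In case it helps, here is a minimal sketch of what that extra pooling step could look like in PyTorch (the `features` tensor here is just a placeholder standing in for one batch of the precomputed 14x14 feature maps):

```python
import torch
import torch.nn.functional as F

# Placeholder for a batch of precomputed ResNet feature maps, shape [N, C, 14, 14].
features = torch.randn(8, 2048, 14, 14)

# Average-pool each 2x2 block so the spatial grid shrinks from 14x14 to 7x7,
# as suggested in https://arxiv.org/abs/1708.02711.
pooled = F.adaptive_avg_pool2d(features, output_size=(7, 7))
print(pooled.shape)  # torch.Size([8, 2048, 7, 7])
```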
Ohh, that's a pretty huge difference, do you have any idea why this could be happening? Also, during training I see the GPU's volatile utilization at 0% most of the time (but not all the time, I have seen it go to 99% as well). Should I increase the number of data workers?
If some but not all CPU cores are maxed out, you can try increasing the number of workers. Otherwise (if none of them are maxed out) it's probably I/O limited. In that case, you have to look into using faster disks (having the data on an SSD is a good idea), more memory (for disk caching), or one of the options I mentioned above.
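For reference, the worker count is just the `num_workers` argument of the PyTorch `DataLoader`; a rough sketch (the dataset and batch size here are placeholders, not the ones used in this repo):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset standing in for the actual VQA dataset object.
dataset = TensorDataset(torch.randn(1024, 2048), torch.randint(0, 3000, (1024,)))

loader = DataLoader(
    dataset,
    batch_size=128,
    shuffle=True,
    num_workers=8,    # raise this while some CPU cores still have headroom
    pin_memory=True,  # speeds up host-to-GPU copies
)

for features, answers in loader:
    pass  # training step would go here
```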
Thanks a lot. I guess it's I/O. CPU cores are not going above 10%. My Google Cloud VM has HDDs, and I guess the fact that they are in the cloud and not physically attached makes it even worse. Thanks anyway Yan, your code is very cleanly implemented. I am experimenting to see whether using GloVe embeddings to encode the questions, rather than the index-based approach, helps at all.
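What I have in mind is roughly the following sketch (the tiny `vocab` and the `glove.6B.300d.txt` path are just illustrative; the real vocabulary would come from the preprocessing step): initialize the embedding table from pretrained GloVe vectors instead of random indices-to-vectors.

```python
import numpy as np
import torch
import torch.nn as nn

# Illustrative token-to-index mapping; in practice this comes from the question vocabulary.
vocab = {'what': 0, 'color': 1, 'is': 2, 'the': 3, 'cat': 4}
glove_path = 'glove.6B.300d.txt'
embedding_dim = 300

# Start from small random vectors, then overwrite rows for words GloVe covers.
weights = np.random.normal(scale=0.1, size=(len(vocab), embedding_dim))
with open(glove_path, encoding='utf-8') as f:
    for line in f:
        word, *values = line.rstrip().split(' ')
        if word in vocab:
            weights[vocab[word]] = np.asarray(values, dtype=np.float32)

# Use the GloVe-initialized table in place of the randomly initialized one.
embedding = nn.Embedding(len(vocab), embedding_dim)
embedding.weight.data.copy_(torch.from_numpy(weights))
embedding.weight.requires_grad = True  # or False to keep the vectors frozen
```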
Hi,
Thank you so much for providing the code. Can you please give me some indication of the training time? On my Google Cloud VM with a Tesla K80, it takes around 35 minutes for one training epoch. Is that similar to what you saw?
Thanks