Closed: lingyunwu14 closed this issue 4 years ago
Sorry for the late reply. Increasing the batch size of each GPU is not easy to implement, because the relation module is hard to reformat to support that. And although I say 1 GPU can only hold 1 image, multiple images (the key frame and its relevant frames) are actually processed simultaneously; here "1" just refers to the number of key frames. As for the backbone, I did not modify any of its configs; they are kept untouched, so it should behave the same as in the image detection domain. I don't know whether fixing all BN layers would affect the performance :)
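If it helps, the backbone settings follow the usual mmdetection-style fixed-BN convention, roughly like the snippet below (a minimal sketch assuming that convention; the exact values may differ from the configs in this repo):

```python
# mmdetection-style backbone config with frozen BN (illustrative values only).
# norm_eval=True keeps every BN layer in eval mode during training, so the
# ImageNet running statistics are used unchanged.
backbone = dict(
    type='ResNet',
    depth=50,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,                                # freeze the stem and stage 1
    norm_cfg=dict(type='BN', requires_grad=False),  # do not update BN affine params
    norm_eval=True,                                 # keep BN in eval mode
    style='pytorch')
```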
Yes, I know that "1" just refers to the number of key frames. But the reference frames do not directly participate in the loss calculation; in other words, the model only backpropagates on the key frame, so the effective batch size of each GPU is still 1. Actually, based on your code, I reproduced the Sequence-Level-Semantics-Aggregation method. Training is unstable when the batch size is 1, so I want to increase the batch size of each GPU. If I use the RPNWithRefModule class, can you point out where the RPN modules only support a single image?
Thanks for your excellent work and nice open-source code. "Currently, 1 GPU could only hold 1 image." Is it possible to increase the batch size of each GPU? In addition, I noticed that all BN layers are fixed; is this related to the per-GPU batch size being 1? Since only the backbone is pre-trained on ImageNet, does fixing all BN layers affect the performance of the model? To be concrete, by "fixed" I mean something like the sketch below.
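A minimal PyTorch sketch of what I understand "fixing all BN layers" to mean (the function name is hypothetical, not this repo's actual code); freezing BN is the common workaround when the per-GPU batch is 1, since batch statistics computed from a single image are too noisy:

```python
import torch.nn as nn

def fix_bn(model: nn.Module) -> None:
    """Freeze every BatchNorm layer: use the pre-trained (ImageNet) running
    statistics as-is and stop updating the affine parameters."""
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()                      # stop updating running mean/var
            for p in m.parameters():      # stop updating weight/bias
                p.requires_grad = False
```

Note that `model.train()` flips BN back into training mode, so in practice this has to be re-applied after every call to `train()` (or done inside an overridden `train()` method).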