zimenglan-sysu-512 opened this issue 5 years ago
This is due to a different codepath being executed in the backward pass, and some hook not being triggered properly in the distributed package.
I think the right solution here would be to make conv, batch_norm, etc. support empty batch sizes natively.
Another solution I believe would work is to use the legacy distributed backend.
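For reference, the empty-batch-safe `Conv2d` pattern mentioned here can be sketched roughly as below. This is a minimal re-creation under my own assumptions, not the exact library code: an autograd `Function` returns an empty tensor of the right output shape while keeping the graph connected, and the conv wrapper computes that shape from the usual convolution arithmetic.

```python
import torch

class _NewEmptyTensorOp(torch.autograd.Function):
    """Return an empty tensor of a given shape while keeping the
    autograd graph connected (a sketch of the helper of the same
    name used in maskrcnn-benchmark)."""

    @staticmethod
    def forward(ctx, x, new_shape):
        ctx.shape = x.shape
        return x.new_empty(new_shape)

    @staticmethod
    def backward(ctx, grad):
        # gradient w.r.t. x is an empty tensor of x's original shape;
        # new_shape is not a tensor, so its gradient is None
        return grad.new_empty(ctx.shape), None

class Conv2d(torch.nn.Conv2d):
    def forward(self, x):
        if x.numel() > 0:
            return super().forward(x)
        # for an empty batch, compute the output spatial size from the
        # standard convolution arithmetic instead of running the kernel
        output_shape = [
            (i + 2 * p - (d * (k - 1) + 1)) // s + 1
            for i, p, d, k, s in zip(
                x.shape[-2:], self.padding, self.dilation,
                self.kernel_size, self.stride)
        ]
        output_shape = [x.shape[0], self.out_channels] + output_shape
        return _NewEmptyTensorOp.apply(x, output_shape)
```

With a 3x3 kernel and padding 1, an empty `(0, 3, 16, 16)` input produces an empty `(0, 8, 16, 16)` output without ever invoking the conv kernel.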
Experiencing the same issue. Training stops when mask proposals are empty. Any suggestions to deal with this case?
hi @fmassa
you are right, it needs conv, bn, gn, etc. to support empty batch sizes, so I followed Conv2d to encapsulate GroupNorm like this:
```python
class GroupNorm(torch.nn.GroupNorm):
    def forward(self, x):
        if x.numel() > 0:
            return super(GroupNorm, self).forward(x)
        # get output shape
        output_shape = x.shape
        return _NewEmptyTensorOp.apply(x, output_shape)
```
but it does not work.
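One guess at why a wrapper like the above can still fail under distributed training: in the empty branch the affine parameters (`weight`, `bias`) never enter the graph, so DDP's gradient-reduction hooks for them never fire and ranks can hang. A hedged sketch of a workaround (my own suggestion, not verified against the original setup) that keeps them in the graph:

```python
import torch

class GroupNorm(torch.nn.GroupNorm):
    def forward(self, x):
        if x.numel() > 0:
            return super().forward(x)
        # GroupNorm preserves the input shape, so on an empty batch we
        # can return x itself; adding 0 * (weight + bias) keeps the
        # affine parameters in the autograd graph, so DDP still receives
        # (zero) gradients for them on ranks that saw no proposals
        return x + (self.weight.sum() + self.bias.sum()) * 0.0
```

Calling `backward()` through this path produces zero-filled (rather than missing) gradients for `weight` and `bias`, which is what keeps the distributed allreduce from stalling.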
hi @Sreehari-iitm a simple but not nice method is to feed some fake proposals through the mask branch and then zero out their gradients during backprop.
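The fake-proposal trick above might be sketched like this (illustrative names; `mask_loss_with_fallback` is hypothetical, not from the repo): the loss is always computed so the graph is built for every parameter, but when the proposals are fake it is multiplied by zero, so the gradients exist and are exactly zero.

```python
import torch
import torch.nn.functional as F

def mask_loss_with_fallback(mask_logits, mask_targets, is_fake):
    # build the loss either way so every parameter stays in the graph
    loss = F.binary_cross_entropy_with_logits(mask_logits, mask_targets)
    # for fake proposals, zero the loss: gradients are still produced
    # (so DDP hooks fire) but carry no signal that could hurt training
    return loss * 0.0 if is_fake else loss
```

This keeps every rank executing the same backward graph, which is what prevents the distributed hang, at the cost of wasted compute on the fake proposals.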
hi @zimenglan-sysu-512 I did this expecting the network to learn after a few iterations to stop producing empty mask proposals, but it was still giving empty proposals in some iterations. I think there should be something to penalize empty proposals so that it improves over time, but I still haven't figured out a way to do that.
hi @Sreehari-iitm maybe try this approach from Mask Scoring R-CNN to deal with empty proposals in the mask branch. btw, it is used in the test phase.
i find that if the proposals are empty when training the mask branch, the training procedure hangs. how to deal with these cases?