AhmedFrikha closed this issue 3 years ago.
Thanks for your questions. Unlike the other network architectures, ResNet is a special case: it applies average pooling at the last conv layer. We kept the current implementation for convenience, so that all networks can share the same code. Feel free to use your own implementation, or to downsample the spatial gradient of a middle layer and use it as guidance.
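As a rough illustration of the second suggestion, the sketch below averages a middle layer's gradient over channels and then average-pools the resulting spatial map down to the 7 × 7 resolution of the last conv layer. NumPy stands in for PyTorch here. The function name, the 14 × 14 input size, and the use of average pooling for the downsampling step are assumptions for illustration, not code from the repository.

```python
import numpy as np


def middle_layer_spatial_guidance(mid_grad, out_hw=7):
    """Hypothetical helper: turn a middle-layer gradient into a 7x7 spatial map.

    mid_grad: array of shape (N, C, H, W), e.g. (N, C, 14, 14).
    Returns an array of shape (N, out_hw, out_hw).
    """
    n, _, h, w = mid_grad.shape
    if h % out_hw or w % out_hw:
        raise ValueError("H and W must be integer multiples of out_hw")
    kh, kw = h // out_hw, w // out_hw
    # Average over the channel axis to get one spatial map per sample: (N, H, W)
    spatial = mid_grad.mean(axis=1)
    # Split H into (out_hw, kh) and W into (out_hw, kw), then average each kh x kw block
    blocks = spatial.reshape(n, out_hw, kh, out_hw, kw)
    return blocks.mean(axis=(2, 4))


# Usage with a hypothetical middle-layer gradient of shape (8, 256, 14, 14)
grad_mid = np.random.randn(8, 256, 14, 14)
guidance = middle_layer_spatial_guidance(grad_mid)  # shape (8, 7, 7)
```

Block averaging only works when the middle layer's resolution is an integer multiple of 7; otherwise an adaptive pooling or interpolation step would be needed instead.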
Could you please elaborate on question (1)? I still don't understand why the spatial-wise RSC is implemented as a sum of the activations (x_new) weighted by the mean gradient of each channel (spatial_mean = torch.sum(x_new * grad_channel_mean, 1)).
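For reference, the computation being questioned can be sketched as below. This is a NumPy reconstruction that mirrors the quoted torch expression, assuming grad_channel_mean is the spatial mean of grads_val for each channel; it is not copied from the repository, and the function name is hypothetical. The result resembles a Grad-CAM map without the ReLU: each channel's activations are scaled by that channel's average gradient and then summed over channels.

```python
import numpy as np


def spatial_mean_weighted_activations(x_new, grads_val):
    """Sketch of the questioned spatial-wise weighting (hypothetical name).

    x_new, grads_val: arrays of shape (num_rois, num_channel, H, W).
    Returns an array of shape (num_rois, H * W).
    """
    num_rois, num_channel, h, w = grads_val.shape
    # Assumed: mean gradient of each channel over all spatial positions -> (N, C)
    grad_channel_mean = grads_val.reshape(num_rois, num_channel, -1).mean(axis=2)
    # Reshape to (N, C, 1, 1) so each channel's weight broadcasts over H and W
    grad_channel_mean = grad_channel_mean.reshape(num_rois, num_channel, 1, 1)
    # Mirrors torch.sum(x_new * grad_channel_mean, 1): weighted sum over channels
    spatial_mean = (x_new * grad_channel_mean).sum(axis=1)
    return spatial_mean.reshape(num_rois, h * w)
```

Because x_new enters the product, the spatial weights here depend on the activations as well as on the gradients.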
Based on the description in the paper ("global average pooling is applied along the channel dimension to the gradient tensor G to produce a weighting matrix wi of size [7 × 7]."), lines 99-103 should simply be replaced by: spatial_mean = torch.mean(grads_val.view(num_rois, num_channel, -1), dim=1).
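The proposed replacement, following the quoted sentence from the paper, averages the gradient over the channel dimension only, so the spatial weights no longer depend on the activations. A minimal NumPy sketch that mirrors the torch.mean(..., dim=1) expression (the function name is hypothetical):

```python
import numpy as np


def spatial_mean_from_gradients(grads_val):
    """Channel-wise average of the gradient (hypothetical name).

    grads_val: array of shape (num_rois, num_channel, H, W).
    Returns an array of shape (num_rois, H * W).
    """
    num_rois, num_channel = grads_val.shape[:2]
    # Mirrors torch.mean(grads_val.view(num_rois, num_channel, -1), dim=1)
    return grads_val.reshape(num_rois, num_channel, -1).mean(axis=1)
```

For a 7 × 7 feature map this yields, for each RoI, a vector of 49 weights that can be reshaped to [7 × 7] to match the weighting matrix described in the paper.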
Originally posted by @AhmedFrikha in https://github.com/DeLightCMU/RSC/issues/15#issuecomment-796242412