Hi, Mathias!
Please take a look at the authors' PyTorch implementation:

```python
self.slide_winsize = self.weight_maskUpdater.shape[1] * self.weight_maskUpdater.shape[2] * self.weight_maskUpdater.shape[3]
```
and compare it with your current line:

```python
self.window_size = self.kernel_size[0] * self.kernel_size[1]
```
According to the paper, the scaling factor for a window in which all pixels are valid (unmasked) should be 1, but in your case it comes out as 1/(number of input channels), because `window_size` covers only the spatial dimensions while the mask sum also runs over the input channels.
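A minimal sketch of what I mean (not taken from either repository; the shapes below are made up for illustration): with an all-ones mask-update kernel, dividing by `slide_winsize` gives exactly 1 for a fully valid window, while dividing by the spatial-only `window_size` gives 1/C_in.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes, just for illustration
c_in, k_h, k_w = 3, 3, 3
weight_mask_updater = torch.ones(1, c_in, k_h, k_w)

# Fully valid (all-ones) mask with the same channel count as the input
mask = torch.ones(1, c_in, 8, 8)
mask_sum = F.conv2d(mask, weight_mask_updater)   # sum(M) per sliding window

# Authors' window size: includes the channel dimension
slide_winsize = c_in * k_h * k_w
print((slide_winsize / mask_sum).unique())       # tensor([1.]) -> scaling factor 1

# Spatial-only window size: misses the factor c_in
window_size = k_h * k_w
print((window_size / mask_sum).unique())         # tensor([0.3333]) == 1 / c_in
```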
I don't think it's a big problem: the extra multiplier is constant for each layer, so the network can learn to compensate for it, and batch normalization may even cancel it out :)
But if you make the proposed change to the code, previously trained weights will become invalid.