densechen opened this issue 4 years ago
Thanks for your code review. I fixed this bug with the new code:
import math
import torch
import torch.nn as nn

class NeuNorm(nn.Module):
    def __init__(self, in_channels, height, width, k=0.9):
        super().__init__()
        self.x = 0
        self.k0 = k
        self.k1 = (1 - self.k0) / in_channels ** 2
        # per-channel trainable weights, shape [in_channels, height, width]
        self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
        nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    def forward(self, in_spikes: torch.Tensor):
        # x is a moving average of the channel-summed spikes, shape [batch_size, 1, height, width]
        self.x = self.k0 * self.x + self.k1 * in_spikes.sum(dim=1, keepdim=True)
        return in_spikes - self.w * self.x
Is this the correct implementation?
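For reference, a minimal sketch of how the layer above would be driven, assuming it is called once per time step with spikes of shape [batch_size, in_channels, height, width] (the dimensions and the random spike tensor below are made up for illustration):

import torch

N, C, H, W, T = 4, 16, 28, 28, 8
norm = NeuNorm(in_channels=C, height=H, width=W, k=0.9)

for t in range(T):
    spikes = (torch.rand(N, C, H, W) > 0.5).float()  # fake binary spikes
    out = norm(spikes)
    assert out.shape == (N, C, H, W)  # output keeps the input shape

Note that self.x persists across calls, so it would presumably need to be reset between samples.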
class NeuNorm(nn.Module):
    def __init__(self, in_channels, height, width, k=0.9):
        super().__init__()
        self.x = 0
        self.k0 = k
        self.k1 = (1 - self.k0) / in_channels ** 2
        # self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
        self.w = nn.Parameter(torch.Tensor(1, 1, height, width))
        nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    def forward(self, in_spikes: torch.Tensor):
        self.x = self.k0 * self.x + self.k1 * in_spikes.sum(dim=1, keepdim=True)  # x.shape = [batch_size, 1, height, width]
        return in_spikes - self.w * self.x
I am not sure it is 100% correct, but I think it may be like this.
It would be better to have a test first. 🤔🤔🤔
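A quick shape-only check along these lines (not a numerical test of NeuNorm itself; w and x below are random stand-ins for the parameter and the state, just to confirm that both weight layouts broadcast against the channel-summed map):

import torch

N, C, H, W = 2, 8, 32, 32
spikes = torch.rand(N, C, H, W)
x = torch.rand(N, 1, H, W)               # channel-summed state, as in forward()

w_per_channel = torch.rand(C, H, W)      # first variant: one map per channel
w_shared = torch.rand(1, 1, H, W)        # second variant: one map shared by all channels

for w in (w_per_channel, w_shared):
    out = spikes - w * x
    assert out.shape == (N, C, H, W)     # both layouts broadcast to the full shape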
In the paper they use 'trainable weights $U_{c}^{n}$', so I think the shape of w in our code should be [in_channels, height, width]. What's your opinion? @Yanqi-Chen
From the original paper (below Figure 2) we know the trainable weights $U_c^n$ have a subscript c (meaning channel), and the sum is applied across all channels. So in my opinion the weights may not be shared between channels.
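Written out, what the snippets in this thread compute per time step is (reconstructed from the code above, with $C$ = in_channels; whether the $1/C^2$ factor and the shape of $U_c$ match the paper exactly is precisely what is in question):

$$x^{t} = k_0\, x^{t-1} + \frac{1 - k_0}{C^{2}} \sum_{c=1}^{C} s_c^{t}, \qquad \hat{s}_c^{t} = s_c^{t} - U_c \odot x^{t}$$

where $s_c^{t}$ is the input spike map of channel $c$ at step $t$ and $U_c$ is the trainable weight map; the shared variant replaces $U_c$ with a single $U$ used for every channel.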
Using in_channels or 1 makes the weight either per-channel or shared across channels. Both of them may be OK. 🤔 [in_channels, height, width] may be more consistent with the original paper. By the way, is it really beneficial for the network to train such a big matrix of size [in_channels, height, width]? emmm, 😟
I have tested the memory consumption:

x = torch.rand([128, 128, 128]).to('cuda:0')

This consumes about 1 GB of GPU memory. The memory consumption is very large.
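For what it's worth, a rough sketch of how to separate the tensor's own allocation from everything else PyTorch holds on the device (hedged: torch.cuda.memory_reserved was called memory_cached in older PyTorch versions, and nvidia-smi additionally shows the CUDA context overhead on top of these numbers):

import torch

x = torch.rand([128, 128, 128]).to('cuda:0')

print(x.numel() * x.element_size() / 1024 ** 2, "MiB for the tensor itself")
print(torch.cuda.memory_allocated() / 1024 ** 2, "MiB allocated for tensors")
print(torch.cuda.memory_reserved() / 1024 ** 2, "MiB reserved by the caching allocator")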
Hence, I changed the code to take an additional parameter shared_across_channels:
if shared_across_channels:
    self.w = nn.Parameter(torch.Tensor(1, height, width))
else:
    self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
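Putting the pieces from this thread together, a hedged sketch of what the module could look like with that flag (the parameter name shared_across_channels and its default are only what is proposed here, not necessarily what lands in the repo):

import math
import torch
import torch.nn as nn

class NeuNorm(nn.Module):
    def __init__(self, in_channels, height, width, k=0.9, shared_across_channels=False):
        super().__init__()
        self.x = 0
        self.k0 = k
        self.k1 = (1 - self.k0) / in_channels ** 2
        if shared_across_channels:
            # one [height, width] map shared by all channels: far fewer parameters
            self.w = nn.Parameter(torch.Tensor(1, height, width))
        else:
            # per-channel maps, as the paper's U_c^n notation suggests
            self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
        nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    def forward(self, in_spikes: torch.Tensor):
        # x: [batch_size, 1, height, width], a moving average of the channel-summed spikes
        self.x = self.k0 * self.x + self.k1 * in_spikes.sum(dim=1, keepdim=True)
        return in_spikes - self.w * self.x

Both weight shapes broadcast against x, so the forward pass is unchanged.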
The supplementary material of this paper says that they trained their network with a GTX 1060 GPU, but I don't think the GTX 1060 can cope with such a huge memory consumption... Maybe you could email the author and ask whether this parameter is shared across channels?
The implementation of NeuNorm here https://github.com/fangwei123456/spikingjelly/blob/ce6d6e6de5a6af0dd353dc3d3915718cb73b2fa1/spikingjelly/clock_driven/layer.py#L70-L71 seems different from the NeuNorm described in the original paper.
in_spikes.sum(dim=1).unsqueeze(1)
will generate a tensor with shape [batch_size, 1, height, width], but L71 directly multiplies self.w with self.x, i.e. [channels, 1, 1] × [batch_size, 1, height, width], which may also be incorrect.
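A quick check of what that multiplication actually produces under broadcasting, assuming self.w has shape [channels, 1, 1] as in the linked code and self.x has shape [batch_size, 1, height, width] (the dimensions below are arbitrary):

import torch

batch_size, channels, height, width = 4, 16, 28, 28
w = torch.rand(channels, 1, 1)
x = torch.rand(batch_size, 1, height, width)

out = w * x
print(out.shape)  # torch.Size([4, 16, 28, 28])
# The broadcast succeeds, but each channel of the summed map is only scaled by the
# single scalar w[c], not by a per-pixel weight map as the discussion above expects.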