fangwei123456 / spikingjelly

SpikingJelly is an open-source deep learning framework for Spiking Neural Network (SNN) based on PyTorch.
https://spikingjelly.readthedocs.io

The implementation of NeuNorm #14

Open densechen opened 4 years ago

densechen commented 4 years ago

The implementation of NeuNorm here https://github.com/fangwei123456/spikingjelly/blob/ce6d6e6de5a6af0dd353dc3d3915718cb73b2fa1/spikingjelly/clock_driven/layer.py#L70-L71 seems different from the NeuNorm described in the original paper.

  1. In my opinion, the trainable weight U (named w in your code) should have the same size as the feature map, which means we should set w to [1, height, width], not [channels, 1, 1].
  2. The code in_spikes.sum(dim=1).unsqueeze(1) generates a tensor of shape [batch_size, 1, height, width], but L71 directly multiplies self.w with self.x, and broadcasting [channels, 1, 1] against [batch_size, 1, height, width] may also be incorrect, as the quick shape check below shows.
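
A quick shape check (hypothetical sizes, just to illustrate the broadcast): multiplying a [channels, 1, 1] weight with a [batch_size, 1, height, width] tensor silently broadcasts to [batch_size, channels, height, width], which is probably not what the paper intends.

import torch

channels, batch_size, height, width = 3, 2, 4, 4
w = torch.rand(channels, 1, 1)                # old weight shape [channels, 1, 1]
x = torch.rand(batch_size, 1, height, width)  # in_spikes.sum(dim=1).unsqueeze(1)
print((w * x).shape)                          # torch.Size([2, 3, 4, 4])
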
fangwei123456 commented 4 years ago

Thanks for your code review. I have fixed this bug with the new code:

import math

import torch
import torch.nn as nn

class NeuNorm(nn.Module):
    def __init__(self, in_channels, height, width, k=0.9):
        super().__init__()
        self.x = 0  # auxiliary state, updated across time steps in forward()
        self.k0 = k
        self.k1 = (1 - self.k0) / in_channels**2
        self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
        nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    def forward(self, in_spikes: torch.Tensor):
        self.x = self.k0 * self.x + self.k1 * in_spikes.sum(dim=1, keepdim=True)  # x.shape = [batch_size, 1, height, width]
        return in_spikes - self.w * self.x

Is this the correct implementation?

densechen commented 4 years ago
class NeuNorm(nn.Module):
    def __init__(self, in_channels, height, width, k=0.9):
        super().__init__()
        self.x = 0
        self.k0 = k
        self.k1 = (1 - self.k0) / in_channels**2
        # self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
        self.w = nn.Parameter(torch.Tensor(1, 1, height, width))
        nn.init.kaiming_uniform_(self.w, a=math.sqrt(5))

    def forward(self, in_spikes: torch.Tensor):
        self.x = self.k0 * self.x + self.k1 * in_spikes.sum(dim=1, keepdim=True)  # x.shape = [batch_size, 1, height, width]
        return in_spikes - self.w * self.x

I am not sure it is 100% correct, but I think it may be like this.

It would be better to have a test first. 🤔🤔🤔
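
A minimal smoke-test sketch (hypothetical sizes; it only checks that the shapes broadcast and the state updates without errors, not the semantics of the paper), assuming the NeuNorm class from the snippet above is in scope:

import torch

batch_size, in_channels, height, width = 4, 8, 16, 16
norm = NeuNorm(in_channels, height, width)

for t in range(3):  # a few time steps, since self.x carries state across calls
    in_spikes = (torch.rand(batch_size, in_channels, height, width) > 0.5).float()
    out = norm(in_spikes)
    assert out.shape == (batch_size, in_channels, height, width)
print(out.shape)  # torch.Size([4, 8, 16, 16])
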

fangwei123456 commented 4 years ago

In the paper they use 'trainable weights U_{c}^{n}', so I think the shape of w in our code should be [in_channels, height, width]. What's your opinion? @Yanqi-Chen

Yanqi-Chen commented 4 years ago

From the original paper (below Figure 2) we know the trainable weights $U_c^n$ have a subscript c (meaning channel), and the sum is applied across all channels. So in my opinion the weights are probably not shared between channels.

densechen commented 4 years ago

Using in_channels or 1 decides whether the weight is shared across channels or not, and both of them may be OK. 🤔 [in_channels, height, width] may be more consistent with the original paper. By the way, is it really beneficial for the network to train such a big matrix of size [in_channels, height, width]? emmm, 😟
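
To make the size concern concrete, a rough parameter count with hypothetical sizes (128 channels and 128x128 feature maps, the same sizes used in the next comment):

in_channels, height, width = 128, 128, 128
per_channel = in_channels * height * width  # 2,097,152 parameters
shared = height * width                     # 16,384 parameters
print(per_channel, shared)                  # 2097152 16384
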

fangwei123456 commented 4 years ago

I have tested the memory consumption: x = torch.rand([128, 128, 128]).to('cuda:0') consumes about 1 GB of GPU memory. That is very large, so I changed the code to take an additional parameter, shared_across_channels:

        if shared_across_channels:
            self.w = nn.Parameter(torch.Tensor(1, height, width))
        else:
            self.w = nn.Parameter(torch.Tensor(in_channels, height, width))
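
Either weight shape should broadcast against self.x (shape [batch_size, 1, height, width]) without changing the forward pass; a quick check with hypothetical sizes and illustrative variable names:

import torch

batch_size, in_channels, height, width = 2, 4, 8, 8
x = torch.rand(batch_size, 1, height, width)            # the auxiliary state self.x
w_shared = torch.rand(1, height, width)                  # shared_across_channels=True
w_per_channel = torch.rand(in_channels, height, width)   # shared_across_channels=False
print((w_shared * x).shape)       # torch.Size([2, 1, 8, 8])
print((w_per_channel * x).shape)  # torch.Size([2, 4, 8, 8])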

The supplementary material of this paper says that they trained their network on a GTX 1060 GPU, but I do not think the GTX 1060 can handle such a huge memory consumption... Maybe you can email the authors and ask whether this parameter is shared across channels?

densechen commented 4 years ago

The authors have released their code here. However, they did not provide the implementation of NeuNorm (issue).