fangwei123456 / spikingjelly

SpikingJelly is an open-source deep learning framework for Spiking Neural Network (SNN) based on PyTorch.
https://spikingjelly.readthedocs.io

Error in enabling multi-step mode for single layer FC SNN on FMNIST [Latest version] #314

Open NicolaCST opened 1 year ago

NicolaCST commented 1 year ago

Issue type

SpikingJelly version 'latest'

Description

Hi, I'm trying to implement multi-step mode on a single-layer FC SNN. I'm following the old tutorial (0.0.0.0.12) but implementing it in the newest version. I can successfully train the net in single-step mode, but I get this error when switching to multi-step:

--- error ---

---> 19 train_ls = only_train_multistep(net, epochs, train_data_loader, device, writer)

9 frames

/content/spikingjelly/spikingjelly/activation_based/auto_cuda/base.py in append(self, codes)
   1470         Append codes in self.codes.
   1471         """
-> 1472         codes = codes.replace('\n', '')
   1473         codes = codes.split(';')
   1474         for i in range(codes.len()):

AttributeError: 'NoneType' object has no attribute 'replace'

--- error ---

Since no tutorial is available for this version, I tried to follow the definition of the train loop and the net from the CSNN example as closely as possible (since it is in multi-step mode).

Thanks in advance

@fangwei123456

Minimal code to reproduce the error/bug

'''

My net:

class SNN(nn.Module):
    def __init__(self, T: int):
        super().__init__()

        self.T = T

        self.layer = nn.Sequential(
            layer.Flatten(),
            layer.Linear(28 * 28, 10, bias=False),
            neuron.IFNode(surrogate_function=surrogate.ATan()))

        functional.set_step_mode(self, step_mode="m")
        functional.set_backend(self, backend="cupy")

    def forward(self, x: torch.Tensor):
        x_seq = x.unsqueeze(0).repeat(self.T, 1, 1, 1, 1)
        x_seq = self.layer(x_seq)
        return x_seq.mean(0)

My train loop:

for epoch in range(epochs):
  print("Epoch {}:".format(epoch))

  train_loss = 0
  train_acc = 0
  train_samples = 0

  for img, label in train_data_loader:
      optimizer.zero_grad()

      img = img.to(device)
      label = label.to(device)
      label_onehot = F.one_hot(label, 10).float()

      if scaler is not None:
        with amp.autocast():
          encoded_img = encoder(img)
          out_fr = net(encoded_img)
          loss = F.mse_loss(out_fr, label_onehot)

        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

      train_samples += label.numel()
      train_loss += loss.item() * label.numel()
      train_acc += (out_fr.argmax(1) == label).float().sum().item()

      functional.reset_net(net)

[.......] '''

fangwei123456 commented 1 year ago

Can you provide minimal code to reproduce the error? I just tested the IF neuron on the master version, and it does not raise an error:

from spikingjelly.activation_based import surrogate, neuron
import torch

net = neuron.IFNode(backend='cupy', step_mode='m', surrogate_function=surrogate.ATan())

x = torch.rand([8, 4], device='cuda:0', requires_grad=True)
y = net(x)
y.sum().backward()

NicolaCST commented 1 year ago

This is my net definition.

from spikingjelly.activation_based import neuron, encoding, functional, surrogate, layer
from spikingjelly import visualizing
import torch
import torch.nn as nn
import torchvision
import numpy as np
from torch.utils.tensorboard import SummaryWriter
from torch.cuda import amp
import torch.nn.functional as F
from torch.utils import data  # needed for data.DataLoader below

class SNN(nn.Module):
    def __init__(self, T:int):
        super().__init__()
        self.T = T

        self.layer = nn.Sequential(
            layer.Flatten(),
            layer.Linear(28 * 28, 10, bias=False),
            neuron.IFNode(surrogate_function=surrogate.ATan()))

        functional.set_step_mode(self, step_mode="m")
        functional.set_backend(self, backend="cupy")

    #----------- HERE  (2) -------- #
    def forward(self, x: torch.Tensor):
        x_seq = x.unsqueeze(0).repeat(self.T, 1, 1, 1, 1)
        x_seq = self.layer(x_seq)
        return x_seq.mean(0)

net = SNN(T=5)

The train loop is defined as:

def only_train_multistep(net, epochs, train_data_loader, device, writer):

    scaler = amp.GradScaler()
    functional.reset_net(net)

    print("Training new mode...")
    net.train()

    for epoch in range(epochs):
        print("Epoch {}:".format(epoch))

        train_loss = 0
        train_acc = 0
        train_samples = 0

        for img, label in train_data_loader:
            optimizer.zero_grad()
            img = img.to(device)
            label = label.to(device)
            label_onehot = F.one_hot(label, 10).float()

            if scaler is not None:
                with amp.autocast():
                    #----------- HERE  (1) -------- #
                    encoded_img = encoder(img)
                    out_fr = net(encoded_img)
                    loss = F.mse_loss(out_fr, label_onehot)

                scaler.scale(loss).backward()
                scaler.step(optimizer)
                scaler.update()

            train_samples += label.numel()
            train_loss += loss.item() * label.numel()
            train_acc += (out_fr.argmax(1) == label).float().sum().item()

            functional.reset_net(net)
        train_loss /= train_samples
        train_acc /= train_samples

    return

And then I call:

writer = SummaryWriter(RUNS_PATH)
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
encoder = encoding.PoissonEncoder()

train_data_loader = data.DataLoader(
        dataset = train_dataset,
        batch_size = 128,
        shuffle = shuffle,
        drop_last = True)

train_ls = only_train_multistep(net, epochs=10, train_data_loader=train_data_loader, device='cuda:0', writer=writer)

I'm sorry this is a bit verbose, but it should be helpful. The points I modified are HERE (1), where I removed the loop over T since the net is in multi-step mode, and HERE (2), where, following the CSNN tutorial, I added the T dimension to the tensor.
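
To make the two changes concrete, here is a minimal plain-Python sketch of how a single-step loop over T and a multi-step call relate. `IFNeuron` is a hypothetical stand-in written for illustration only, not SpikingJelly's `neuron.IFNode`, and the hard reset to 0 and threshold of 1.0 are assumptions.

```python
class IFNeuron:
    """Hypothetical integrate-and-fire neuron with a hard reset to 0."""

    def __init__(self, v_threshold=1.0):
        self.v_threshold = v_threshold
        self.v = 0.0

    def single_step(self, x):
        # Integrate one input, fire if the threshold is reached, then reset.
        self.v += x
        if self.v >= self.v_threshold:
            self.v = 0.0
            return 1.0
        return 0.0

    def multi_step(self, x_seq):
        # Same dynamics, but the loop over time lives inside the neuron.
        return [self.single_step(x) for x in x_seq]

    def reset(self):
        self.v = 0.0


T = 5
x = 0.5  # constant input current, chosen so the sums are exact in floating point

# Single-step style: the caller loops over T and accumulates spikes.
neuron = IFNeuron()
rate_single = sum(neuron.single_step(x) for _ in range(T)) / T

# Multi-step style: build a length-T sequence, call once, average over time.
neuron.reset()
x_seq = [x] * T                  # plays the role of x.unsqueeze(0).repeat(T, ...)
out_seq = neuron.multi_step(x_seq)
rate_multi = sum(out_seq) / T    # plays the role of x_seq.mean(0)

print(rate_single, rate_multi)
```

Both styles give the same firing rate as long as the neuron state is reset between samples, which is what `functional.reset_net(net)` does in the train loop.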

By adding some prints, I found that the error is raised during the call to scaler.scale(loss).backward(). However, I am not sure whether the forward part is implemented correctly.
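
One point about the forward pass that is unrelated to the crash: with the loop over T removed, the encoder runs once per batch and the same encoded frame is repeated T times, whereas a per-step loop encodes the image again at every time step. The sketch below shows the difference in plain Python. It assumes the encoder draws an independent Bernoulli sample per pixel on each call; `poisson_encode` is a hypothetical helper, not the SpikingJelly API.

```python
import random


def poisson_encode(pixels, rng):
    # Each pixel in [0, 1] fires with probability equal to its intensity.
    return [1.0 if rng.random() <= p else 0.0 for p in pixels]


rng = random.Random(0)
pixels = [0.1, 0.5, 0.9]
T = 5

# Encode once, then repeat: every time step sees the same spike pattern.
frame = poisson_encode(pixels, rng)
repeated_seq = [frame for _ in range(T)]

# Encode at every time step: each step gets its own independent sample.
fresh_seq = [poisson_encode(pixels, rng) for _ in range(T)]

print(repeated_seq)
print(fresh_seq)
```

If per-step sampling is wanted in multi-step mode, one option is to repeat the raw image along the T dimension first and apply the encoder to the whole sequence.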

------ ERROR -------

/content/spikingjelly/spikingjelly/activation_based/auto_cuda/base.py in append(self, codes)
   1470         Append codes in self.codes.
   1471         """
-> 1472         codes = codes.replace('\n', '')
   1473         codes = codes.split(';')
   1474         for i in range(codes.len()):

AttributeError: 'NoneType' object has no attribute 'replace'

------ ERROR -------

fangwei123456 commented 1 year ago

Hi, I found the bug. It was caused by a missing return. It is fixed now.
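
The thread does not show the patch itself, so the following is only a hypothetical illustration of how a missing return produces exactly this AttributeError. `CodeBlock`, `make_kernel_buggy`, and `make_kernel_fixed` are invented names and are not taken from the SpikingJelly source.

```python
class CodeBlock:
    # Hypothetical stand-in for a code-collecting helper.
    def __init__(self):
        self.codes = []

    def append(self, codes):
        # Like the method in the traceback, this assumes `codes` is a string.
        codes = codes.replace('\n', '')
        for piece in codes.split(';'):
            if piece:
                self.codes.append(piece)


def make_kernel_buggy():
    code = 'float h = x + v;\nv = h;'
    # Bug: no return statement, so the caller receives None.


def make_kernel_fixed():
    code = 'float h = x + v;\nv = h;'
    return code


block = CodeBlock()
try:
    block.append(make_kernel_buggy())
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

block.append(make_kernel_fixed())
print(error_message)
print(block.codes)
```

A function that builds a value but never returns it hands `None` to its caller, and the failure only surfaces later, at the first string method called on that value.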

The tutorial for MNIST with an FC SNN is available now:

https://spikingjelly.readthedocs.io/zh_CN/latest/activation_based_en/lif_fc_mnist.html