Mephisto405 / Learning-Loss-for-Active-Learning

Reproducing experimental results of LL4AL [Yoo et al. 2019 CVPR]

Question about experiment on CIFAR-100 #14

Closed hashnut closed 3 years ago

hashnut commented 3 years ago

Hello, I read your code and it was really great.

Training on CIFAR-10 was successful, but when I tried to train the model on CIFAR-100, it got stuck at this point in Main.py:

        # Model
        resnet18    = resnet.ResNet18(num_classes=100).cuda() # On this point
        loss_module = lossnet.LossNet().cuda()
        models      = {'backbone': resnet18, 'module': loss_module}
        torch.backends.cudnn.benchmark = False

The RuntimeError says:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-10-e3617788a34b> in <module>()
     21 
     22         # Model
---> 23         resnet18    = resnet.ResNet18(num_classes=100).cuda()
     24         loss_module = lossnet.LossNet().cuda()
     25         models      = {'backbone': resnet18, 'module': loss_module}

3 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in cuda(self, device)
    489             Module: self
    490         """
--> 491         return self._apply(lambda t: t.cuda(device))
    492 
    493     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    385     def _apply(self, fn):
    386         for module in self.children():
--> 387             module._apply(fn)
    388 
    389         def compute_should_use_set_data(tensor, tensor_applied):

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _apply(self, fn)
    407                 # `with torch.no_grad():`
    408                 with torch.no_grad():
--> 409                     param_applied = fn(param)
    410                 should_use_set_data = compute_should_use_set_data(param, param_applied)
    411                 if should_use_set_data:

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in <lambda>(t)
    489             Module: self
    490         """
--> 491         return self._apply(lambda t: t.cuda(device))
    492 
    493     def xpu(self: T, device: Optional[Union[int, device]] = None) -> T:

RuntimeError: CUDA error: device-side assert triggered

How can I fix this RuntimeError?

I'm running this code on Google Colab.
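For what it's worth, a `CUDA error: device-side assert triggered` at `.cuda()` is often a deferred error from an earlier kernel, commonly labels that exceed the classifier's `num_classes` (running with the environment variable `CUDA_LAUNCH_BLOCKING=1` gives a more accurate stack trace). A quick CPU-side sanity check, where `check_labels` is a hypothetical helper and not part of this repo:

```python
import torch

def check_labels(labels, num_classes):
    """Return True if every label is a valid class index for a num_classes head."""
    labels = torch.as_tensor(labels)
    return bool(((labels >= 0) & (labels < num_classes)).all())

print(check_labels([3, 42, 99], num_classes=100))  # True: fits a 100-class head
print(check_labels([3, 42, 99], num_classes=10))   # False: 42 and 99 overflow a 10-class head
```

Running this over the training set's targets before moving anything to the GPU rules out (or confirms) a class-index mismatch.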

Mephisto405 commented 3 years ago

If you speak Korean, this blog post may be useful. To me, it looks like an out-of-memory error.
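To test the out-of-memory hypothesis on Colab, one can probe GPU memory right before and after building the models. This is a generic PyTorch sketch, not code from this repo; without a GPU it simply returns `None`:

```python
import torch

def gpu_memory_mb():
    """Report allocated/reserved CUDA memory in MB, or None when no GPU is present."""
    if not torch.cuda.is_available():
        return None
    return {
        'allocated': torch.cuda.memory_allocated() / 2**20,
        'reserved':  torch.cuda.memory_reserved() / 2**20,
    }

print(gpu_memory_mb())
```

Comparing the numbers against the card's capacity (`torch.cuda.get_device_properties(0).total_memory`) shows whether ResNet-18 plus the loss module actually exhausts the device.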

hashnut commented 3 years ago

Thanks!

In Main.py, I changed the tuples ( ) to lists [ ] in the normalization transform:

        T.Normalize([0.4914, 0.4822, 0.4465], [0.2023, 0.1994, 0.2010]) # T.Normalize((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761)) # CIFAR-100

and it worked :)

Mephisto405 commented 3 years ago

The problem is resolved, so I'm closing this issue.

Thank you very much for your contribution!!