RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:327

MrMaaoui commented 4 years ago

Good evening,

I'm trying to use DenseTorch to train a multi-task module on a custom dataset (RGB images, masks and depth maps, with the masks and depth maps stored as grayscale). Whenever I run the training, I get the following error, which I haven't been able to solve:

Traceback (most recent call last):
  File "train.py", line 82, in <module>
    dt.engine.train(model1, optims, [crit_segm, crit_depth], trainloader, loss_coeffs)
  File "/media/pfe_historiar/data/dataset/DenseTorch-master/densetorch/engine/trainval.py", line 80, in train
    target.squeeze(dim=1),
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/pfe_historiar/data/dataset/DenseTorch-master/densetorch/engine/losses.py", line 27, in forward
    c = 0.2 * torch.max(err)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/THCReduceAll.cuh:327

Any recommendations on how to solve this issue?

The full terminal log is in the attached file, log.txt.

DrSleep commented 4 years ago

What is the channel dimension of your segmentation classifier (i.e. num_classes[0] in the config)? And what is the largest label in your segmentation masks? When computing the segmentation cross-entropy loss, every label must be lower than the channel dimension of the classifier, i.e. labels must lie in the range 0 to num_classes[0] - 1; otherwise you will see that error (more info in the PyTorch docs).
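A quick way to check this before training is to compare the labels actually present in the masks with the classifier's channel count. The sketch below is only an illustration: check_mask_labels, ignore_label and the sample mask are hypothetical names, not part of DenseTorch, and it assumes the masks load as integer NumPy arrays. Note also that CUDA kernels run asynchronously, so the assert can be reported at a later call (such as torch.max in the depth loss in the traceback above) rather than at the call that caused it.

```python
import numpy as np


def check_mask_labels(mask, num_classes, ignore_label=None):
    """Return the labels in `mask` that fall outside [0, num_classes - 1].

    `ignore_label` is an optional label to skip, for setups where the
    loss is configured to ignore one value (an assumption, not a given).
    """
    labels = np.unique(mask)
    if ignore_label is not None:
        labels = labels[labels != ignore_label]
    # Any label >= num_classes (or negative) would trigger the device-side assert.
    bad = labels[(labels < 0) | (labels >= num_classes)]
    return bad.tolist()


# Hypothetical grayscale mask that stores gray levels instead of class ids.
mask = np.array([[0, 128], [255, 128]], dtype=np.uint8)
print(check_mask_labels(mask, num_classes=3))
```

An empty list means every label fits the classifier; anything else lists the offending values.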

MrMaaoui commented 4 years ago

Thank you for your response. Converting the segmentation ground truth into class IDs solved it.
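The thread does not show how the conversion was done. One possible approach, assuming each class is stored as a distinct gray level in an 8-bit mask, is a fixed lookup table from gray level to class ID. The function name gray_to_class_ids and the gray levels below are hypothetical and only serve as an example:

```python
import numpy as np


def gray_to_class_ids(mask, gray_values):
    """Map an 8-bit grayscale mask to contiguous class ids.

    `gray_values[i]` is the gray level that encodes class `i`.
    Raises ValueError if the mask contains a gray level not listed.
    """
    # Lookup table over all 256 possible gray levels; -1 marks "unknown".
    lut = np.full(256, -1, dtype=np.int64)
    for class_id, gray in enumerate(gray_values):
        lut[gray] = class_id
    class_ids = lut[mask]
    if (class_ids < 0).any():
        raise ValueError("mask contains gray levels missing from gray_values")
    return class_ids


# Hypothetical example: three classes stored as gray levels 0, 128 and 255.
mask = np.array([[0, 128], [255, 128]], dtype=np.uint8)
class_ids = gray_to_class_ids(mask, gray_values=[0, 128, 255])
print(class_ids)
```

Using one fixed table for the whole dataset keeps the class IDs consistent across images, which a per-image remapping would not guarantee.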

DrSleep / DenseTorch, issue #7