RuntimeError: The size of tensor a (96774) must match the size of tensor b (290322) at non-singleton dimension 0

Angus-Lee commented 4 years ago

I tried to run your code on a new dataset. When I executed train_deep_globe_global.sh, the following error occurred:

/home/###/anaconda3/envs/pytorch_py366/lib/python3.6/site-packages/torch/nn/functional.py:2423: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
Traceback (most recent call last):
  File "train_deep_globe.py", line 118, in <module>
    loss = trainer.train(sample_batched, model, global_fixed)
  File "/home/boyun066/Desktop/Semantic_Segmentation/GLNet/GLNet-master/helper.py", line 329, in train
    loss = self.criterion(outputs_global, labels_glb)
  File "train_deep_globe.py", line 97, in <lambda>
    criterion = lambda x,y: criterion1(x, y)
  File "/home/boyun066/anaconda3/envs/pytorch_py366/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/boyun066/Desktop/Semantic_Segmentation/GLNet/GLNet-master/utils/loss.py", line 57, in forward
    probs = (probs * target).sum(1)
RuntimeError: The size of tensor a (96774) must match the size of tensor b (290322) at non-singleton dimension 0

KiyoKando commented 4 years ago

Hi, your error seems to be the same one I am tackling here: https://github.com/TAMU-VITA/GLNet/issues/14#issuecomment-546715827

About "(96774) must match the size of tensor b (290322)": I guess your labels have an extra trailing dimension of size 3, i.e. labels_glb.size() is torch.Size([6, 127, 127, 3]). I am debugging that with advice from the author, Wuyang, in the comment linked above.
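The two numbers in the error message are consistent with that guess. A quick arithmetic check, assuming the loss compares one value per pixel (the shapes are the ones quoted above; the variable names are only for illustration):

```python
from functools import reduce
from operator import mul


def numel(shape):
    """Number of elements in a tensor of the given shape."""
    return reduce(mul, shape, 1)


# Shape quoted in the comment above: batch of 6 labels, 127 x 127, 3 channels.
rgb_label_shape = (6, 127, 127, 3)
# What a single-channel label batch would look like (assumed expected shape).
index_label_shape = (6, 127, 127)

n_pred = numel(index_label_shape)   # matches "tensor a" in the error
n_label = numel(rgb_label_shape)    # matches "tensor b" in the error

print(n_pred, n_label, n_label // n_pred)
```

So tensor b is exactly three times tensor a, which is what an RGB label would produce where a one-channel label is expected.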

Angus-Lee commented 4 years ago

Thanks! I finally solved this problem by converting the training labels from RGB to L mode using a self-defined mapping, so each label has only one channel.
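The mapping itself was not posted in this thread. A minimal sketch of what such a conversion could look like, assuming NumPy is available; the palette and the function name rgb_mask_to_index are hypothetical and must be replaced with the colours of your own dataset:

```python
import numpy as np

# Hypothetical palette (RGB colour -> class index); not taken from this thread.
PALETTE = {
    (0, 0, 0): 0,
    (255, 0, 0): 1,
    (0, 255, 0): 2,
}


def rgb_mask_to_index(mask_rgb, palette=PALETTE):
    """Map an (H, W, 3) RGB mask to an (H, W) uint8 mask of class indices."""
    mask_rgb = np.asarray(mask_rgb)
    if mask_rgb.ndim != 3 or mask_rgb.shape[-1] != 3:
        raise ValueError("expected an (H, W, 3) array")
    index = np.zeros(mask_rgb.shape[:2], dtype=np.uint8)
    matched = np.zeros(mask_rgb.shape[:2], dtype=bool)
    for colour, class_id in palette.items():
        # Pixels whose three channels all equal this palette colour.
        hit = np.all(mask_rgb == np.array(colour), axis=-1)
        index[hit] = class_id
        matched |= hit
    if not matched.all():
        raise ValueError("mask contains colours missing from the palette")
    return index


# Tiny 2 x 2 demo mask built from the hypothetical palette.
demo = np.array(
    [[[0, 0, 0], [255, 0, 0]],
     [[0, 255, 0], [255, 0, 0]]],
    dtype=np.uint8,
)
index_mask = rgb_mask_to_index(demo)
print(index_mask.shape)
```

The resulting 2-D uint8 array could then be saved as a single-channel PNG, for example with Pillow's Image.fromarray(index_mask).save(...), before training.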

KiyoKando commented 4 years ago

Wow, glad to hear you got it working! (Would you mind sharing the details of what you changed?)

KiyoKando commented 4 years ago

Ah, thanks. I added two lines after L293 and it worked: https://github.com/TAMU-VITA/GLNet/blob/master/helper.py#L293

for labels in range(len(labels_glb)):
    labels_glb[labels] = labels_glb[labels].convert('L')

Did you run into "RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED" after this?

chenwydj commented 4 years ago

You could try to comment out these two lines where the cudnn.deterministic feature is enabled. https://github.com/TAMU-VITA/GLNet/blob/a8132640772e4ed3f6ea29a3ab03dc467df2fb66/train_deep_globe.py#L25 https://github.com/TAMU-VITA/GLNet/blob/a8132640772e4ed3f6ea29a3ab03dc467df2fb66/helper.py#L19

GeneralLi95 commented 4 years ago

I encountered the same problem. Could you please share more details about your change? Thank you.

GeneralLi95 commented 4 years ago

@KiyoKando Have you solved the "RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED" error? I commented out the two lines as @chenwydj suggested, but it didn't work.

KiyoKando commented 4 years ago

Hi, thanks for reaching out, but actually no. I got another error after making the revision below. https://github.com/TAMU-VITA/GLNet/blob/master/helper.py#L293

The author said that it is a Colab-related error, so if you have a breakthrough, I'd love to hear more about it.

GeneralLi95 commented 4 years ago

@KiyoKando This is not a CUDA version problem. The key point is that the _mask.png files have 3 channels (RGB), and we need to convert them to a single channel. You used convert('L') in helper.py, and that causes the cuDNN error. The conversion should be done with an independent method instead. I have solved this and will open-source my code soon.
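One plausible reason convert('L') is not enough on its own: Pillow documents that conversion as a luminance transform (L = R * 299/1000 + G * 587/1000 + B * 114/1000), so it yields grey levels rather than class indices. The sketch below uses a hypothetical three-colour palette and class count, and the float formula only approximates Pillow's integer rounding. Whether out-of-range labels are what actually triggered the cuDNN error here is not confirmed in this thread.

```python
def luma(rgb):
    """Approximate grey level that a luminance conversion gives an RGB colour."""
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) / 1000


NUM_CLASSES = 3  # hypothetical number of classes
PALETTE = [(0, 0, 0), (255, 0, 0), (0, 255, 0)]  # hypothetical label colours

grey_levels = {colour: luma(colour) for colour in PALETTE}

# Colours whose grey level is not a valid class index in [0, NUM_CLASSES).
out_of_range = [c for c, v in grey_levels.items() if v >= NUM_CLASSES]

for colour, value in grey_levels.items():
    print(colour, value)
```

With this palette only black maps to a valid index; the other colours land far above the class count, which is why an explicit colour-to-index mapping is the safer route.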

heyuemao commented 3 years ago

Hello, could you share how you solved this problem with an independent method?

VITA-Group / GLNet

RuntimeError: The size of tensor a (96774) must match the size of tensor b (290322) at non-singleton dimension 0 #17
