chvlyl / ISIC2018

Lesion attributes segmentation for melanoma detection with multi-task U-Net
MIT License

RuntimeError: max_pool2d_with_indices_out_cuda_frame failed with error code 0 #9

Closed: MRtianyanxiaobai closed this issue 4 years ago

MRtianyanxiaobai commented 4 years ago

Hi, I get an error when I test your model.pt and I don't know how to fix it. Can you help me?

The test code is as follows:

image_path="/kaggle/input/isic2018-200-pics/ISIC2018_Task1-2_Training_Input100/"
output_path="/kaggle/working/data"
temp_path="/kaggle/working/temp"
model_weight="/kaggle/input/isicmodelunet/model.pt"
model='UNet16'
model = UNet16(num_classes=5, pretrained='vgg')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model = nn.DataParallel(model)
state = torch.load(model_weight)
model.load_state_dict(state['model'])

print('load model weight')

image_ids = sorted([fname.split('/')[-1].split('.')[0] for fname in glob.glob(os.path.join(image_path, '*.jpg'))])
data_set = TestDataset(image_ids, image_path)
test_loader = DataLoader(data_set, batch_size=1, shuffle=False, num_workers=10, pin_memory=False)
for img_id, test_image, W, H in test_loader:
      test_image = test_image.to(device)  # [N, 1, H, W]
      test_image = test_image.permute(0, 3, 1, 2)
      outputs, outputs_mask_ind1, outputs_mask_ind2 = model(test_image)
      break

The error is as follows:


load model weight
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-58-cd870e1e196b> in <module>
     19       test_image = test_image.to(device)  # [N, 1, H, W]
     20       test_image = test_image.permute(0, 3, 1, 2)
---> 21       outputs, outputs_mask_ind1, outputs_mask_ind2 = model(test_image)
     22       break
     23 

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
    148         inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
    149         if len(self.device_ids) == 1:
--> 150             return self.module(*inputs[0], **kwargs[0])
    151         replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
    152         outputs = self.parallel_apply(replicas, inputs, kwargs)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

<ipython-input-44-863006fd5294> in forward(self, x)
     65         conv2 = self.conv2(self.pool(conv1))
     66         conv3 = self.conv3(self.pool(conv2))
---> 67         conv4 = self.conv4(self.pool(conv3))
     68         conv5 = self.conv5(self.pool(conv4))
     69         center = self.center(self.pool(conv5))

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    530             result = self._slow_forward(*input, **kwargs)
    531         else:
--> 532             result = self.forward(*input, **kwargs)
    533         for hook in self._forward_hooks.values():
    534             hook_result = hook(self, input, result)

/opt/conda/lib/python3.6/site-packages/torch/nn/modules/pooling.py in forward(self, input)
    139         return F.max_pool2d(input, self.kernel_size, self.stride,
    140                             self.padding, self.dilation, self.ceil_mode,
--> 141                             self.return_indices)
    142 
    143 

/opt/conda/lib/python3.6/site-packages/torch/_jit_internal.py in fn(*args, **kwargs)
    179             return if_true(*args, **kwargs)
    180         else:
--> 181             return if_false(*args, **kwargs)
    182 
    183     if if_true.__doc__ is None and if_false.__doc__ is not None:

/opt/conda/lib/python3.6/site-packages/torch/nn/functional.py in _max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode, return_indices)
    486         stride = torch.jit.annotate(List[int], [])
    487     return torch.max_pool2d(
--> 488         input, kernel_size, stride, padding, dilation, ceil_mode)
    489 
    490 max_pool2d = boolean_dispatch(

RuntimeError: max_pool2d_with_indices_out_cuda_frame failed with error code 0
MRtianyanxiaobai commented 4 years ago

I am running the model on a single GPU on Kaggle.

However, when I feed a random tensor (torch.randn) to the model, the code runs successfully.

The code is as follows:

model_weight="/kaggle/input/isicmodelunet/model.pt"
model='UNet16'
model = UNet16(num_classes=5, pretrained='vgg')
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model = nn.DataParallel(model)
# print('load model weight')
state = torch.load(model_weight)
model.load_state_dict(state['model'])
image_path="/kaggle/input/isic2018-200-pics/ISIC2018_Task1-2_Training_Input100/"
output_path="/kaggle/working/data"
temp_path="/kaggle/working/temp"
model_weight="/kaggle/input/isicmodelunet/model.pt"
outputs, outputs_mask_ind1, outputs_mask_ind2  = model(torch.randn(1,3,512,512))

outputs

out:

tensor([[[[-43.0477, -42.6166, -42.1554,  ..., -40.7616, -46.0165, -35.7316],
          [-42.6724, -36.9382, -37.2065,  ..., -46.5197, -25.6934, -33.3851],
          [-54.9793, -45.7430, -50.2827,  ..., -56.3474, -37.1911, -37.2745],
          ...,
          [-19.8214, -21.4957, -20.9718,  ..., -38.4172, -27.2621, -26.5380],
          [-11.9154, -14.1390, -25.0464,  ..., -42.5311, -20.7334, -42.3800],
          [ -5.2952,  -9.1634, -30.1492,  ..., -27.0054, -25.9774, -31.9956]],

         [[-67.1238, -58.1360, -55.0164,  ..., -73.4346, -63.1921, -42.0327],
          [-54.6747, -43.5799, -53.0808,  ..., -53.8588, -42.6193, -55.7634],
          [-80.6789, -53.8275, -51.7053,  ..., -62.5470, -59.7515, -33.9387],
          ...,
          [-31.5516, -41.2620, -37.7773,  ..., -34.9774, -39.6705, -38.8699],
          [-21.1549, -32.8000, -42.8895,  ..., -69.2562, -56.3974, -42.5382],
          [-10.7654, -24.0475, -41.0419,  ..., -49.6977, -38.3294, -28.3830]],

         [[-51.1715, -64.2330, -62.6274,  ..., -73.8082, -69.7627, -48.6689],
          [-56.4276, -52.1652, -56.8904,  ..., -58.4550, -42.9378, -52.1528],
          [-60.8366, -59.3487, -62.2661,  ..., -74.1803, -62.2516, -52.7751],
          ...,
          [-27.9837, -40.2944, -36.6337,  ..., -36.5967, -42.6729, -41.6979],
          [-20.8316, -30.0016, -41.2409,  ..., -79.8009, -56.2778, -66.8533],
          [-12.6342, -23.8113, -43.1979,  ..., -49.1028, -39.9504, -35.4141]],

         [[-65.1302, -42.0562, -38.4903,  ..., -54.6637, -60.4237, -38.9189],
          [-56.4720, -38.4822, -49.5301,  ..., -55.9520, -64.9863, -52.3775],
          [-69.0666, -52.2756, -51.4572,  ..., -65.6839, -57.4808, -43.4866],
          ...,
          [-27.2962, -36.9498, -35.8066,  ..., -40.5169, -46.5388, -36.7875],
          [ -9.5541, -28.3006, -31.1832,  ..., -56.6378, -42.2730, -29.0847],
          [ -1.9310, -11.9232, -28.0213,  ..., -46.1306, -35.1942, -21.7377]],

         [[-54.5243, -57.9260, -48.9540,  ..., -67.0493, -53.7251, -33.6795],
          [-61.8979, -54.9488, -67.8699,  ..., -60.0295, -47.2000, -42.3213],
          [-70.2038, -66.3495, -66.4644,  ..., -63.0919, -54.1307, -41.0249],
          ...,
          [-31.5623, -46.6198, -43.6660,  ..., -39.8562, -38.2402, -31.8772],
          [-20.4796, -30.7044, -38.3722,  ..., -67.8465, -45.0856, -46.8987],
          [-12.3139, -24.7006, -41.8787,  ..., -59.7506, -43.7600, -33.7154]]]],
       device='cuda:0', grad_fn=<CudnnConvolutionBackward>)
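
Since the random tensor works and the loader batch does not, one way to narrow this down is to print the properties of both inputs side by side. The following is only a sketch: `describe` is a hypothetical helper, and `loader_like` is an assumed stand-in for a batch from `TestDataset`, whose real shape and dtype may differ.

```python
import torch


def describe(name, t):
    """Return a one-line summary of the tensor properties that matter here."""
    return (f"{name}: shape={tuple(t.shape)} dtype={t.dtype} "
            f"device={t.device} contiguous={t.is_contiguous()}")


# Random input in the layout the model accepted: [N, C, H, W], float32.
random_input = torch.randn(1, 3, 512, 512)

# Assumed stand-in for one loader batch: [N, H, W, C] permuted to [N, C, H, W].
# The real TestDataset output may differ in size and dtype.
loader_like = torch.zeros(1, 512, 512, 3).permute(0, 3, 1, 2)

print(describe("random_input", random_input))
print(describe("loader_like", loader_like))
```

Running the same helper on the real `test_image` just before `model(test_image)` would show whether it differs from the random tensor in shape, dtype, or memory layout.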
jiangyanjavawin commented 4 years ago

I have the same error. Have you found a solution? Thank you.

chvlyl commented 4 years ago

Sorry for the late reply. Somehow I didn't get the issue notifications. I will look into this problem.

chvlyl commented 4 years ago

I tested the trained model with the following command:

python3 submission.py --image-path data/ISIC2018_Task1-2_Test_Input --model-weight model/model.pt --output-path prediction

It works without any problem. Can you test it with the above command? Just pass your test image folder to --image-path.

chvlyl commented 4 years ago

This link may be helpful

Try replacing the following code:

test_image = test_image.permute(0, 3, 1, 2)

with:

test_image = test_image.permute(0, 3, 1, 2).contiguous()
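
As a small illustration of what that change does (a sketch on CPU with a made-up tensor, not the actual loader output): `permute` only reorders the strides, so the result is no longer contiguous in the default [N, C, H, W] memory order, while `.contiguous()` copies the data into that order without changing any values. Whether this is the cause of the CUDA error in this issue is an assumption.

```python
import torch

# Made-up stand-in for a loader batch, assumed to be laid out as [N, H, W, C].
nhwc = torch.arange(2 * 4 * 4 * 3, dtype=torch.float32).reshape(2, 4, 4, 3)

# permute only rewrites the strides; no data is moved.
permuted = nhwc.permute(0, 3, 1, 2)

# contiguous() copies the data into standard [N, C, H, W] memory order.
fixed = permuted.contiguous()

print("permuted:", tuple(permuted.shape), permuted.stride(), permuted.is_contiguous())
print("fixed:   ", tuple(fixed.shape), fixed.stride(), fixed.is_contiguous())

# The values are identical, so pooling gives the same result either way.
pool = torch.nn.MaxPool2d(2)
same_values = torch.equal(permuted, fixed)
same_pooled = torch.equal(pool(permuted), pool(fixed))
print("same values:", same_values, "same pooled output:", same_pooled)
```

The copy costs one extra pass over the batch, which is small next to a forward pass through the network.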