Error while training

vcvishal commented 5 years ago

I am using the Carvana dataset for training, in which the images are .jpg and the labels are .png. I encountered this problem:

<PIL.PngImagePlugin.PngImageFile image mode=L size=1918x1280 at 0x22C0868E7F0>
Traceback (most recent call last):
  File "pytorch_run.py", line 300, in <module>
    s_label = data_transform(imlabel)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\transforms\transforms.py", line 61, in __call__
    img = t(img)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\transforms\transforms.py", line 164, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\transforms\functional.py", line 208, in normalize
    tensor.sub(mean[:, None, None]).div_(std[:, None, None])
RuntimeError: output with shape [1, 1280, 1918] doesn't match the broadcast shape [3, 1280, 1918]

Please guide me.

bigmb commented 5 years ago

Are you training on single-channel images? Or using grayscale in the data transform?

vcvishal commented 5 years ago

Thank you for the response. No, I am using the Carvana dataset provided by Kaggle.

bigmb commented 5 years ago

Did you by any chance put the training images instead of the labels in the testing folder?

vcvishal commented 5 years ago

Possibly. My problem is that I don't know which images I should put in the folders below:

test_image = '' #Image to be predicted while training
test_label = '' #Label of the prediction Image
test_folderP = '' #Test folder Image
test_folderL = '' #Test folder Label for calculating the Dice score

bigmb commented 5 years ago

t_data = '' # Input data: input data folder (3-channel images)
l_data = '' # Input label: labels for the input data folder (1-channel images)
test_image = '' # Image to be predicted while training: a single 3-channel image on which the model will show its performance
test_label = '' # Label of the prediction image: the 1-channel label for that single image
test_folderP = '' # Test folder image: folder for which the prediction has to be done (3-channel)
test_folderL = '' # Test folder label for calculating the Dice score: labels for the predictions, if you want to check the Dice score (1-channel)

Also, the transforms use grayscale, so a 3-channel label will be converted to 1 channel.
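The RuntimeError in the first traceback is a broadcasting problem, and the rule can be sketched without PyTorch. The snippet below is only an illustration of the NumPy/PyTorch broadcasting rule, not torchvision code; `broadcast_shape` and `can_update_in_place` are hypothetical helper names. It assumes the transform applied to the label normalizes with three mean/std values, which is what the broadcast shape [3, 1280, 1918] in the error suggests.

```python
def broadcast_shape(a, b):
    """Broadcast two shapes, aligning them from the trailing dimension."""
    n = max(len(a), len(b))
    a = (1,) * (n - len(a)) + tuple(a)
    b = (1,) * (n - len(b)) + tuple(b)
    result = []
    for x, y in zip(a, b):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
    return tuple(result)


def can_update_in_place(tensor_shape, other_shape):
    """An in-place op cannot grow its output, so the broadcast
    result must have exactly the shape of the tensor being updated."""
    return broadcast_shape(tensor_shape, other_shape) == tuple(tensor_shape)


label = (1, 1280, 1918)   # mode "L" mask after conversion to a tensor
image = (3, 1280, 1918)   # RGB image after conversion to a tensor
mean3 = (3, 1, 1)         # three per-channel values, i.e. mean[:, None, None]
mean1 = (1, 1, 1)         # a single value for a 1-channel tensor

print(can_update_in_place(image, mean3))  # True
print(can_update_in_place(label, mean3))  # False: result would be (3, 1280, 1918)
print(can_update_in_place(label, mean1))  # True
```

Under that assumption, a 1-channel label either needs a transform whose normalization uses one mean/std value, or it has to be loaded as 3 channels before the 3-value normalization is applied.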

vcvishal commented 5 years ago

Would you please help me solve this error?

Traceback (most recent call last):
  File "pytorch_run.py", line 249, in <module>
    input_images(x, y, i, n_iter, k)
  File "C:\Users\vcvis\Desktop\Unet-Segmentation-Pytorch-Nest-of-Unets-master\ploting.py", line 85, in input_images
    x3 = x2[1, 1, :, :]
IndexError: index 1 is out of bounds for axis 0 with size 1

Why is this happening, and where am I going wrong? Thank you.

bigmb commented 5 years ago

Make it 0 and check. And try to print the shape of x2 also.

vcvishal commented 5 years ago

Thanks for the response. This is my x2:

X2>>>> tensor([[[[-0.7725, -0.7725, -0.7725,  ...,  0.8353,  0.8353, -0.7725],
      [ 0.6863,  0.6863,  0.6863,  ...,  0.9137,  0.9137, -0.7725],
      [ 0.7255,  0.7255,  0.7255,  ...,  0.8980,  0.9137, -0.7725],
      ...,
      [-0.7725,  0.8039,  0.8118,  ...,  0.8588,  0.8118,  0.7412],
      [-0.7725,  0.8039,  0.8118,  ...,  0.5686,  0.4353,  0.3098],
      [-0.7725,  0.8039,  0.8118,  ..., -0.7725, -0.7725, -0.7725]],

     [[-0.7725, -0.7725, -0.7725,  ..., -0.2157, -0.2314, -0.7725],
      [ 0.7255,  0.7255,  0.7255,  ..., -0.2235, -0.2549, -0.7725],
      [ 0.7569,  0.7569,  0.7569,  ..., -0.2863, -0.3020, -0.7725],
      ...,
      [-0.7725,  0.8039,  0.8118,  ..., -0.1451, -0.2000, -0.2706],
      [-0.7725,  0.8039,  0.8118,  ..., -0.3333, -0.4510, -0.5765],
      [-0.7725,  0.8039,  0.8118,  ..., -0.7725, -0.7725, -0.7725]],

     [[-0.7725, -0.7725, -0.7725,  ..., -0.1686, -0.1529, -0.7725],
      [ 0.7412,  0.7255,  0.7412,  ..., -0.1373, -0.1216, -0.7725],
      [ 0.7725,  0.7725,  0.7725,  ..., -0.1529, -0.1451, -0.7725],
      ...,
      [-0.7725,  0.8039,  0.8118,  ..., -0.0980, -0.1529, -0.2235],
      [-0.7725,  0.8039,  0.8118,  ..., -0.2941, -0.4196, -0.5451],
      [-0.7725,  0.8039,  0.8118,  ..., -0.7725, -0.7725, -0.7725]]]])

I have changed the code and it is working, but I am not sure whether it is correct:

import matplotlib.pyplot as plt


def input_images(x, y, i, n_iter, k=1):
    """
    :param x: takes input image
    :param y: take input label
    :param i: the epoch number
    :param n_iter:
    :param k: for keeping it in loop
    :return: Returns a image and label
    """
    if k == 1:
        x1 = x
        y1 = y

        # Move the tensors to the CPU before converting them to NumPy arrays
        x2 = x1.to('cpu')
        print("X2>>>>", x2)
        y2 = y1.to('cpu')
        x2 = x2.detach().numpy()
        y2 = y2.detach().numpy()

        # Sample 0 of the batch: channel 1 of the image, channel 0 of the label
        x3 = x2[0, 1, :, :]
        y3 = y2[0, 0, :, :]

        fig = plt.figure()

        ax1 = fig.add_subplot(1, 2, 1)
        ax1.imshow(x3)
        ax1.axis('off')
        ax1.set_xticklabels([])
        ax1.set_yticklabels([])
        ax1 = fig.add_subplot(1, 2, 2)
        ax1.imshow(y3)
        ax1.axis('off')
        ax1.set_xticklabels([])
        ax1.set_yticklabels([])
        plt.savefig(
            './model/pred/L_' + str(n_iter-1) + '_epoch_'
            + str(i))

Please guide me.

bigmb commented 5 years ago

Try x3 = x2[1, 0, :, :] and y3 = y2[1, 0, :, :], and add print(x2.shape) before these lines.

vcvishal commented 5 years ago

I got this:

X2>>>> (1, 3, 96, 96)
Traceback (most recent call last):
  File "pytorch_run.py", line 249, in <module>
    input_images(x, y, i, n_iter, k)
  File "C:\Users\vcvis\Desktop\Unet-Segmentation-Pytorch-Nest-of-Unets-master\ploting.py", line 86, in input_images
    x3 = x2[1, 0, :, :]
IndexError: index 1 is out of bounds for axis 0 with size 1

vcvishal commented 5 years ago

The images are in JPG format and the masks are in GIF format. Could that be causing the problem?

bigmb commented 5 years ago

Can you let me know what happens for y3? It stopped giving the error for x3, right? y3 comes from a 1-channel label, so [1, 0, :, :] should work.

bigmb commented 5 years ago

No, GIF masks shouldn't be a problem. Let me try with the dataset once.

vcvishal commented 5 years ago

Thank you for your response.

y3 also gives an error. The shape of y2 is shown below:

X2>>>> (1, 3, 96, 96)
y2>>>>>>>>> (1, 1, 96, 96)
Traceback (most recent call last):
  File "pytorch_run.py", line 249, in <module>
    input_images(x, y, i, n_iter, k)
  File "C:\Users\vcvis\Desktop\Unet-Segmentation-Pytorch-Nest-of-Unets-master\ploting.py", line 87, in input_images
    y3 = y2[1, 0, :, :]
IndexError: index 1 is out of bounds for axis 0 with size 1

I used this code:

def input_images(x, y, i, n_iter, k=1):
    """
    :param x: takes input image
    :param y: take input label
    :param i: the epoch number
    :param n_iter:
    :param k: for keeping it in loop
    :return: Returns a image and label
    """
    if k == 1:
        x1 = x
        y1 = y

        x2 = x1.to('cpu')

        y2 = y1.to('cpu')
        x2 = x2.detach().numpy()
        y2 = y2.detach().numpy()
        print("X2>>>>", x2.shape)
        print("y2>>>>>>>>>", y2.shape)
        y3 = y2[1, 0, :, :]
        print("Y3>>>>", y3.shape)
        x3 = x2[1, 0, :, :]
        y3 = y2[1, 0, :, :]

        fig = plt.figure()

        ax1 = fig.add_subplot(1, 2, 1)
        ax1.imshow(x3)
        ax1.axis('off')
        ax1.set_xticklabels([])
        ax1.set_yticklabels([])
        ax1 = fig.add_subplot(1, 2, 2)
        ax1.imshow(y3)
        ax1.axis('off')
        ax1.set_xticklabels([])
        ax1.set_yticklabels([])
        plt.savefig(
            './model/pred/L_' + str(n_iter-1) + '_epoch_'
            + str(i))

bigmb commented 5 years ago

Are you using a batch size of one? If so, try increasing it to 2. Also comment out this line and check whether the rest of the program runs.
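To see why the index matters, here is a small sketch of the bounds check NumPy performs, using the shapes printed in the thread. `first_out_of_bounds_axis` is a hypothetical helper written for illustration, not part of NumPy or of this repository.

```python
def first_out_of_bounds_axis(shape, index):
    """Return the first axis whose integer index does not fit, or None."""
    for axis, (size, idx) in enumerate(zip(shape, index)):
        if not -size <= idx < size:
            return axis
    return None


x_shape = (1, 3, 96, 96)  # printed shape of x2: one sample, three channels
y_shape = (1, 1, 96, 96)  # printed shape of y2: one sample, one channel

print(first_out_of_bounds_axis(x_shape, (1, 0)))  # 0: there is no second sample
print(first_out_of_bounds_axis(y_shape, (1, 0)))  # 0: same problem for the label
print(first_out_of_bounds_axis(x_shape, (0, 1)))  # None: sample 0, channel 1 exists
print(first_out_of_bounds_axis(y_shape, (0, 0)))  # None: sample 0, channel 0 exists
```

With those shapes, axis 0 holds a single sample, so only index 0 is valid there. Indexing `[0, 1, :, :]` on the image and `[0, 0, :, :]` on the label stays in bounds whatever batch the plotting function receives.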

vcvishal commented 5 years ago

It's running, but then this error occurred. I am using a batch size of 15.

Traceback (most recent call last):
  File "pytorch_run.py", line 297, in <module>
    s_label = data_transform(imlabel)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\transforms\transforms.py", line 61, in __call__
    img = t(img)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\transforms\transforms.py", line 164, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torchvision\transforms\functional.py", line 208, in normalize
    tensor.sub(mean[:, None, None]).div_(std[:, None, None])
RuntimeError: output with shape [1, 1280, 1918] doesn't match the broadcast shape [3, 1280, 1918]

bigmb commented 5 years ago

Can you point me to the dataset? I will try it on my computer and let you know, because you are facing a lot of errors. Also try to use a batch size that is a multiple of 2.

You need to check the shape you are providing at each place where a data transformation is applied.

vcvishal commented 5 years ago

Thank you for the response. Please try this dataset on your own system. Your code is very useful.

https://www.kaggle.com/c/carvana-image-masking-challenge

vcvishal commented 5 years ago

This is my folder structure:

t_data = './train/'
l_data = './train_masks/'
test_image = './test/image/0cdf5b5d0ce1_01_mask.jpg'
test_label = './test/mask/0cdf5b5d0ce1_01_mask.gif'
test_folderP = './train/'
test_folderL = './train_masks/'

I am using the same folder for testing.

vcvishal commented 5 years ago

I changed the mask format from GIF to JPG and now it's working, but I get:

Traceback (most recent call last):
  File "pytorch_run.py", line 308, in <module>
    pred_tb = model_test(s_tb.unsqueeze(0).to(device)).cpu()
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vcvis\Desktop\Unet-Segmentation-Pytorch-Nest-of-Unets-master\Models.py", line 87, in forward
    e1 = self.Conv1(x)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vcvis\Desktop\Unet-Segmentation-Pytorch-Nest-of-Unets-master\Models.py", line 25, in forward
    x = self.conv(x)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\container.py", line 92, in forward
    input = module(input)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\modules\batchnorm.py", line 83, in forward
    exponential_average_factor, self.eps)
  File "C:\Users\vcvis\AppData\Local\Programs\Python\Python36\lib\site-packages\torch\nn\functional.py", line 1697, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: CUDA out of memory. Tried to allocate 600.00 MiB (GPU 0; 4.00 GiB total capacity; 2.37 GiB already allocated; 500.80 MiB free; 67.80 MiB cached)

bigmb commented 5 years ago

This error is due to a memory issue. Reduce the batch size to 4.

vcvishal commented 5 years ago

I am only using a batch size of 2.

bigmb commented 5 years ago

Well, with images of this size, it is difficult to train the net even with a batch size of 2. Did you try reducing the image size to maybe 256x256? With images this large and the memory you have, it is not possible to train at full size.
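A rough back-of-the-envelope check shows how much the image size matters. The sketch below computes the size of one float32 activation tensor, assuming the first convolution block outputs 64 channels (the usual U-Net choice, not verified against Models.py here); `activation_mib` is a hypothetical helper name.

```python
def activation_mib(batch, channels, height, width, bytes_per_value=4):
    """Size in MiB of one dense activation tensor (float32 by default)."""
    return batch * channels * height * width * bytes_per_value / 2**20


full = activation_mib(1, 64, 1280, 1918)  # one full-size Carvana image
small = activation_mib(1, 64, 256, 256)   # the same layer at 256x256

print(f"full size: {full:.3f} MiB")   # full size: 599.375 MiB
print(f"256x256:   {small:.3f} MiB")  # 256x256:   16.000 MiB
```

Under that assumption, a single 64-channel feature map for one 1918x1280 image is about 600 MiB, close to the "Tried to allocate 600.00 MiB" in the traceback, and a network keeps many such tensors alive at once. The traceback also points at the single-image prediction step (s_tb.unsqueeze(0)), where lowering the training batch size would not help, while resizing the image would.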

husheng876 commented 4 years ago

I have the same problem. I tried changing the batch size to 1 and the image size to 56x56, but I still get the out-of-memory error. I am using a 2080 Ti GPU that has 12 GB of memory.

bigmb commented 4 years ago

If you have such a system, I don't think it would run out of memory. Did you check whether it's running on the GPU?

Also can you check if it's running well on basic Unet?

husheng876 commented 4 years ago

Yes, I have checked that the system is running on the GPU. But I still find some code in your system that sends the model to the CPU. The model I chose to run is the basic Unet.

bigmb commented 4 years ago

After the computation, it will send the data back to the CPU. Can you tell me where you are facing the issue? Then I can look into it in detail. It's been a while, so I might need some time.

husheng876 commented 4 years ago

Yes. First, I use 3-channel input data and 1-channel mask data. I also changed the code in the input_images function definition, and I changed the data_transform function in the "saving the predictions" part of the pytorch_run.py file, following your solution in other issues. Then I ran the system and it caused the error below.

bigmb commented 4 years ago

Maybe it's due to the dimensions. Can you try with 64x64 or 128x128? What happens is that when you go down to the depth of the model, which is d5, the size has to stay a multiple of 2 at every level. If it doesn't, it will give an error like this.

It's a problem while running the model, not before that: the sizes of the tensors do not match. If you still have this issue after resizing the images to 64x64 or 128x128, let me know.
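The "multiple of 2" remark can be made concrete with a small size trace. This is a sketch that assumes a standard U-Net layout with four 2x2 max-pool stages and 2x upsampling in the decoder (not checked against Models.py); `encoder_sizes` and `skip_connections_match` are hypothetical names.

```python
def encoder_sizes(size, num_pools=4):
    """Spatial size after each 2x2 max-pool; odd sizes are floored."""
    sizes = [size]
    for _ in range(num_pools):
        sizes.append(sizes[-1] // 2)
    return sizes


def skip_connections_match(size, num_pools=4):
    """Upsample from the deepest level and compare with each encoder map."""
    sizes = encoder_sizes(size, num_pools)
    current = sizes[-1]
    for level in range(num_pools - 1, -1, -1):
        current *= 2  # 2x upsampling in the decoder
        if current != sizes[level]:
            return False  # concatenation with the skip connection would fail
    return True


print(encoder_sizes(56))           # [56, 28, 14, 7, 3]
print(skip_connections_match(56))  # False: 3 upsamples to 6, but the skip map is 7
print(skip_connections_match(64))  # True
print(skip_connections_match(128)) # True
```

Under these assumptions the input side length has to be divisible by 2**4 = 16. That is why 64x64 and 128x128 pass while a 56x56 crop produces mismatched tensor sizes inside the model.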

husheng876 commented 4 years ago

Yes, I have changed all the image transform functions (image, mask) in pytorch_run.py and Data_loader.py with this code, but that didn't work, and the error above still appears. Also, I don't understand the sentence "What happens is when you go to depth of model which is d5 , if it's not a multiple of 2 it will give an error like this." Could you provide more details? Thanks a lot.

husheng876 commented 4 years ago

Sorry, I changed the code again. I commented out the line "torchvision.transforms.CenterCrop(56)," in all the transform functions and left the others in place. Then I ran the system and it caused a new problem.

bigmb commented 4 years ago

So the data passed to pred_tb.detach().numpy() is still in GPU memory, and it needs to be moved back to the CPU before that command can execute. That's why data.cpu() is required after the computation is carried out on the GPU.

husheng876 commented 4 years ago

Yes, I got it. I added the code "pred_tb= pred_tb.cpu().numpy()" to the pytorch_run.py file after line 305, and the system works. Thank you for your help.

bigmb / Unet-Segmentation-Pytorch-Nest-of-Unets

error while training #7