MIC-DKFZ / basic_unet_example

An example project of how to use a U-Net for segmentation on medical images with PyTorch.
Apache License 2.0

Issue with large size images #1

Closed akhanss closed 5 years ago

akhanss commented 5 years ago

@MIC-DKFZ Thank you for a great repo. The example works with the 'hippocampus' dataset; I am just wondering if you could provide an example with moderately larger images, say BraTS 2018 or images of 256x256x170 or so. Thank you in advance.

Cheers, Azam.

elpequeno commented 5 years ago

Hi Azam, thank you, we are happy that you find our code useful. Could you please provide some information on what the exact issue was? Did you try out larger image sizes and get errors? If so, could you please provide the exact error message? Are you referring to the 2D or the 3D U-Net?

Saludos, André

akhanss commented 5 years ago

Hi André, thank you for the kind and prompt reply. We are trying the 2D U-Net first. Our images come in different sizes, from 224 x 224 x 170 to 256 x 256 x 190 voxels, where the 3rd dimension is the number of slices. When we run 'python run_train_pipeline.py', we get the following error:

ValueError: could not broadcast input array from shape (224,224,170) into shape (64,64,53)

which is raised by this line in datasets/utils.py:

reshaped_image[x_offset:orig_img.shape[0]+x_offset, y_offset:orig_img.shape[1]+y_offset, z_offset:orig_img.shape[2]+z_offset] = orig_img

After some trial and error, we changed new_shape=(64, 64, 64) to new_shape=(256, 256, 256) in datasets/example_dataset/preprocessing.py. We also changed the expected shape to (#slices, w, h) in datasets/utils.py by adding the following two lines at the beginning:

# Convert (w, h, #slice) to (#slice, w, h)
orig_img = orig_img.swapaxes(1, 2)  # 224, 170, 224
orig_img = orig_img.swapaxes(0, 1)  # 170, 224, 224
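
For context, the surrounding logic in datasets/utils.py, as I understand it, is roughly the following (a simplified sketch; the helper name is mine, not the exact repo code):

import numpy as np

def pad_to_shape(orig_img, new_shape):
    # Place the original volume, centered, inside a zero array of new_shape.
    # If any dimension of orig_img exceeds new_shape, the assignment below
    # raises exactly the broadcast ValueError quoted above.
    reshaped_image = np.zeros(new_shape, dtype=orig_img.dtype)
    x_off = (new_shape[0] - orig_img.shape[0]) // 2
    y_off = (new_shape[1] - orig_img.shape[1]) // 2
    z_off = (new_shape[2] - orig_img.shape[2]) // 2
    reshaped_image[x_off:orig_img.shape[0] + x_off,
                   y_off:orig_img.shape[1] + y_off,
                   z_off:orig_img.shape[2] + z_off] = orig_img
    return reshaped_image

img = np.zeros((224, 224, 170))              # (w, h, #slices)
img = img.swapaxes(1, 2).swapaxes(0, 1)      # -> (170, 224, 224)
padded = pad_to_shape(img, (256, 256, 256))  # fits once new_shape is enlarged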

With batch_size=1 in configs/Config_unet.py, we encounter an error in loss_functions/dice_loss.py (line 86, in forward), at y_onehot.scatter_(1, y, 1):

RuntimeError: invalid argument 3: Index tensor must have same dimensions as output tensor at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:295
Experiment exited. Checkpoints stored =)
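
My guess from trial and error is that with batch_size=1 the target.squeeze() call drops the batch dimension as well, so the index tensor ends up with fewer dimensions than the output. A minimal reproduction of that kind of mismatch (variable names are mine, not the repo's):

import torch

num_classes = 3
pred_softmax = torch.zeros(1, num_classes, 64, 64)    # (batch, classes, h, w)
target = torch.zeros(1, 1, 64, 64, dtype=torch.long)  # (batch, 1, h, w)

y_onehot = torch.zeros_like(pred_softmax)
y_onehot.scatter_(1, target, 1)  # fine: index has as many dims as the output

# squeeze() on a batch of size 1 removes the batch AND channel dims,
# leaving a 2-d index for a 4-d output - the error above.
print(target.squeeze().dim(), y_onehot.dim())  # 2 4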

However, with batch_size=2 or more that error disappears, but a new one arises:

Experiment set up. Experiment started. =====TRAIN===== Reshuffle... Initializing... this might take a while... Epoch: 0 Loss: 0.9434
File "/home/a_khanss/dl/basic_unet_example/src/trixi/trixi/experiment/experiment.py", line 79, in run
  self.train(epoch=epoch)
File "/home/a_khanss/dl/med-img/basic_unet_example_brain_tumor/experiments/UNetExperiment.py", line 110, in train
  loss = self.dice_loss(pred_softmax, target.squeeze()) + self.ce_loss(pred, target.squeeze())
File "/home/a_khanss/anaconda3/envs/unet_example/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
  result = self.forward(*input, **kwargs)
File "/home/a_khanss/dl/med-img/basic_unet_example_brain_tumor/loss_functions/dice_loss.py", line 97, in forward
  rebalance_weights=self.rebalance_weights)
File "/home/a_khanss/dl/med-img/basic_unet_example_brain_tumor/loss_functions/dice_loss.py", line 123, in soft_dice_per_batch_2
  weights = weights.cuda(net_output.device.index)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorCopy.cpp:21
.....
Experiment exited. Checkpoints stored =)

After struggling for a few days, we cannot get past this point. We are also unable to start training on multiple GPUs.

We noticed your success in BraTS 2017 and your recent success with 'No New-Net' in BraTS 2018 as well, so we are exploring the U-Net (2D and 3D) for our task too. We would appreciate your kind help in getting this example to run properly.

Thank you once again.

Cheers,

elpequeno commented 5 years ago

Hi Azam,

thank you for the details. I will take a look and come back to you as soon as I have any news.

Saludos, André

FabianIsensee commented 5 years ago

Hi Azam, have you verified that your pytorch installation is OK? The error you are getting arises when pytorch attempts to copy weights to the GPU. This may not be related to our code. Try the following minimalistic example:

a = torch.rand((32, 3, 32, 32, 32))
b = nn.Conv3D(3, 16, 3)
a.cuda()
b.cuda()
c = b(a)

Also please note that this repository is unrelated to our BraTS submissions. This does not mean it cannot get great results, but you will most likely not be able to reproduce our results with it as it is.

Best, Fabian

akhanss commented 5 years ago

Hi Fabian, thanks for your quick reply. Okay, I'll check that. BTW, I think I should mention this warning as well: in experiments/UNetExperiment.py, the line pred_softmax = F.softmax(pred) produces:

UserWarning: Implicit dimension choice for softmax has been deprecated. Change the call to include dim=X as an argument.

Would you please check at your convenience?

elpequeno commented 5 years ago

Hi Azam,

Yes, I am aware of this warning. I think I even fixed it on my local machine. I'll check and push it.
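
For reference, the fix should just be making the class dimension explicit, e.g. for a (batch, classes, h, w) prediction:

import torch
import torch.nn.functional as F

pred = torch.randn(2, 4, 64, 64)       # (batch, classes, h, w)
pred_softmax = F.softmax(pred, dim=1)  # softmax over the class dimension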

Saludos, André

akhanss commented 5 years ago

Okay, it is working for the 2D U-Net; however, the loss is negative most of the time! Thanks for your kind and valuable help @elpequeno @FabianIsensee.

Also, when I try to run train3D.py, it complains:

Experiment set up. Experiment started. =====TRAIN===== Reshuffle... Initializing... this might take a while... Epoch: 0 Loss: 1.2428 VALIDATE Epoch: 0 Loss: nan
File "../experiments/UNetExperiment3D.py", line 152, in validate
  self.clog.show_image_grid(data[:,:,30].float(), name="data_val", normalize=True, scale_each=True, n_iter=epoch)
UnboundLocalError: local variable 'data' referenced before assignment
Experiment exited. Checkpoints stored =)

Running the code snippet above:

a = torch.rand((32, 3, 32, 32, 32))
b = nn.Conv3D(3, 16, 3)
a.cuda()
b.cuda()
c = b(a)

gives me the following error at c = b(a):

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/a_khanss/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
  result = self.forward(*input, **kwargs)
File "/home/a_khanss/anaconda3/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 421, in forward
  self.padding, self.dilation, self.groups)
RuntimeError: Expected object of type torch.FloatTensor but found type torch.cuda.FloatTensor for argument #2 'weight'

FabianIsensee commented 5 years ago

The loss used here is the sum of the Dice loss and the cross-entropy. The Dice loss will become negative; that is normal. Typically people define the Dice loss as 1 - soft_dice to avoid this, but that is completely unnecessary because the 1 is irrelevant when computing the gradient ;-)
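
To illustrate with a simplified soft Dice (not the exact implementation in loss_functions/dice_loss.py):

import torch

def soft_dice_loss(pred_softmax, target_onehot, eps=1e-6):
    # The soft Dice coefficient lies in [0, 1]; returning it negated gives a
    # loss in [-1, 0], so negative loss values are expected and harmless.
    intersection = (pred_softmax * target_onehot).sum()
    denominator = pred_softmax.sum() + target_onehot.sum()
    dice = (2 * intersection + eps) / (denominator + eps)
    return -dice  # '1 - dice' has identical gradients, just shifted by a constant

Since the cross-entropy term is non-negative, the summed loss can still be negative whenever the Dice term dominates.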

About that other problem - @elpequeno needs to deal with this, I am not familiar with this part of the code.

elpequeno commented 5 years ago

I am happy the 2D network works for you. The 3D version is still in development, so I cannot promise that everything works perfectly right now. The error you are getting shows that your "data" variable is empty, which means your batch was not loaded properly. It is probably an error in the dataloader or in your preprocessing. As it is not related to "large size images", I will close this task for now. Feel free to comment and share your experience.

FabianIsensee commented 5 years ago

More specifically, your val_data_loader must have a problem. Look at line 136 in UNetExperiment3D.py:

for data_batch in self.val_data_loader:

If iterating over it had worked (which it didn't), data would have been assigned. @elpequeno this should be caught by the code! Set data = None before the loop and assert data is not None after it. @akhanss what is that error message you posted? It looks unrelated to what you wrote in your message. It looks like someone (either us or you, I don't know this code) forgot to call .cuda() on either the model or the data (after taking the data from the dataloader).
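
Something along these lines (a sketch of the guard, not the actual fix):

# Empty stand-in for the real loader, just to show the failure mode.
val_data_loader = []

data = None
for data_batch in val_data_loader:
    data = data_batch  # in the real code: the batch dict from the loader
    # ... validation step ...
assert data is not None, "val_data_loader yielded no batches - check your data"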

elpequeno commented 5 years ago

@FabianIsensee: That is right, I will add that to the code. The posted error message seems to be related to the example snippet you suggested above.

elpequeno commented 5 years ago

@akhanss: Did you check whether your validation keys are empty in your splits.pkl?
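
Something like this should show it (assuming splits.pkl holds a list of fold dicts with 'train' and 'val' keys, as in our preprocessing example):

import pickle

with open("splits.pkl", "rb") as f:
    splits = pickle.load(f)

# One dict per fold; an empty 'val' list would explain an empty val loader.
for i, split in enumerate(splits):
    print(f"fold {i}: train={len(split['train'])}, val={len(split['val'])}")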

FabianIsensee commented 5 years ago
a = torch.rand((32, 3, 32, 32, 32))
b = nn.Conv3d(3, 16, 3)
a = a.cuda()
b = b.cuda()
c = b(a)

My bad, I hacked this together without checking. This one works, but that is off topic now since @akhanss's pytorch seems to work :-)

akhanss commented 5 years ago

Actually, I had already corrected the typo (nn.Conv3D -> nn.Conv3d) before running it; sorry for not mentioning that earlier! I'll also check your other suggestions.

BTW, could you please give me an idea of the relation between patch_size (in Config_unet.py) and new_shape (in example_dataset/preprocessing.py)?

Thanks once again for your great responses @elpequeno @FabianIsensee

elpequeno commented 5 years ago

We provide a very basic preprocessing example. In example_dataset/preprocessing.py we resize all data samples to the same shape. The preprocessing always depends on your task, so I cannot give specific advice on how to handle your data. Config.patch_size is passed to the dataloader and used in data_augmentation.py. In our example we use whole image slices as input, so patch_size and new_shape are both 64. You could also imagine a scenario where you crop 64x64 patches from a 512x512 image; in that case patch_size would be 64 and the shape of your images would be 512. I hope that gives you an idea of these two variables.
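
As a toy illustration of that cropping scenario (not our actual data_augmentation.py code):

import numpy as np

image = np.random.rand(512, 512)  # preprocessed slice, i.e. new_shape = 512
patch_size = 64                   # Config.patch_size

# Crop a random patch_size x patch_size patch as the network input.
x = np.random.randint(0, image.shape[0] - patch_size + 1)
y = np.random.randint(0, image.shape[1] - patch_size + 1)
patch = image[x:x + patch_size, y:y + patch_size]
print(patch.shape)  # (64, 64)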