Project-MONAI / tutorials

MONAI Tutorials
https://monai.io/started.html
Apache License 2.0

Error adapting Spleen example to different shaped dataset #48

Closed catskillsresearch closed 3 years ago

catskillsresearch commented 4 years ago

Describe the bug: I am trying to adapt the spleen_segmentation_3d.ipynb notebook to imaging data with a slightly different shape. The images in the Spleen dataset are 226x257 with 113 planes in the stack; my images are 1200x340 with only 20 planes. The notebook samples the data in cubes of size 96x96x96, so to get the example notebook to work I have to duplicate my planes to reach 20+20+20+20+16 = 96. Otherwise it breaks, for the obvious reason that you can't take 96 slices out of 20.
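For reference, a hypothetical sketch of that plane-stacking workaround (the array name and the assumption that each volume is a NumPy array of shape (1200, 340, 20) with the plane axis last are mine, for illustration only):

import numpy as np

# Illustration of the workaround described above: tile a 20-plane volume
# along the depth axis until it has 96 planes (20+20+20+20+16 = 96).
vol = np.random.rand(1200, 340, 20)            # stand-in for one MRI volume
stacked = np.concatenate([vol] * 4 + [vol[..., :16]], axis=-1)
print(stacked.shape)                           # (1200, 340, 96)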

Suppose, however, that I change the cube size to 20x20x20 so that I don't have to duplicate planes to match the exact setup of the notebook. I still get a problem. Here is the traceback; please let me know how to resolve it:

RuntimeError                              Traceback (most recent call last)
<ipython-input-15-26b65d7e4120> in <module>
     19         )
     20         optimizer.zero_grad()
---> 21         outputs = model(inputs)
     22         loss = loss_function(outputs, labels)
     23         loss.backward()

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/monai/networks/nets/unet.py in forward(self, x)
    190 
    191     def forward(self, x: torch.Tensor) -> torch.Tensor:
--> 192         x = self.model(x)
    193         return x
    194 

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
     98     def forward(self, input):
     99         for module in self:
--> 100             input = module(input)
    101         return input
    102 

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/monai/networks/layers/simplelayers.py in forward(self, x)
     37 
     38     def forward(self, x: torch.Tensor) -> torch.Tensor:
---> 39         return torch.cat([x, self.submodule(x)], self.cat_dim)
     40 
     41 

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
     98     def forward(self, input):
     99         for module in self:
--> 100             input = module(input)
    101         return input
    102 

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/monai/networks/layers/simplelayers.py in forward(self, x)
     37 
     38     def forward(self, x: torch.Tensor) -> torch.Tensor:
---> 39         return torch.cat([x, self.submodule(x)], self.cat_dim)
     40 
     41 

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
     98     def forward(self, input):
     99         for module in self:
--> 100             input = module(input)
    101         return input
    102 

~/anaconda3/envs/mona/lib/python3.8/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/mona/lib/python3.8/site-packages/monai/networks/layers/simplelayers.py in forward(self, x)
     37 
     38     def forward(self, x: torch.Tensor) -> torch.Tensor:
---> 39         return torch.cat([x, self.submodule(x)], self.cat_dim)
     40 
     41 

RuntimeError: Sizes of tensors must match except in dimension 2. Got 4 and 3

To Reproduce: here is the code:

import glob, os, torch
from monai.data import CacheDataset, DataLoader, Dataset
from monai.inferers import sliding_window_inference
from monai.losses import DiceLoss
from monai.metrics import compute_meandice
from monai.networks.layers import Norm
from monai.networks.nets import UNet
from monai.utils import first, set_determinism
data_dir='nf1_monai'
os.environ['MONAI_DATA_DIRECTORY']=data_dir
directory = os.environ.get("MONAI_DATA_DIRECTORY")
root_dir = directory
train_images = sorted(glob.glob(os.path.join(data_dir, "imagesTr", "*.npy")))
train_labels = sorted(glob.glob(os.path.join(data_dir, "labelsTr", "*.npy")))
data_dicts = [
    {"image": image_name, "label": label_name}
    for image_name, label_name in zip(train_images, train_labels)
]
train_files, val_files = data_dicts[:-10], data_dicts[-10:]
set_determinism(seed=0)
from monai.transforms import (
    AddChanneld,
    Compose,
    LoadNumpyd,
    RandCropByPosNegLabeld,
    ToTensord,
)
train_transforms = Compose(
    [
        LoadNumpyd(keys=["image", "label"]),
        AddChanneld(keys=["image", "label"]),
        RandCropByPosNegLabeld(
            keys=["image", "label"],
            label_key="label",
            spatial_size=(20,20,20),
            pos=1,
            neg=1,
            num_samples=4,
            image_key="image",
            image_threshold=0,
        ),
        ToTensord(keys=["image", "label"]),
    ]
)
val_transforms = Compose(
    [
        LoadNumpyd(keys=["image", "label"]),
        AddChanneld(keys=["image", "label"]),
        ToTensord(keys=["image", "label"]),
    ]
)
device = torch.device("cuda:0")
model = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(device)
loss_function = DiceLoss(to_onehot_y=True, softmax=True)
optimizer = torch.optim.Adam(model.parameters(), 1e-4)
train_ds = CacheDataset(data=train_files, transform=train_transforms, cache_rate=1.0, num_workers=1)
train_loader = DataLoader(train_ds, batch_size=6, shuffle=True, num_workers=16)
val_ds = CacheDataset(data=val_files, transform=val_transforms, cache_rate=1.0, num_workers=16)
val_loader = DataLoader(val_ds, batch_size=1, num_workers=1)
epoch_num = 600
val_interval = 2
best_metric = -1
best_metric_epoch = -1
epoch_loss_values = list()
metric_values = list()
for epoch in range(epoch_num):
    print("-" * 10)
    print(f"epoch {epoch + 1}/{epoch_num}")
    model.train()
    epoch_loss = 0
    step = 0
    for batch_data in train_loader:
        step += 1
        inputs, labels = (
            batch_data["image"].to(device),
            batch_data["label"].to(device),
        )
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        print(f"{step}/{len(train_ds) // train_loader.batch_size}, train_loss: {loss.item():.4f}")
    epoch_loss /= step
    epoch_loss_values.append(epoch_loss)
    print(f"epoch {epoch + 1} average loss: {epoch_loss:.4f}")

Expected behavior: The UNet should train and not break.

Environment (please complete the following information):
OS: Ubuntu 20.04 LTS
MONAI version: 0.3.0
Python version: 3.8.2 (default, Mar 26 2020, 15:53:00) [GCC 7.3.0]
OS version: Linux (5.4.0-52-generic)
Numpy version: 1.18.1
Pytorch version: 1.5.0
MONAI flags: HAS_EXT = False, USE_COMPILED = False

Optional dependencies:
Pytorch Ignite version: 0.3.0
Nibabel version: 3.1.0
scikit-image version: 0.16.2
Pillow version: 7.1.2
Tensorboard version: 2.2.1
gdown version: 3.12.2
TorchVision version: 0.6.0a0+82fd1c8
ITK version: 5.1.1
tqdm version: 4.50.2

Additional context: I am trying to do tumor detection on whole-body MRI scans. The tumors are small and the body is large. So far this gives me an average F1 score of 0.17 using this library, training with the 20+20+20+20+16 stacking workaround.

ericspod commented 4 years ago

I suspect your network configuration isn't compatible with patches that are that small. You're creating your UNet with

model = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(16, 32, 64, 128, 256),
    strides=(2, 2, 2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(device)

With this configuration your network has 5 levels, and with a stride of 2 at each downsampling you halve the spatial dimensions of the input 4 times. Starting from a 20x20x20 patch, each axis shrinks roughly as 20 -> 10 -> 5 -> 3 -> 2, and because an odd size like 5 doesn't divide evenly, the decode path produces feature maps (e.g. 2 upsampled to 4) that no longer match the corresponding skip connections (hence "Got 4 and 3"), given the default padding in the upsample convolutions.

I would suggest using patch sizes that are multiples of powers of 2: a dimension of size M*2**N can be downsampled N times cleanly. In your case you could use a patch size of 32 and stack your volumes once so the depth dimension is 40, or double each slice in your volume to get the same effect.
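For that first option, a minimal sketch of the corresponding crop transform (reusing the RandCropByPosNegLabeld arguments from your script above; it assumes the depth has already been brought up to at least 32 by stacking or resampling):

from monai.transforms import RandCropByPosNegLabeld

# 32 = 2**5, so four stride-2 downsamplings (32 -> 16 -> 8 -> 4 -> 2)
# round-trip cleanly through the decode path.
crop = RandCropByPosNegLabeld(
    keys=["image", "label"],
    label_key="label",
    spatial_size=(32, 32, 32),
    pos=1,
    neg=1,
    num_samples=4,
    image_key="image",
    image_threshold=0,
)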

Alternatively, you can stick with 20**3 as your patch size and use (64, 128, 256) as your channels argument and (2, 2) as your strides argument to make a shallower network, and see how that works.
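A minimal sketch of that shallower configuration (same UNet/Norm imports and device as in your script; with only two stride-2 downsamplings, 20 -> 10 -> 5 round-trips cleanly):

import torch
from monai.networks.layers import Norm
from monai.networks.nets import UNet

device = torch.device("cuda:0")
# 3-level UNet for 20x20x20 patches: two stride-2 downsamplings (20 -> 10 -> 5),
# so the decoder feature maps match the skip connections on the way back up.
model = UNet(
    dimensions=3,
    in_channels=1,
    out_channels=2,
    channels=(64, 128, 256),
    strides=(2, 2),
    num_res_units=2,
    norm=Norm.BATCH,
).to(device)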

catskillsresearch commented 4 years ago

Thanks very much Eric. There are 20 planes in the MRI. Should I be trying to "thicken" the plane volumes so that they correspond more to the physical dimensions of the X-Y plane, or is that irrelevant? If so, how do I do that? My MRI images are 20 NumPy bitmaps. I think something like this is done in the Spleen tutorial notebook here:

Spacingd(keys=["image", "label"], pixdim=(1.5, 1.5, 2.0), mode=("bilinear", "nearest")),

The Spleen example from NVIDIA uses the NIfTI format, which comes with metadata including an affine transform that the Spacingd transform accesses. What would the corresponding arguments for the Spacing transform be?

Also, one last question. The tumors (in this case, from Neurofibromatosis 1) are very sparse in the whole MRI volume. So 98% of the volume is non-tumor and 2% is tumor. Do you have any thoughts on how to increase accuracy for this very sparse class? I have only 50 full labelled examples to work with. Should I explore Waterloo's "Less than one shot" learning technique?

ericspod commented 4 years ago

For Spacing you provide the target voxel spacing (pixdim); each axis is rescaled by its original spacing divided by the new pixdim, so a pixdim below 1 enlarges that axis. To go from 1200x340x20 to roughly 1200x340x40 you want something like this:

import torch
from monai.transforms import Spacing

s = Spacing(pixdim=(1, 1, 0.49))   # keep in-plane spacing, roughly double the depth
t = torch.rand(1, 1200, 340, 20)   # channel-first: (C, H, W, D)
print(s(t)[0].shape)  # (1, 1200, 340, 40)

For Spacingd you would use the same pixdim values as in this example; a dictionary-based sketch is below.
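As a rough, dictionary-based sketch (the identity affine and the random data here are my own stand-ins: Spacingd reads the affine from the "{key}_meta_dict" entry, which plain NumPy files don't provide, so one is supplied by hand):

import numpy as np
from monai.transforms import Spacingd

data = {
    "image": np.random.rand(1, 1200, 340, 20),   # stand-in volume, channel-first
    "image_meta_dict": {"affine": np.eye(4)},    # hand-made identity affine
}
out = Spacingd(keys=["image"], pixdim=(1, 1, 0.49), mode="bilinear")(data)
print(out["image"].shape)  # roughly (1, 1200, 340, 40)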

I'm honestly not sure what to suggest for your particular problem; I shall ask my colleagues to see if anyone has something specific to this segmentation problem to contribute.

catskillsresearch commented 4 years ago

Thanks Eric! Intuitively I don't think it should matter. The spleen tutorial uses pixdim=(1.5, 1.5, 2.0); I'm curious what motivated that. It seems like they were doing it to reduce the size of the inputs:

from monai.transforms import Spacing
import torch
s=Spacing(pixdim=(1.5, 1.5, 2.0))
t=torch.rand(1,226,257,113)
print(s(t)[0].shape)  # (1, 151, 172, 57)

Regarding the small overall number of samples (50 in my case) and the sparsity of positive labels in the image set (tiny tumors), any thoughts will be very welcome.

This is in the context of the Children's Tumor Foundation Hack for NF which is still open for participation.

Irme commented 4 years ago

Hello! I work with sparse data (cerebral microbleed segmentation), and I would recommend you look into the following:

Generally with very imbalanced data it takes a lot of patience to fine-tune the network.

catskillsresearch commented 4 years ago

@Irme thanks. Please consider joining the CTF NF Hackathon which is still open and has plenty of medical data. Here are some details:

https://nfhack-platform.bemyapp.com/#/event

We have more than 500 participants registered from around the world as well as more than 40 mentors and 21 projects in progress. Registration remains open throughout the Hackathon and projects can be submitted up until November 13th.