eisen-ai / eisen-core

Core functionality of Eisen
MIT License

How to setup for non-binary segmentation (background, organ and tumor) #7

Closed JoaoSantinha closed 4 years ago

JoaoSantinha commented 4 years ago

Hi @eisen-ai ,

Thanks for the amazing package!

I am experimenting with Eisen to segment both organ and tumor. I looked in the documentation for an example of how to work with a segmentation that has two labels (0s - background; 1s - organ; 2s - tumor), but didn't find anything.

I am following the tutorial 'Eisen_MSD_Demo.ipynb' but I don't know what I should put in the cell that contains the following code:

resample_tform = ResampleNiftiVolumes(['image', 'label'], [1.0, 1.0, 1.0], 'linear')

to_numpy_tform = NiftiToNumpy(['image', 'label'])

crop = CropCenteredSubVolumes(fields=['image', 'label'], size=[64, 64, 64])

add_channel = AddChannelDimension(['image', 'label'])

map_intensities = MapValues(['image'], min_value=0.0, max_value=1.0)

threshold_labels = ThresholdValues(['label'], threshold=0.5)

Should I replace threshold_labels = ThresholdValues(['label'], threshold=0.5) with the following two transforms?

threshold_organ_labels = ThresholdValues(['label'], threshold=0.5)

threshold_tumor_labels = ThresholdValues(['label'], threshold=1.5)

Thanks in advance!

eisen-ai commented 4 years ago

Thanks for using the package!

In multi-region problems, the suggestion is to use a different transform chain: convert the label map, which holds integer values from 0 (background) to N (the number of regions), into a one-hot label map.

To do so, you can use this transform:

eisen.transforms.imaging.LabelMapToOneHot

You can find the documentation of this transform here: http://docs.eisen.ai/eisen/api.html#eisen.transforms.imaging.LabelMapToOneHot

In practice, you list the label values present in your data and convert them to one-hot format. If you have labels 0 = background, 1 = organ, 2 = tumor, you do:

# REMOVE threshold_labels = ThresholdValues(['label'], threshold=0.5) and substitute with
from eisen.transforms import LabelMapToOneHot
tform = LabelMapToOneHot(['label'], [0, 1, 2])

This, of course, applies if you want to include the background as a class. Your problem then becomes a three-class problem, and the output of the network will also be multichannel, so you will need to tell your network that you want 3 output channels. The first channel (index 0) will hold predictions for background, the second (index 1) for organ, and the third (index 2) for tumor.
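To make the difference between thresholding and one-hot encoding concrete, here is a minimal NumPy sketch. The helper `label_map_to_one_hot` is a hypothetical illustration written for this note, not Eisen's actual implementation, and it assumes a channel-first layout (one binary mask per class stacked on a new leading axis).

```python
import numpy as np


def label_map_to_one_hot(label, classes):
    """Hypothetical helper: stack one binary mask per class on a new leading axis."""
    return np.stack([(label == c) for c in classes], axis=0).astype(np.float32)


# Toy 2D label map: 0 = background, 1 = organ, 2 = tumor.
label = np.array([[0, 1, 1],
                  [0, 2, 1]])

one_hot = label_map_to_one_hot(label, classes=[0, 1, 2])
print(one_hot.shape)   # (3, 2, 3): one channel per class
print(one_hot[2])      # tumor channel, 1.0 only where label == 2

# A single threshold at 0.5 cannot separate organ from tumor:
# every voxel with label 1 or 2 ends up as 1.0.
thresholded = (label > 0.5).astype(np.float32)
print(thresholded)
```

Each voxel belongs to exactly one channel in `one_hot`, which is why the number of entries in `classes` should match the number of output channels requested from the network.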

I hope this explains how you can solve this issue.

JoaoSantinha commented 4 years ago

Thank you so much for making this clear!!

JoaoSantinha commented 4 years ago

Hi @eisen-ai

I still have some questions, and I am getting an error that is more than likely caused by my lack of understanding.

I was trying to use the MSD pancreas dataset (Task07), which has both the pancreas and a pancreatic tumor segmentation, but I am getting the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/eisen/utils/workflows/training.py in run(self)
    113         with self.epoch_aggregator as ea:
--> 114             for i, batch in enumerate(self.data_loader):
    115                 if self.gpu:

6 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    855                 del self._task_info[idx]
--> 856                 return self._process_data(data)
    857 

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _process_data(self, data)
    880         if isinstance(data, ExceptionWrapper):
--> 881             data.reraise()
    882         return data

/usr/local/lib/python3.6/dist-packages/torch/_utils.py in reraise(self)
    393             msg = KeyErrorMessage(msg)
--> 394         raise self.exc_type(msg)

AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/eisen/utils/__init__.py", line 120, in __getitem__
    item = self.transform(item)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py", line 70, in __call__
    img = t(img)
  File "/usr/local/lib/python3.6/dist-packages/eisen/transforms/imaging.py", line 224, in __call__
    original_spacing = data[field].header.get_zooms()
AttributeError: 'str' object has no attribute 'header'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-25-549c9012c245> in <module>()
      1 for i in range(NUM_EPOCHS):
----> 2     training.run()

/usr/local/lib/python3.6/dist-packages/eisen/utils/workflows/training.py in run(self)
    128                 )
    129 
--> 130                 ea(output_dictionary)
    131 
    132         dispatcher.send(

/usr/local/lib/python3.6/dist-packages/eisen/utils/workflows/workflows.py in __exit__(self, *args, **kwargs)
     95     def __exit__(self, *args, **kwargs):
     96         for typ in ['losses', 'metrics']:
---> 97             for i in range(len(self.epoch_data[typ])):
     98                 for key in self.epoch_data[typ][i].keys():
     99                     self.epoch_data[typ][i][key] = np.asarray(self.epoch_data[typ][i][key])

KeyError: 'losses'

my code is the following after the imports:

resample_tform = ResampleNiftiVolumes(['image', 'label'], [1.0, 1.0, 2.5], 'linear')
map_intensities = FixedMeanStdNormalization(['image'], mean=50.0, std=100.0)
crop = CropCenteredSubVolumes(fields=['image', 'label'], size=[200, 160, 64])
add_channel = AddChannelDimension(['image'])
label_to_onehot = LabelMapToOneHot(['label'], classes=[1, 2])

# create a transform to manipulate and load data
tform = Compose([
                 resample_tform, 
                 map_intensities,
                 crop,
                 add_channel, 
                 label_to_onehot
                 ])

# create a dataset from the training set of the MSD dataset
dataset = MSDDataset(
    PATH_DATA, 
    NAME_MSD_JSON, 
    'training', 
    transform=None
)

# define a splitter to do a 80%-20% split of the data
splitter = EisenDatasetSplitter(0.80, 0.20, 0.0, transform_train=tform, transform_valid=tform)

# define a training and test sets
dset_train, dset_test, _ = splitter(dataset)

# create data loader for training, this functionality is pure pytorch
data_loader_train = DataLoader(
    dset_train, 
    batch_size=BATCH_SIZE, 
    shuffle=True, 
    num_workers=4
)

# specify model and loss
model = EisenModuleWrapper(module=UNet(input_channels=1, output_channels=2), input_names=['image'], output_names=['predictions']) 

loss = EisenModuleWrapper(module=DiceLoss(dim=[1, 2]), input_names=['predictions', 'label'], output_names=['dice_loss'])

metric = EisenModuleWrapper(module=DiceMetric(dim=[1, 2]), input_names=['predictions', 'label'], output_names=['dice_metric'])

optimizer = Adam(model.parameters(), 0.001)

# join all blocks into a workflow (training workflow)
training = Training(
      model=model, 
      losses=[loss], 
      data_loader=data_loader_train,
      optimizer=optimizer,
      metrics=[metric], 
      gpu=True
    )

train_loggin_hook = LoggingHook(training.id, 'Training', PATH_ARTIFACTS)
train_summary_hook = TensorboardSummaryHook(training.id, 'Training', PATH_ARTIFACTS)

train_loggin_hook_print = LoggingHook(training.id, 'Training', None)

# run optimization for NUM_EPOCHS
for i in range(NUM_EPOCHS):
    training.run()

I tried loss = EisenModuleWrapper(module=DiceLoss, input_names=['predictions', 'label'], output_names=['dice_loss']), then saw Fausto Milletari's Medium post on segmentation of COVID-19 CTs, which has 3 labels, and tried loss = EisenModuleWrapper(module=DiceLoss(dim=[1, 2]), input_names=['predictions', 'label'], output_names=['dice_loss']) and some other combinations, but none of them worked.

I looked at the documentation of DiceLoss but didn't understand the rationale for how to define dim. I would like your help with that, and also with what may be causing this error.

Thanks in advance for your help
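As general background on the `dim` question, the sketch below shows what "the axes a Dice score is reduced over" means for a channel-first batch. It is a generic soft Dice written in NumPy; the function `soft_dice` and its `axes` parameter are hypothetical, and whether Eisen's `DiceLoss(dim=...)` follows the same convention is an assumption that should be checked against the Eisen documentation.

```python
import numpy as np


def soft_dice(pred, target, axes, eps=1e-7):
    """Generic soft Dice (not Eisen's code): sums are taken over `axes` only."""
    intersection = np.sum(pred * target, axis=axes)
    denominator = np.sum(pred, axis=axes) + np.sum(target, axis=axes)
    return (2.0 * intersection + eps) / (denominator + eps)


# Toy batch with shape (batch=1, channels=2, height=2, width=2).
target = np.array([[[[1, 1], [0, 0]],
                    [[0, 0], [0, 1]]]], dtype=np.float32)
pred = np.array([[[[1, 1], [0, 0]],
                  [[0, 0], [1, 1]]]], dtype=np.float32)

# Reducing over the spatial axes keeps one score per sample and per channel.
per_channel = soft_dice(pred, target, axes=(2, 3))
print(per_channel.shape)  # (1, 2)

# Reducing over channel and spatial axes keeps one score per sample.
per_sample = soft_dice(pred, target, axes=(1, 2, 3))
print(per_sample.shape)   # (1,)
```

In this sketch the listed axes are the ones that disappear from the result, so the choice depends on the tensor layout (2D versus 3D data, with or without a channel axis) and on whether a per-class or a pooled score is wanted.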

eisen-ai commented 4 years ago

Hello,

your transform chain is

resample_tform = ResampleNiftiVolumes(['image', 'label'], [1.0, 1.0, 2.5], 'linear')
map_intensities = FixedMeanStdNormalization(['image'], mean=50.0, std=100.0)
crop = CropCenteredSubVolumes(fields=['image', 'label'], size=[200, 160, 64])
add_channel = AddChannelDimension(['image'])
label_to_onehot = LabelMapToOneHot(['label'], classes=[1, 2])

# create a transform to manipulate and load data
tform = Compose([
                 resample_tform, 
                 map_intensities,
                 crop,
                 add_channel, 
                 label_to_onehot
                 ])

but you should also load the nifti volumes!

the transform chain should therefore be

from eisen.io import LoadNiftyFromFilename

read_tform = LoadNiftyFromFilename(['image', 'label'], PATH_DATA)

resample_tform = ResampleNiftiVolumes(['image', 'label'], [1.0, 1.0, 2.5], 'linear')
map_intensities = FixedMeanStdNormalization(['image'], mean=50.0, std=100.0)
crop = CropCenteredSubVolumes(fields=['image', 'label'], size=[200, 160, 64])
add_channel = AddChannelDimension(['image'])
label_to_onehot = LabelMapToOneHot(['label'], classes=[1, 2])

# create a transform to manipulate and load data
tform = Compose([
                 read_tform,
                 resample_tform, 
                 map_intensities,
                 crop,
                 add_channel, 
                 label_to_onehot
                 ])

JoaoSantinha commented 4 years ago

Thanks!

I added the LoadNiftyFromFilename as suggested but I am still getting a similar error (KeyError: 'losses', but it now seems to be related to TypeError: unsupported operand type(s) for -: 'Nifti1Image' and 'float'; full error at the end). Here is the updated code:

# Defining some constants
PATH_DATA = './drive/My Drive/Task07_Pancreas'
PATH_ARTIFACTS = './results'

NAME_MSD_JSON = 'dataset.json'

NUM_EPOCHS = 100
BATCH_SIZE = 4

read_tform = LoadNiftyFromFilename(['image', 'label'], PATH_DATA)
resample_tform = ResampleNiftiVolumes(['image', 'label'], [1.0, 1.0, 2.5], 'linear')
map_intensities = FixedMeanStdNormalization(['image'], mean=50.0, std=100.0)
crop = CropCenteredSubVolumes(fields=['image', 'label'], size=[200, 160, 64])
add_channel = AddChannelDimension(['image'])
label_to_onehot = LabelMapToOneHot(['label'], classes=[1, 2])

# create a transform to manipulate and load data
tform = Compose([
                 read_tform,
                 resample_tform, 
                 map_intensities,
                 crop,
                 add_channel, 
                 label_to_onehot
                 ])

# create a dataset from the training set of the MSD dataset
dataset = MSDDataset(
    PATH_DATA, 
    NAME_MSD_JSON, 
    'training', 
    transform=None
)

# define a splitter to do a 80%-20% split of the data
splitter = EisenDatasetSplitter(0.80, 0.20, 0.0, transform_train=tform, 
                                transform_valid=tform)

# define a training and test sets
dset_train, dset_test, _ = splitter(dataset)

# create data loader for training, this functionality is pure pytorch
data_loader_train = DataLoader(
    dset_train, 
    batch_size=BATCH_SIZE, 
    shuffle=True, 
    num_workers=4
)

# specify model and loss
model = EisenModuleWrapper(module=UNet(input_channels=1, output_channels=2), input_names=['image'], output_names=['predictions']) 

loss = EisenModuleWrapper(module=DiceLoss(dim=[1, 2]), input_names=['predictions', 'label'], output_names=['dice_loss'])

metric = EisenModuleWrapper(module=DiceMetric(dim=[1, 2]), input_names=['predictions', 'label'], output_names=['dice_metric'])

optimizer = Adam(model.parameters(), 0.001)

# join all blocks into a workflow (training workflow)
training = Training(
      model=model, 
      losses=[loss], 
      data_loader=data_loader_train,
      optimizer=optimizer,
      metrics=[metric], 
      gpu=True
    )

train_loggin_hook = LoggingHook(training.id, 'Training', PATH_ARTIFACTS)
train_summary_hook = TensorboardSummaryHook(training.id, 'Training', PATH_ARTIFACTS)

train_loggin_hook_print = LoggingHook(training.id, 'Training', None)

# run optimization for NUM_EPOCHS
for i in range(NUM_EPOCHS):
    training.run()

And is there any rule to set DiceLoss's dim?

New full error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/eisen/utils/workflows/training.py in run(self)
    113         with self.epoch_aggregator as ea:
--> 114             for i, batch in enumerate(self.data_loader):
    115                 if self.gpu:

6 frames
/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    344     def __next__(self):
--> 345         data = self._next_data()
    346         self._num_yielded += 1

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    855                 del self._task_info[idx]
--> 856                 return self._process_data(data)
    857 

/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py in _process_data(self, data)
    880         if isinstance(data, ExceptionWrapper):
--> 881             data.reraise()
    882         return data

/usr/local/lib/python3.6/dist-packages/torch/_utils.py in reraise(self)
    394             msg = KeyErrorMessage(msg)
--> 395         raise self.exc_type(msg)

TypeError: Caught TypeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/eisen/utils/__init__.py", line 120, in __getitem__
    item = self.transform(item)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/transforms/transforms.py", line 61, in __call__
    img = t(img)
  File "/usr/local/lib/python3.6/dist-packages/eisen/transforms/imaging.py", line 966, in __call__
    data[field] = (data[field] - self.mean) / self.std
TypeError: unsupported operand type(s) for -: 'Nifti1Image' and 'float'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-42-549c9012c245> in <module>()
      1 for i in range(NUM_EPOCHS):
----> 2     training.run()

/usr/local/lib/python3.6/dist-packages/eisen/utils/workflows/training.py in run(self)
    128                 )
    129 
--> 130                 ea(output_dictionary)
    131 
    132         dispatcher.send(

/usr/local/lib/python3.6/dist-packages/eisen/utils/workflows/workflows.py in __exit__(self, *args, **kwargs)
     95     def __exit__(self, *args, **kwargs):
     96         for typ in ['losses', 'metrics']:
---> 97             for i in range(len(self.epoch_data[typ])):
     98                 for key in self.epoch_data[typ][i].keys():
     99                     self.epoch_data[typ][i][key] = np.asarray(self.epoch_data[typ][i][key])

KeyError: 'losses'

eisen-ai commented 4 years ago

Hello,

The 'losses' error you are seeing is an error thrown AFTER the real error.

The real error is above:

  File "/usr/local/lib/python3.6/dist-packages/eisen/transforms/imaging.py", line 966, in __call__
    data[field] = (data[field] - self.mean) / self.std
TypeError: unsupported operand type(s) for -: 'Nifti1Image' and 'float'
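The pattern of a secondary error hiding the real one can be reproduced in a few lines of plain Python. The sketch below is not Eisen's code: `EpochAggregator`, `load_batches`, `run_epoch` and `root_cause` are hypothetical stand-ins that only mimic the situation in the traceback, where the data pipeline fails before the first batch and the context manager's `__exit__` then looks up a key that was never created.

```python
class EpochAggregator:
    """Hypothetical stand-in for a context manager that collects per-batch results."""

    def __enter__(self):
        self.epoch_data = {}  # stays empty until the first batch is recorded
        return self

    def __call__(self, output):
        self.epoch_data.setdefault("losses", []).append(output)

    def __exit__(self, exc_type, exc_value, traceback):
        # Summarize the epoch; raises KeyError if no batch was ever recorded.
        self.num_batches = len(self.epoch_data["losses"])
        return False


def load_batches():
    # Simulates a data pipeline that fails before producing the first batch.
    raise TypeError("unsupported operand type(s) for -: 'Nifti1Image' and 'float'")


def run_epoch():
    with EpochAggregator() as ea:
        for batch in load_batches():
            ea(batch)


def root_cause(exc):
    """Follow the implicit exception chain back to the first error."""
    while exc.__context__ is not None:
        exc = exc.__context__
    return exc


try:
    run_epoch()
except KeyError as err:
    surfaced = err
    original = root_cause(err)
    print("surfaced:", repr(surfaced))
    print("original:", repr(original))
```

When a traceback contains "During handling of the above exception, another exception occurred", the block printed first is usually the one to fix; `__context__` gives access to the same chain programmatically.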

You need to convert the NIfTI volumes to NumPy arrays. That's another transform:

to_numpy_tform = NiftiToNumpy(['image', 'label'])

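tform = Compose([
                 read_tform,       # load the NIfTI volumes from their filenames
                 resample_tform,   # resample while the data are still NIfTI objects
                 to_numpy_tform,   # convert NIfTI to NumPy before any array arithmetic
                 map_intensities,
                 crop,
                 add_channel, 
                 label_to_onehot
                 ])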

please refer to example https://colab.research.google.com/drive/1BS2Frtk4nLqJGG-N-nGQGEwiA1cqGULD#scrollTo=fQxFD-0sFQCl
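Both errors in this thread came from a transform receiving a type it did not expect (first a filename string, then an image object). The sketch below illustrates the idea with plain Python: every name in it (`FakeVolume`, `load`, `to_array`, `normalize`, `run_chain`, and the filename) is hypothetical and only mimics the order load, convert, then do arithmetic. It does not use Eisen or nibabel.

```python
class FakeVolume:
    """Hypothetical stand-in for a loaded image object (not a real nibabel class)."""

    def __init__(self, voxels):
        self.voxels = voxels


def load(item):
    # Filename string -> image object (the voxel values are made up).
    return {**item, "image": FakeVolume([0.0, 50.0, 150.0])}


def to_array(item):
    # Image object -> plain numeric sequence.
    return {**item, "image": list(item["image"].voxels)}


def normalize(item, mean=50.0, std=100.0):
    # Arithmetic only works once the data are numeric.
    return {**item, "image": [(v - mean) / std for v in item["image"]]}


def run_chain(item, transforms):
    """Apply transforms in order and record the type of 'image' after each step."""
    trace = []
    for transform in transforms:
        item = transform(item)
        trace.append((transform.__name__, type(item["image"]).__name__))
    return item, trace


item = {"image": "pancreas_001.nii.gz"}  # hypothetical filename

result, trace = run_chain(item, [load, to_array, normalize])
print(trace)
print(result["image"])

# Skipping the conversion step fails, much like the TypeError in the thread.
try:
    run_chain(item, [load, normalize])
except TypeError as err:
    print("wrong order:", err)
```

Running a chain on a single item like this, outside the DataLoader workers, tends to give a much shorter traceback than the ones shown earlier.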

JoaoSantinha commented 4 years ago

Thank you very much for all the help, it now seems to be running! Sorry for all the (newbie) questions.