frankkramer-lab / MIScnn

A framework for Medical Image Segmentation with Convolutional Neural Networks and Deep Learning
GNU General Public License v3.0

IndexError when using the 60 sec example on a new dataset #42

Open Itzikwa opened 3 years ago

Itzikwa commented 3 years ago

Hi,

I converted my DICOM dataset to NIfTI files (images and segmentations) and arranged the data in a folder structure like the kits19 example. But when I try to use the "60 sec" example on my own dataset, I get the following error:

IndexError: index -1000 is out of bounds for axis 1 with size 3

If it helps, I can share a few samples of my dataset.

This is the rest of the error message:

IndexError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 sample_list = data_io.get_indiceslist()
----> 2 model.train(sample_list, epochs=10)
      3 
      4 # Predict the segmentation for 20 samples
      5 pred = model.predict(sample_list, return_output=True)

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.3-py3.8.egg\miscnn\neural_network\model.py in train(self, sample_list, epochs, iterations, callbacks)
    117                                 iterations=iterations)
    118         # Run training process with Keras fit
--> 119         self.model.fit(dataGen,
    120                        epochs=epochs,
    121                        callbacks=callbacks,

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self, *args, **kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
    109 
    110     # Running inside `run_distribute_coordinator` already.

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1047         training_utils.RespectCompiledTrainableState(self):
   1048       # Creates a `tf.data.Dataset` and handles batch and epoch iteration.
-> 1049       data_handler = data_adapter.DataHandler(
   1050           x=x,
   1051           y=y,

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weight, batch_size, steps_per_epoch, initial_epoch, epochs, shuffle, class_weight, max_queue_size, workers, use_multiprocessing, model, steps_per_execution)
   1103 
   1104     adapter_cls = select_data_adapter(x, y)
-> 1105     self._adapter = adapter_cls(
   1106         x,
   1107         y,

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, shuffle, workers, use_multiprocessing, max_queue_size, model, **kwargs)
    907     self._keras_sequence = x
    908     self._enqueuer = None
--> 909     super(KerasSequenceAdapter, self).__init__(
    910         x,
    911         shuffle=False,  # Shuffle is handed in the _make_callable override.

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py in __init__(self, x, y, sample_weights, workers, use_multiprocessing, max_queue_size, model, **kwargs)
    784     # Since we have to know the dtype of the python generator when we build the
    785     # dataset, we have to look at a batch to infer the structure.
--> 786     peek, x = self._peek_and_restore(x)
    787     peek = self._standardize_batch(peek)
    788     peek = _process_tensorlike(peek)

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\engine\data_adapter.py in _peek_and_restore(x)
    918   @staticmethod
    919   def _peek_and_restore(x):
--> 920     return x[0], x
    921 
    922   def _handle_multiprocessing(self, x, workers, use_multiprocessing,

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.3-py3.8.egg\miscnn\neural_network\data_generator.py in __getitem__(self, idx)
     61         # Load a batch by generating it or by loading an already prepared
     62         if self.preprocessor.prepare_batches : batch = self.load_batch(idx)
---> 63         else : batch = self.generate_batch(idx)
     64         # Return the batch containing only an image or an image and segmentation
     65         if self.training:

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.3-py3.8.egg\miscnn\neural_network\data_generator.py in generate_batch(self, idx)
    144             self.sample_list.extend(samples)
    145         # create a new batch
--> 146         batches = self.preprocessor.run(samples, self.training,
    147                                         self.validation)
    148         # Create threading lock to avoid parallel access

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.3-py3.8.egg\miscnn\processing\preprocessor.py in run(self, indices_list, training, validation)
    134                 # Transform digit segmentation classes into categorical
    135                 if training:
--> 136                     sample.seg_data = to_categorical(sample.seg_data,
    137                                                      num_classes=sample.classes)
    138                 # Decide if data augmentation should be performed

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\tensorflow\python\keras\utils\np_utils.py in to_categorical(y, num_classes, dtype)
     76   n = y.shape[0]
     77   categorical = np.zeros((n, num_classes), dtype=dtype)
---> 78   categorical[np.arange(n), y] = 1
     79   output_shape = input_shape + (num_classes,)
     80   categorical = np.reshape(categorical, output_shape)

IndexError: index -1000 is out of bounds for axis 1 with size 3
muellerdo commented 3 years ago

Hey @Itzikwa,

sounds like some parameters do not match your input data regarding image shape or number of classes.

Could you run the following code and share the output with us:

# Initialize your Data IO class on your data set
data_io = ...

# Obtain sample list
sample_list = data_io.get_indiceslist()

# Print out sample shapes
for index in sample_list:
    sample = data_io.sample_loader(index, load_seg=True)
    print(sample.index, sample.img_data.shape, sample.seg_data.shape)

# Print out Data IO configurations
print(data_io.interface.classes, data_io.interface.channels, data_io.interface.three_dim)

I'm also a little bit confused about where the -1000 index comes from :x

Cheers, Dominik

EDIT: Fixed a little bug in the code section. The variable sample_list contains the indices, not the sample objects, so I added a line that loads each sample from its index.

Itzikwa commented 3 years ago

Thank you very much for your answer... this is what I got:

case_00000 (400, 400, 300, 1) (400, 400, 300, 1)
case_00001 (200, 200, 300, 1) (200, 200, 300, 1)
case_00002 (501, 561, 189, 1) (501, 561, 189, 1)
case_00003 (481, 481, 481, 1) (481, 481, 481, 1)
case_00004 (503, 503, 501, 1) (503, 503, 282, 1)
case_00005 (481, 481, 481, 1) (481, 481, 481, 1)
case_00006 (503, 503, 501, 1) (503, 503, 501, 1)
case_00007 (800, 800, 576, 1) (800, 800, 576, 1)
case_00008 (400, 400, 300, 1) (400, 400, 300, 1)
3 1 True

I was just thinking about this now... some of the NIfTI arrays contain values of -1000, might that explain the problem?

muellerdo commented 3 years ago

I was just thinking about this now... some of the NIfTI arrays contain values of -1000, might that explain the problem?

The images or the segmentations?

I suspect that the provided segmentation masks may not be correctly formatted. MIScnn uses the tf.keras.utils.to_categorical function from TensorFlow/Keras. If you have 3 classes in your segmentation, it expects the segmentation values to run from 0 to number_of_classes-1 and to look like this: [0] or [1] or [2]. Maybe your segmentation looks something like this: [-1000] or [?] or [?]
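For illustration, here is a minimal, self-contained sketch with toy arrays (not your data) showing why such values break to_categorical:

import numpy as np
from tensorflow.keras.utils import to_categorical

# Labels in the expected range 0 .. num_classes-1 encode fine
good_mask = np.array([[0, 1, 2],
                      [2, 1, 0]])
print(to_categorical(good_mask, num_classes=3).shape)  # -> (2, 3, 3)

# Raw intensity-like label values: to_categorical uses each value as a
# column index into an array with only num_classes columns, so -1000
# falls outside axis 1 and raises exactly the IndexError seen above
bad_mask = np.array([[-1000.0, -999.0],
                     [-1000.0, -999.0]])
to_categorical(bad_mask, num_classes=3)  # IndexError: index -1000 is out of bounds ...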

Could you please print out your segmentation mask by running the following code:

import numpy as np

# Print out all unique segmentation classes per sample
for index in sample_list:
    sample = data_io.sample_loader(index, load_seg=True)
    print(sample.index, np.unique(sample.seg_data))

# Print out the complete segmentation mask of the last loaded sample
print(sample.seg_data)

Itzikwa commented 3 years ago

Maybe your segmentation looks something like this: [-1000] or [?] or [?]

You're totally right...

case_00000 [-1000.  -999.]
case_00001 [-1000.  -999.]
case_00002 [0. 1.]
case_00003 [0. 1.]
case_00004 [0. 1.]
case_00005 [0. 1.]
case_00006 [0. 1.]
case_00007 [0. 1.]
case_00008 [-1000.  -999.]
[[[[-1000.]
   [-1000.]
   [-1000.]
   ...
   [-1000.]
   [-1000.]
   [-1000.]]
....

muellerdo commented 3 years ago

Hey @Itzikwa,

sorry for the late reply.

Have you already transformed your segmentation classes into integers starting from 0 (0, 1, 2, ... or just 0, 1)? Is it working now?

If not, I would recommend checking the unique values for each sample. If you have just [0, 1] and [-1000, -999], you can simply run something like this for each sample:

mask[mask == -1000] = 0
mask[mask == -999] = 1

Be aware that you have to do this either before loading the data into MIScnn or via a custom Subfunction which you can integrate into the MIScnn pipeline.
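If you prefer the first option (fixing the files before they ever reach MIScnn), a minimal sketch with nibabel could look like this; the file path is just a placeholder for one of your own segmentation NIfTIs:

import nibabel as nib
import numpy as np

# Placeholder path - point this at one of your own segmentation files
seg_path = "case_00000/segmentation.nii.gz"

# Load the mask, remap the raw values to class indices 0/1, save it back
nifti = nib.load(seg_path)
mask = nifti.get_fdata()
mask[mask == -1000] = 0
mask[mask == -999] = 1
nib.save(nib.Nifti1Image(mask.astype(np.uint8), nifti.affine), seg_path)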

If you want to do this with a Subfunction, it could look something like this:

# Internal libraries/scripts
from miscnn.processing.subfunctions.abstract_subfunction import Abstract_Subfunction

class MySubfunction(Abstract_Subfunction):
    #---------------------------------------------#
    #                Initialization               #
    #---------------------------------------------#
    def __init__(self):
        pass

    #---------------------------------------------#
    #                Preprocessing                #
    #---------------------------------------------#
    def preprocessing(self, sample, training=True):
        if training:
            mask = sample.seg_data
            mask[mask == -1000] = 0
            mask[mask == -999] = 1
            sample.seg_data = mask

    #---------------------------------------------#
    #               Postprocessing                #
    #---------------------------------------------#
    def postprocessing(self, prediction):
        return prediction

Cheers, Dominik

Note: Sorry for closing the issue... misclick.

Itzikwa commented 3 years ago

It worked perfectly! Thank you for the simple and elegant solution.

But... when I tried to predict the samples I got this error:

ValueError                                Traceback (most recent call last)
<ipython-input-12-17833f153698> in <module>
      3 
      4 # Predict the segmentation for 20 samples
----> 5 pred = model.predict(sample_list, return_output=True)

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.4-py3.8.egg\miscnn\neural_network\model.py in predict(self, sample_list, return_output, activation_output)
    147         for sample in sample_list:
    148             # Initialize Keras Data Generator for generating batches
--> 149             dataGen = DataGenerator([sample], self.preprocessor,
    150                                     training=False, validation=False,
    151                                     shuffle=False, iterations=None)

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.4-py3.8.egg\miscnn\neural_network\data_generator.py in __init__(self, sample_list, preprocessor, training, validation, shuffle, iterations)
     55             self.batchpointers = list(range(0, batches_count+1))
     56         elif not training:
---> 57             self.batch_queue = preprocessor.run(sample_list, False, False)
     58 
     59     # Return the next batch for associated index

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.4-py3.8.egg\miscnn\processing\preprocessor.py in run(self, indices_list, training, validation)
    152                 if not training:
    153                     self.cache["shape_" + str(index)] = sample.img_data.shape
--> 154                 ready_data = self.analysis_patchwise_grid(sample, training,
    155                                                           data_aug)
    156             # Identify if current index is the last one

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\miscnn-1.0.4-py3.8.egg\miscnn\processing\preprocessor.py in analysis_patchwise_grid(self, sample, training, data_aug)
    266                     del patches_seg[i]
    267         # Concatenate a list of patches into a single numpy array
--> 268         img_data = np.stack(patches_img, axis=0)
    269         if training : seg_data = np.stack(patches_seg, axis=0)
    270         # Pad patches if necessary

<__array_function__ internals> in stack(*args, **kwargs)

c:\users\group2\appdata\local\programs\python\python38\lib\site-packages\numpy\core\shape_base.py in stack(arrays, axis, out)
    420     arrays = [asanyarray(arr) for arr in arrays]
    421     if not arrays:
--> 422         raise ValueError('need at least one array to stack')
    423 
    424     shapes = {arr.shape for arr in arrays}

ValueError: need at least one array to stack

It seems that patches_img is empty for some reason, but I can't really understand why...

muellerdo commented 3 years ago

Hey @Itzikwa,

mhm. Would it be possible for you to share your code, including your MIScnn class initializations and parameters?

It looks like the slice_matrix() function returns an empty list (the function slices a matrix/image into patches). I suspect it has something to do with the data loading, specifically that the three_dim boolean parameter of your Data IO interface may be incorrect or not set to True. Did you specify the patchwise_overlap class variable for the Preprocessor class? It could also be that some image is drastically smaller than the others and we are observing some kind of new bug here.

For debugging, I would recommend starting by printing out the output of the slice_matrix() function after line 254 in preprocessor.py.
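A quick standalone check along those lines could look like this; it assumes your Preprocessor instance is called pp and exposes patch_shape and patchwise_overlap under exactly those attribute names, so adjust to your code:

# Compare the configured patching parameters against every sample's shape
print("three_dim:", data_io.interface.three_dim)
print("patch_shape:", pp.patch_shape)
print("patchwise_overlap:", pp.patchwise_overlap)

for index in sample_list:
    sample = data_io.sample_loader(index, load_seg=False)
    print(sample.index, "image shape:", sample.img_data.shape)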

Cheers, Dominik

Itzikwa commented 3 years ago

I realized exactly what the problem is: the step calculation in slice_3Dmatrix. I printed the steps and it turned out they are all zeros, so the code never enters the for loop. As you said, this happened because the array's length was too small and the steps were computed as negative values (and, after ceil, as zero). That means all I have to do is reduce the overlap values.
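For anyone who runs into the same thing, here is a tiny illustration of that arithmetic (a generic sliding-window step formula, not necessarily the exact expression inside MIScnn):

import math

def grid_steps(size, window, overlap):
    # Generic sliding-window step count along one axis - illustrative only
    return int(math.ceil((size - overlap) / float(window - overlap)))

# Enough room along the axis: several steps, so patches are produced
print(grid_steps(size=300, window=128, overlap=64))   # -> 4

# Overlap too close to the window size for a small axis: the ratio is
# negative, ceil() gives a value <= 0, the patch loop never executes and
# np.stack() later receives an empty list
print(grid_steps(size=100, window=128, overlap=120))  # -> -2 (i.e. no patches)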

Thanks