issue with ConvertSegToBoundingBoxCoordinates when multiple ROI per slice or volume

paul-bd commented 5 years ago

Hi,

it seems that ConvertSegToBoundingBoxCoordinates produces a single bounding box when multiple ROIs are on the same slice (or volume). The problem occurs whether self.dim is 2 or 3 in the configs file.

For now I load 3D arrays via np.load, using the data_loader of the LIDC dataset:

1 np.float32 img of dim 128x128x256
1 np.int16 mask of dim 128x128x256, containing anywhere from 30 to 100 lesions per volume. All lesions are labeled with ones.

E.g., for one slice from the pred example in plots:

Is there anything that can be done to avoid this?

Best regards

Paul

pfjaeger commented 5 years ago

Hi! How are the different ROIs labelled in your segmentation array? Just [0,1], or do they have individual labels per lesion? If they are only labelled as foreground (e.g. 1 vs. 0), you need to use the "get_rois_from_seg_flag" (unlike in the lidc data loader, where this flag is set to False because the individual lesions already have individual labels).
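A minimal sketch of what splitting a foreground-only mask into individual ROIs amounts to, assuming connected-component labelling via scipy.ndimage.label. The helper name boxes_from_binary_mask and the toy mask are made up for illustration; this is not the toolkit's own code, only the general idea behind deriving ROIs from the segmentation.

```python
import numpy as np
from scipy import ndimage


def boxes_from_binary_mask(mask):
    """Label connected foreground regions and return one box per region.

    Hypothetical helper: each box is a list of (start, stop) pairs,
    one pair per array axis.
    """
    # Default structuring element: only face-connected voxels are joined.
    labelled, n_rois = ndimage.label(mask)
    boxes = []
    for slc in ndimage.find_objects(labelled):
        boxes.append([(s.start, s.stop) for s in slc])
    return labelled, n_rois, boxes


# Toy 2D mask with two separate lesions, both labelled 1.
mask = np.zeros((8, 8), dtype=np.int16)
mask[1:3, 1:3] = 1
mask[5:7, 4:8] = 1

labelled, n_rois, boxes = boxes_from_binary_mask(mask)
print(n_rois)  # number of separate regions found
print(boxes)   # one box per region instead of a single box around everything
```

Note that lesions which touch each other end up in the same component, so they would still share one box.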

paul-bd commented 5 years ago

Thanks for the quick reply!

Indeed my labels are only [0,1].

Setting every get_rois_from_seg_flag to True in the data_loader gives me this error (it works normally if set to False):

    Traceback (most recent call last):
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py", line 35, in producer
        item = transform(**item)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/abstract_transforms.py", line 84, in __call__
        data_dict = t(**data_dict)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 230, in __call__
        data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/augmentations/utils.py", line 466, in convert_seg_to_bounding_box_coordinates
        data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
    ValueError: cannot copy sequence with size 8 to array axis with dimension 1
    Process Process-4:
    Traceback (most recent call last):
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py", line 35, in producer
        item = transform(**item)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/abstract_transforms.py", line 84, in __call__
        data_dict = t(**data_dict)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 230, in __call__
        data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/augmentations/utils.py", line 466, in convert_seg_to_bounding_box_coordinates
        data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
    ValueError: cannot copy sequence with size 2 to array axis with dimension 1
    Process Process-5:
    Traceback (most recent call last):
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py", line 35, in producer
        item = transform(**item)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/abstract_transforms.py", line 84, in __call__
        data_dict = t(**data_dict)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 230, in __call__
        data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/augmentations/utils.py", line 466, in convert_seg_to_bounding_box_coordinates
        data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
    ValueError: cannot copy sequence with size 3 to array axis with dimension 1
    Process Process-2:
    Traceback (most recent call last):
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py", line 35, in producer
        item = transform(**item)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/abstract_transforms.py", line 84, in __call__
        data_dict = t(**data_dict)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 230, in __call__
        data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/augmentations/utils.py", line 466, in convert_seg_to_bounding_box_coordinates
        data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
    ValueError: cannot copy sequence with size 31 to array axis with dimension 1
    Process Process-3:
    Traceback (most recent call last):
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py", line 35, in producer
        item = transform(**item)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/abstract_transforms.py", line 84, in __call__
        data_dict = t(**data_dict)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 230, in __call__
        data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/augmentations/utils.py", line 466, in convert_seg_to_bounding_box_coordinates
        data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
    ValueError: cannot copy sequence with size 27 to array axis with dimension 1
    Process Process-6:
    Traceback (most recent call last):
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
        self.run()
      File "/home/paulbd/anaconda3/envs/MDK/lib/python3.6/multiprocessing/process.py", line 93, in run
        self._target(*self._args, **self._kwargs)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/dataloading/multi_threaded_augmenter.py", line 35, in producer
        item = transform(**item)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/abstract_transforms.py", line 84, in __call__
        data_dict = t(**data_dict)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 230, in __call__
        data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
      File "/home/paulbd/medicaldetectiontoolkit/batchgenerators/batchgenerators/augmentations/utils.py", line 466, in convert_seg_to_bounding_box_coordinates
        data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
    ValueError: cannot copy sequence with size 8 to array axis with dimension 1

Any idea why?

Best,

Paul

pfjaeger commented 5 years ago

I just realized the documentation for this function is missing (will add it very soon). The fact that your lesions are not individually labeled in your segmentation map implies that they also do not have individual class labels. So the function now expects a scalar for data_dict['class_target'][b]. This could be a per-patient label [0, ..., n], or, if you only have one class in the data set, just put 0. (This fix should be done in your data loader.)
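To make the failure concrete, here is a minimal NumPy sketch that mimics the assignment on the last line of the traceback. It is not the batchgenerators code itself: the helper replicate_target is hypothetical, and the sketch assumes that one-element per-patient lists end up stacked into an array of shape (batch, 1), which would be consistent with the "dimension 1" in the error message.

```python
import numpy as np


def replicate_target(class_target, b, n_cands):
    """Mimic the failing line: one copy of element b's label per ROI."""
    class_target[b] = [class_target[b]] * n_cands
    return class_target


# Assumed failing case: per-patient lists such as [0] stacked to shape (2, 1).
stacked = np.array([[0], [0]])
try:
    replicate_target(stacked, 0, 8)
    failed = False
except ValueError as err:
    # Eight values cannot be written into a slot that holds a single one.
    failed = True
    print("ValueError:", err)

# Scalar label per patient, kept in a plain Python list.
scalars = [0, 0]
expanded = replicate_target(scalars, 0, 8)
print(expanded[0])  # eight copies of the scalar label, one per ROI
```

The exact wording of the ValueError depends on the NumPy version, so the message printed here may differ from the one in the traceback.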

paul-bd commented 5 years ago

Ok, thanks! Yes indeed, but the assignment fails using the provided code. I don't know why, maybe because of the dtype in which I saved class_target... I nevertheless used a trick that works but is kind of dirty:

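    # Keep a shallow copy so the original class targets stay readable,
    # then rebuild 'class_target' as a plain Python list.
    data_dict_b = dict(data_dict)
    del data_dict['class_target']
    data_dict['class_target'] = []
    out_seg = np.copy(data_dict['seg'])

    for b in range(data_dict['seg'].shape[0]):

        p_coords_list = []
        p_roi_masks_list = []
        p_roi_labels_list = []
        print('NEW BATCH')
        if np.sum(data_dict['seg'][b] != 0) > 0:
            if get_rois_from_seg_flag:
                # lb presumably labels connected components: one cluster per ROI
                clusters, n_cands = lb(data_dict['seg'][b])
                # take the first entry of this element's target as its label
                val_initiale = data_dict_b['class_target'][b]
                val_initiale = list(val_initiale)[0]
                # one copy of the label per ROI found
                data_dict['class_target'].append([val_initiale] * n_cands)
                # caution: in the part shown here nothing is appended for elements
                # without foreground, so index b may not match the list position
                print(data_dict['class_target'][b])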


It then returns [0, 0, 0, ...] as expected, but it stops after a few batches (before starting the 1st epoch)... Would it maybe be more convenient to preprocess the data as for LIDC and do this part beforehand? Could you please attach the info_df.pickle? I will try to build exactly the same thing, hoping that it works (notably the class_target column :) )!

I am also interested in the second part you mentioned. I indeed have only one class_target, but for convenience, and because it failed when I used only one class, my class target is currently a random integer between 1 and 5. Where should that be modified in the data_loader?

Thank you so much for your time,

Paul

paul-bd commented 5 years ago

Seems to be working when preprocessing it :) and using an info_df.pickle like this:

pred_example_0

Thanks! I will work on the part with one class. I think the balanced sampling of patches between classes may be the problem.

best regards

pfjaeger commented 5 years ago

You are still assigning several class labels per patient? Why not make the elements of the class_target column in the info_df scalars instead of lists? Alternatively, change line 235 in the lidc data loader to: batch_targets.append(0). I would not recommend changing the function in the batch generators.

Also, if you only have one class, you should not use the "get_class_balanced_patients" function in line 225, but just draw random samples from all patients in your training data for batch generation.

paul-bd commented 5 years ago

Indeed, thanks, it is working great when putting it as a scalar and taking random samples (training in progress :) ). Best regards, and happy new year!

sophie-isobel commented 5 years ago

Hi @paul-bd and @pfjaeger , I also only have data labelled as foreground (e.g. 1 vs. 0) and I am getting the same error from convert_seg_to_bounding_box_coordinates

...
  File "/content/gdrive/My Drive/Dissertation/medicaldetectiontoolkit-master (1)/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 229, in __call__
    data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
  File "/content/gdrive/My Drive/Dissertation/medicaldetectiontoolkit-master (1)/batchgenerators/batchgenerators/augmentations/utils.py", line 518, in convert_seg_to_bounding_box_coordinates
    data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
ValueError: cannot copy sequence with size 12 to array axis with dimension 1
...

I followed your advice and set all get_rois_from_seg_flag=True in data_loader.py, I have also set get_rois_from_seg_flag=True in the batchgenerator files: transforms -> utility_transforms.py and augmentations -> utils.py.

And I have set it to just draw random samples from all patients in training data for batch generation (line 225):

...
def generate_train_batch(self):

        batch_data, batch_segs, batch_pids, batch_targets, batch_patient_labels = [], [], [], [], []
        class_targets_list =  [v['class_target'] for (k, v) in self._data.items()]

        #if self.cf.head_classes > 2:
            # samples patients towards equilibrium of foreground classes on a roi-level (after randomly sampling the ratio "batch_sample_slack).
        #    batch_ixs = dutils.get_class_balanced_patients(
        #        class_targets_list, self.batch_size, self.cf.head_classes - 1, slack_factor=self.cf.batch_sample_slack)
        #else:
        batch_ixs = np.random.choice(len(class_targets_list), self.batch_size)

        patients = list(self._data.items())
...

When I change line 235 to batch_targets.append(0) I still get ValueError: setting an array element with a sequence:

...
  File "/content/gdrive/My Drive/Dissertation/medicaldetectiontoolkit-master/batchgenerators/batchgenerators/transforms/utility_transforms.py", line 229, in __call__
    data_dict = convert_seg_to_bounding_box_coordinates(data_dict, self.dim, self.get_rois_from_seg_flag, class_specific_seg_flag=self.class_specific_seg_flag)
  File "/content/gdrive/My Drive/Dissertation/medicaldetectiontoolkit-master/batchgenerators/batchgenerators/augmentations/utils.py", line 518, in convert_seg_to_bounding_box_coordinates
    data_dict['class_target'][b] = [data_dict['class_target'][b]] * n_cands
ValueError: setting an array element with a sequence.
...

note, my class targets are set as 0 for each individual pid in data:

...
    data = OrderedDict()
    for ix, pid in enumerate(pids):
        targets = [0]
        data[pid] = {'data': imgs[ix], 'seg': segs[ix], 'pid': pid, 'class_target': targets}

    return data
...

Would you know why I am still getting this error? thanks in advance.

hlc1209 commented 4 years ago

> Or alternatively changing line 235 in the lidc data loader to : batch_targets.append(0)

You should then also change line 313 to: class_target = batch_targets
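One plausible reading of why this second change matters (an assumption, since line 313 is not quoted in this thread): if the scalar targets are wrapped in a NumPy array, each slot can hold only a single number, so writing a per-ROI list into it raises "setting an array element with a sequence", whereas a plain Python list accepts the assignment. A small sketch with made-up values:

```python
import numpy as np

batch_targets = [0, 0, 0]  # one scalar label per patient (illustrative values)
n_cands = 4                # pretend four ROIs were found in the first element

# Case 1: targets wrapped in an integer array.
as_array = np.array(batch_targets)
try:
    as_array[0] = [as_array[0]] * n_cands
    array_ok = True
except ValueError as err:
    # A single integer slot cannot store a list of four labels.
    array_ok = False
    print("ValueError:", err)

# Case 2: targets left as a plain Python list.
as_list = list(batch_targets)
as_list[0] = [as_list[0]] * n_cands
print(as_list)  # first entry is now a list with one label per ROI
```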

cristinaperez9 commented 2 years ago

Hello!

I am also working with a single-class dataset. I have been able to solve this issue for the training set, but not when running inference.

I would be extremely grateful if someone could share their dataset loader for inference.

MIC-DKFZ / medicaldetectiontoolkit

issue with ConvertSegToBoundingBoxCoordinates when multiple ROI per slice or volume #11