ROI suppressor on LIDC preprocessing.py

ivanwilliammd commented 5 years ago

Hello Sir Paul, thank you for your answer last time. Regarding the ROI suppressor on LIDC, I have a question: if my dataset contains only one final annotation per patient (a single final annotation by the head radiologist, instead of annotations from multiple radiologists), will the ROIs be suppressed or not?

I hope the following information helps you identify the problem:

1. My private dataset consists of body (thorax) and lung CT scans with slice thicknesses of 0.5 and 1.0.
2. When preprocessed using a modified LIDC-IDRI-Processing, almost 80% of the data is filtered out because the DICOM files are incompatible when loaded with MitkCLDicomtoNRRD. I already asked at https://github.com/MIC-DKFZ/LIDC-IDRI-processing/issues/5, but there is no response yet. Maybe you know where the problem is?
3. After passing the data through, I ran lidc_exp/preprocessing.py and almost all (90%) of the RoIs are suppressed, even though each RoI is already confirmed to be a single RoI (not multiple RoIs).

Could you give me a hint on how to unsuppress RoIs in preprocessing.py, lines 75-102? Thank you Sir @pfjaeger.

ivanwilliammd commented 5 years ago

Update: I was able to disable the RoI suppressor by modifying the extra zero-filled NumPy array that the script creates. However, I am still struggling with problem number 2.

pfjaeger commented 5 years ago

Hi, the preprocessing script is only an example script for the LIDC data set! For your custom data set you would need to write your own preprocessing script. The example script can help you get an idea of how to structure yours, but the exact steps, such as loading your data, need to be customized to your specific needs.

Features like aggregation over multiple raters are very specific to LIDC. You do not need them, so you can probably skip half of the steps in the example script.
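
For a data set with a single final annotation per patient, the rater-aggregation step could be replaced by something as simple as the sketch below. It is only an illustration under assumptions: the helper name `build_roi_map`, the idea that each lesion arrives as its own binary mask, and the output layout (an instance map plus one class label per lesion) are hypothetical and not taken from the toolkit's example script.

```python
import numpy as np


def build_roi_map(shape, lesion_masks, lesion_classes):
    """Merge per-lesion binary masks from ONE rater into an instance map.

    Hypothetical helper, not part of the toolkit. Voxel value i marks the
    i-th kept lesion (1-based) and 0 is background. No voting across raters
    is applied, so a lesion is dropped only if its mask is empty.
    """
    if len(lesion_masks) != len(lesion_classes):
        raise ValueError("need exactly one class label per lesion mask")

    roi_map = np.zeros(shape, dtype=np.int16)
    kept_classes = []
    for mask, cls in zip(lesion_masks, lesion_classes):
        mask = np.asarray(mask, dtype=bool)
        if mask.shape != tuple(shape):
            raise ValueError("mask shape does not match the image shape")
        if not mask.any():
            continue  # empty mask: nothing to draw, skip instead of failing later
        kept_classes.append(cls)
        # Where masks overlap, the later lesion overwrites the earlier one.
        roi_map[mask] = len(kept_classes)
    return roi_map, np.array(kept_classes, dtype=np.int64)
```

Skipping empty masks explicitly keeps the number of labels in the map equal to the number of class targets, which is the property later steps usually rely on.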

ivanwilliammd commented 5 years ago

Thank you Sir, I have already created my custom dataset. However, when running preprocessing.py I sometimes get the error "index 0 is out of bounds for axis 0 with size 0", which led me to delete the affected data. Does the problem lie in the CSV or in the image file, Sir?
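
That message is what NumPy raises when element 0 of an empty array is requested, so somewhere a lookup returned nothing. The thread does not show which lookup it was; one plausible cause (an assumption) is a patient ID that has no matching row in the annotation CSV, another is a mask without any foreground voxel. The sketch below uses a made-up annotation table to reproduce the error and to show a guard that reports the offending case instead.

```python
import numpy as np

# Hypothetical annotation table: one row per lesion, keyed by patient ID.
patient_ids = np.array(["case_001", "case_002", "case_002"])
lesion_classes = np.array([3, 1, 2])


def first_class_unchecked(pid):
    """Unguarded lookup: fails with IndexError if pid has no row."""
    rows = np.where(patient_ids == pid)[0]
    return lesion_classes[rows[0]]  # rows is empty for an unknown pid


def classes_for(pid):
    """Guarded lookup: names the patient that has no annotation row."""
    rows = np.where(patient_ids == pid)[0]
    if rows.size == 0:
        raise KeyError(f"no annotation row for patient {pid!r}")
    return lesion_classes[rows]
```

Logging the patient ID at the point of failure makes it possible to inspect that one CSV row and image pair rather than deleting the data.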

ivanwilliammd commented 5 years ago

By the way, could you suggest reference values for the number of epochs, batch_size, and num_train_batches suitable for training on a dataset of around 300-500 thorax CT scans? Each file is approximately the same size as those in the LIDC dataset, and training will run on a Tesla P100 16GB.

pfjaeger commented 5 years ago

I would recommend maximizing the batch size so that it fills up GPU memory. Pick an intuitive number for epochs and training batches, observe in the monitoring plot whether your model is overfitting, and then adjust the training length accordingly.
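
The same judgment can be approximated numerically from the per-epoch validation scores. The following is a minimal sketch, not toolkit code: the function name `best_epoch` and the `patience` parameter are assumptions, and it presumes a validation metric where higher is better.

```python
def best_epoch(val_scores, patience=5):
    """Return (best_epoch, stop_epoch) for per-epoch validation scores.

    Training is considered to be overfitting (or at least stalled) once the
    score has not improved for `patience` consecutive epochs.
    """
    best_score, best_idx = float("-inf"), -1
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_score, best_idx = score, epoch
        elif epoch - best_idx >= patience:
            return best_idx, epoch  # no improvement for `patience` epochs
    return best_idx, len(val_scores) - 1
```

If the stop epoch is far below the configured number of epochs, the training length can be shortened; if the score is still rising at the last epoch, it is probably too short.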

ivanwilliammd commented 5 years ago

Thank you Sir for the tips. By the way, I have already done a quick training run (fold 0 only) with the following parameters: https://github.com/pfjaeger/medicaldetectiontoolkit/issues/40

with configs.py:

        self.report_score_level = ['patient', 'rois']  # choose list from 'patient', 'rois'
        self.class_dict = {1: 'groundglass', 2: 'subsolid', 3: 'solid'}  # 0 is background.
        self.patient_class_of_interest = 2  # patient metrics are only plotted for one class.
        self.ap_match_ious = [0.1]  # list of ious to be evaluated for ap-scoring.

        self.model_selection_criteria = ['solid_ap', 'subsolid_ap', 'groundglass_ap'] # criteria to average over for saving epochs.
        self.min_det_thresh = 0.1  # minimum confidence value to select predictions for evaluation.
        self.head_classes = 4

        # seg_classes here refers to the first stage classifier (RPN)
        self.num_seg_classes = 2  # foreground vs. background

and the results I get:


****************************
results for fold 0 
****************************
fold df shape (12551, 7)

AUC 0.5000  AP 0.5061 fold_0 patient cl_1 
AUC 0.0000  AP 0.0000 fold_0 rois cl_1 
AUC 0.5000  AP 0.3882 fold_0 patient cl_2 
AUC 0.0000  AP 0.0000 fold_0 rois cl_2 
AUC 0.1154  AP 0.7575 fold_0 patient cl_3 
AUC 0.0000  AP 0.0000 fold_0 rois cl_3 
AUC 0.0000  AP 0.0000 average_foreground_roi

If I'm not mistaken, an AUC of 0.5 means the model cannot discriminate the class, and the zeros for rois cl_1, cl_2, and cl_3 mean that all the predicted boxes missed? Visualization shows that the anchors already look OK, and so does the classification; however, the bounding boxes are off.
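
Both readings can be checked on toy numbers. The sketch below is illustrative only and does not reproduce the toolkit's evaluation code: the box layout `(z1, y1, x1, z2, y2, x2)` and both function names are assumptions. It shows that a box shifted by half its size along every axis falls below the 0.1 IoU threshold set in `ap_match_ious`, and that an AUC of exactly 0.5 is what results when every case receives the same score.

```python
def box_volume(box):
    """Volume of an axis-aligned box given as (z1, y1, x1, z2, y2, x2)."""
    return (box[3] - box[0]) * (box[4] - box[1]) * (box[5] - box[2])


def iou_3d(a, b):
    """Intersection over union of two axis-aligned 3D boxes."""
    inter = 1.0
    for axis in range(3):
        overlap = min(a[axis + 3], b[axis + 3]) - max(a[axis], b[axis])
        if overlap <= 0:
            return 0.0  # boxes are disjoint along this axis
        inter *= overlap
    return inter / (box_volume(a) + box_volume(b) - inter)


def auc(scores, labels):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win, so identical scores for all cases give 0.5.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


ground_truth = (0, 0, 0, 10, 10, 10)
shifted = (5, 5, 5, 15, 15, 15)  # off by half the box size on every axis
```

Here `iou_3d(ground_truth, shifted)` is 125 / 1875, roughly 0.067, so such a prediction would not count as a match at a threshold of 0.1 even though it visibly overlaps the lesion.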

Are the modified configs.py parameters applicable for 3-class texture classification (solid, subsolid, groundglass)? Thank you Sir.
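
A quick internal consistency check of the posted values might look like the sketch below. It rests on assumptions read off the excerpt rather than confirmed by the toolkit: that `head_classes` counts the background class, and that each selection criterion is a class name followed by `_ap`. The function `check_config` is hypothetical, and it cannot tell whether the labels in the preprocessed data actually use the values 1 to 3.

```python
# Values copied from the configs.py excerpt in this thread.
class_dict = {1: 'groundglass', 2: 'subsolid', 3: 'solid'}
head_classes = 4
model_selection_criteria = ['solid_ap', 'subsolid_ap', 'groundglass_ap']
patient_class_of_interest = 2


def check_config(class_dict, head_classes, criteria, class_of_interest):
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    # Assumption: head_classes = foreground classes + 1 background class.
    if head_classes != len(class_dict) + 1:
        problems.append("head_classes should be len(class_dict) + 1")
    if sorted(class_dict) != list(range(1, len(class_dict) + 1)):
        problems.append("class ids should be consecutive, starting at 1")
    # Assumption: criteria are named '<class name>_ap'.
    known = {name + '_ap' for name in class_dict.values()}
    for criterion in criteria:
        if criterion not in known:
            problems.append(f"unknown selection criterion: {criterion}")
    if class_of_interest not in class_dict:
        problems.append("patient_class_of_interest is not a foreground class")
    return problems
```

For the values posted above the check returns an empty list, which suggests the class bookkeeping itself is consistent and the zero RoI scores have another cause.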

pfjaeger commented 5 years ago

Hi, can you please post this in the Slack channel? I guess this is getting a little off topic for the initial issue. Thanks!

ivanwilliammd commented 5 years ago

Okay Sir.

MIC-DKFZ / medicaldetectiontoolkit, issue #39