Closed: ivanwilliammd closed this issue 5 years ago
Update: I was already able to disable the RoI suppressor by modifying the extra zero numpy array that was made. However, I still struggle with problem number 2.
Hi, the preprocessing script is only an example script for the LIDC data set! For your custom data set you would need to write your own custom preprocessing script. The example script can help you get an idea of how to structure your script, but the exact steps, such as loading your data, need to be customized according to your specific needs.
Features like aggregation over multiple raters are very specific to LIDC; you do not need to do that, so you can probably skip half of the steps in the example script.
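For orientation, a minimal sketch of what such a custom preprocessing script could produce, assuming images and segmentations are stored as NIfTI files and class labels in a CSV. All paths, file-name patterns, and column names below are hypothetical, and output names like `info_df.pickle` only mirror what the LIDC example data loader expects; match them to whatever your own data loader reads:

```python
import os
import pickle

import numpy as np
import pandas as pd
import SimpleITK as sitk

# Hypothetical paths and CSV column names -- adapt to your own layout.
raw_dir = '/path/to/raw_data'        # contains <pid>_img.nii.gz and <pid>_seg.nii.gz
out_dir = '/path/to/preprocessed'
labels_csv = '/path/to/labels.csv'   # columns: pid, class_target

os.makedirs(out_dir, exist_ok=True)
labels = pd.read_csv(labels_csv, dtype={'pid': str})

meta = []
for _, row in labels.iterrows():
    pid = row['pid']
    img_itk = sitk.ReadImage(os.path.join(raw_dir, '{}_img.nii.gz'.format(pid)))
    seg_itk = sitk.ReadImage(os.path.join(raw_dir, '{}_seg.nii.gz'.format(pid)))

    img = sitk.GetArrayFromImage(img_itk).astype(np.float32)  # z, y, x
    seg = sitk.GetArrayFromImage(seg_itk).astype(np.uint8)

    # Simple CT intensity clipping and normalization -- replace with whatever suits your data.
    img = np.clip(img, -1200, 600)
    img = (img - img.mean()) / (img.std() + 1e-8)

    np.save(os.path.join(out_dir, '{}_img.npy'.format(pid)), img)
    np.save(os.path.join(out_dir, '{}_rois.npy'.format(pid)), seg)

    meta.append({'pid': pid,
                 'class_target': [int(row['class_target'])],
                 'spacing': img_itk.GetSpacing()})

# One meta-info dataframe for the whole data set, as the example loader expects.
with open(os.path.join(out_dir, 'info_df.pickle'), 'wb') as handle:
    pickle.dump(pd.DataFrame(meta), handle)
```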
Thank you, Sir. I already created my custom dataset; however, sometimes when running preprocessing.py I get the error `IndexError: index 0 is out of bounds for axis 0 with size 0`, which resulted in me deleting the data. Does the problem lie in the CSV or in the image file, Sir?
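Just a hedged guess at that `IndexError`: it typically appears when a per-patient lookup returns an empty result (for example an image file whose pid has no row in the CSV, or the other way around) and the script then indexes `[0]` on it. A small sanity check like the sketch below can list the offending patients before preprocessing; the paths, file-name pattern, and column names are hypothetical:

```python
import os
import pandas as pd

raw_dir = '/path/to/raw_data'        # hypothetical folder with <pid>_img.nii.gz files
labels_csv = '/path/to/labels.csv'   # hypothetical CSV with 'pid' and 'class_target' columns

labels = pd.read_csv(labels_csv, dtype={'pid': str})
csv_pids = set(labels['pid'])
file_pids = {f.split('_')[0] for f in os.listdir(raw_dir) if f.endswith('.nii.gz')}

print('image files without a CSV row:', sorted(file_pids - csv_pids))
print('CSV rows without an image file:', sorted(csv_pids - file_pids))

# Inside the preprocessing loop, guard the lookup instead of letting it crash:
for pid in sorted(file_pids):
    rows = labels[labels['pid'] == pid]
    if len(rows) == 0:
        print('skipping {}: no annotation row found'.format(pid))
        continue
    class_target = rows['class_target'].values[0]  # safe now, rows is non-empty
```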
By the way, can I ask for a reference: how many epochs, what batch_size, and what num_train_batches would be suitable for training on a dataset of around 300-500 thorax CT scans (each file is approximately the same size as in the LIDC dataset; training will run on a Tesla P100 16GB)?
I would recommend maximizing the batch size so that it fills up GPU memory. Pick an intuitive number of epochs and training batches, observe in the monitoring plot whether your model is overfitting, and then adjust the training length accordingly.
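For reference, these are roughly the attributes in configs.py that control this (they go inside the experiment's Configs class, like the snippet quoted further down). The attribute names follow the example lidc_exp configs.py but should be double-checked against your own file, and the numbers are only an illustrative starting point, not tuned values:

```python
# training length / batch settings in configs.py (illustrative values only).
self.num_epochs = 100          # total epochs; shorten or extend based on the monitoring plots.
self.num_train_batches = 200   # batches drawn per epoch.
self.batch_size = 8            # for 2D patches; 3D patches will fit far fewer per 16 GB (e.g. 2).
self.num_val_batches = 50      # validation batches per epoch when validating on sampled batches.
```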
Thank you, Sir, for the tips. By the way, I have already done a quick training run (only fold_0) with the parameters from https://github.com/pfjaeger/medicaldetectiontoolkit/issues/40 and the following configs.py settings:
```python
self.report_score_level = ['patient', 'rois']  # choose list from 'patient', 'rois'
self.class_dict = {1: 'groundglass', 2: 'subsolid', 3: 'solid'}  # 0 is background.
self.patient_class_of_interest = 2  # patient metrics are only plotted for one class.
self.ap_match_ious = [0.1]  # list of ious to be evaluated for ap-scoring.
self.model_selection_criteria = ['solid_ap', 'subsolid_ap', 'groundglass_ap']  # criteria to average over for saving epochs.
self.min_det_thresh = 0.1  # minimum confidence value to select predictions for evaluation.
self.head_classes = 4
# seg_classes here refers to the first stage classifier (RPN)
self.num_seg_classes = 2  # foreground vs. background
```
and the results I get:

```
****************************
results for fold 0
****************************
fold df shape (12551, 7)
AUC 0.5000  AP 0.5061  fold_0 patient cl_1
AUC 0.0000  AP 0.0000  fold_0 rois cl_1
AUC 0.5000  AP 0.3882  fold_0 patient cl_2
AUC 0.0000  AP 0.0000  fold_0 rois cl_2
AUC 0.1154  AP 0.7575  fold_0 patient cl_3
AUC 0.0000  AP 0.0000  fold_0 rois cl_3
AUC 0.0000  AP 0.0000  average_foreground_roi
```
If I'm not mistaken, an AUC of 0.5 means the model can't clearly differentiate the classes, and rois cl_1, cl_2, cl_3 all being zero means that every predicted box missed? The visualization shows that the anchors are already OK, and so is the classification, but the bounding boxes are off.
Are the modified configs.py parameters above applicable for 3-class texture classification (solid, subsolid, groundglass)?
Thank you Sir
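Regarding whether the config above fits a 3-class texture problem: as a hedged consistency check (the values are copied from the configs.py snippet above; the check itself is hypothetical and not part of the toolkit), the foreground classes in class_dict, the detection-head size, and the model selection criteria should line up like this:

```python
# hypothetical sanity check for the class setup quoted above.
class_dict = {1: 'groundglass', 2: 'subsolid', 3: 'solid'}   # 0 is background
head_classes = 4                                             # foreground classes + 1 background class
model_selection_criteria = ['solid_ap', 'subsolid_ap', 'groundglass_ap']

assert head_classes == len(class_dict) + 1, 'head_classes must also count the background class'
assert set(model_selection_criteria) == {name + '_ap' for name in class_dict.values()}

# the segmentation / RPN head stays binary regardless of the number of texture classes:
num_seg_classes = 2  # foreground vs. background
print('class setup is internally consistent for a 3-class texture problem')
```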
Hi, can you please post this in the Slack channel? I guess this is getting a little off-topic for the initial issue. Thanks!
Okay Sir.
Hello Sir Paul, thank you for your answer last time. Regarding the RoI suppressor on LIDC, I have a question: if my dataset contains only one final annotation per patient (one final annotation by the head radiologist instead of annotations by multiple radiologists), will the RoIs be suppressed or not?
I hope the following information helps you identify the problem:

1. LIDC-IDRI-processing: almost 80% of the data is filtered out because of incompatible DICOM files when loaded with MitkCLDicomtoNRRD. I already asked about this at https://github.com/MIC-DKFZ/LIDC-IDRI-processing/issues/5, but there is no response yet; maybe you know where the problem is?
2. lidc_exp/preprocessing.py: almost all (90%) of the RoIs are suppressed, even though each RoI has already been confirmed to be a single RoI (not multiple RoIs).

Could you give me a hint on how to un-suppress RoIs at preprocessing.py lines 75-102? Thank you Sir @pfjaeger.
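Since there is only one final annotation per patient here, a hedged sketch of what the rater-aggregation step could be reduced to. This is not the actual code at lines 75-102 of preprocessing.py, just one way to keep every connected component of a single-rater mask as its own RoI without applying any rater-agreement threshold:

```python
import numpy as np
from scipy import ndimage

def rois_from_single_rater(seg, class_label):
    """Turn a single binary annotation mask into per-RoI labels and class targets.

    seg:          3D uint8 array, 1 where the (single) rater annotated a lesion.
    class_label:  class id for this patient's lesions (e.g. 1/2/3 for texture).
    Returns a relabelled mask (values 1..n, one per RoI) and a list of per-RoI
    class targets, with nothing suppressed.
    """
    roi_mask, n_rois = ndimage.label(seg > 0)  # each connected lesion gets its own id
    class_targets = [class_label] * n_rois     # one target per RoI
    return roi_mask.astype(np.uint8), class_targets

# usage sketch (file name and class id are hypothetical):
# seg = np.load('some_pid_rois.npy')
# roi_mask, class_targets = rois_from_single_rater(seg, class_label=2)
```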