MIC-DKFZ / medicaldetectiontoolkit

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.
Apache License 2.0
1.31k stars, 297 forks

Dataloader in lidc_exp #12

Closed: DevinCheung closed this issue 5 years ago

DevinCheung commented 5 years ago

What is the expected input format in lidc_exp? I followed preprocessing.py and set up my dataset folder as:

path/to/dataset/pp_norm/xxxxxx_img.npy
path/to/dataset/pp_norm/xxxxxx_rois.npy
...
path/to/dataset/pp_norm/info_df.pickle

However, when I run the training, the program gets stuck at exec.py: batch = next(batch_gen['train']). There is probably a bug in how I prepared my input. I hope to get some help, thanks!
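One quick way to rule out a malformed input folder is to check that every `_img.npy` file has a matching `_rois.npy` file and that `info_df.pickle` is present. This is a minimal sketch that assumes only the naming pattern listed above (`<pid>_img.npy`, `<pid>_rois.npy`, `info_df.pickle` in `pp_norm`); `check_pp_dir` is a hypothetical helper, not part of the toolkit, and passing it does not guarantee the file contents are what the dataloader expects.

```python
import os
import tempfile


def check_pp_dir(pp_dir):
    """Hypothetical sanity check for a preprocessed LIDC folder.

    Assumes the layout shown in this thread: <pid>_img.npy / <pid>_rois.npy
    pairs plus a single info_df.pickle. Returns a list of problems found.
    """
    files = set(os.listdir(pp_dir))
    problems = []
    if "info_df.pickle" not in files:
        problems.append("missing info_df.pickle")
    # Strip the suffix to recover the patient id of each file.
    img_ids = {f[: -len("_img.npy")] for f in files if f.endswith("_img.npy")}
    roi_ids = {f[: -len("_rois.npy")] for f in files if f.endswith("_rois.npy")}
    for pid in sorted(img_ids - roi_ids):
        problems.append(f"{pid}: image without rois")
    for pid in sorted(roi_ids - img_ids):
        problems.append(f"{pid}: rois without image")
    if not img_ids:
        problems.append("no *_img.npy files found")
    return problems


if __name__ == "__main__":
    # Toy folder: one complete pair, one image missing its rois file.
    with tempfile.TemporaryDirectory() as d:
        for name in ("0001_img.npy", "0001_rois.npy", "0002_img.npy"):
            open(os.path.join(d, name), "wb").close()
        print(check_pp_dir(d))
```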

octaviomtz commented 5 years ago

Hi @DevinCheung, I have the same problem you mentioned. Did you find out what the issue was or how to solve it?

DevinCheung commented 5 years ago

@octaviomtz I have managed to run the code. Can you describe your problem more specifically?

wgs123 commented 5 years ago

Can you run preprocessing.py? Which files do I need?

pfjaeger commented 5 years ago

see here: https://github.com/pfjaeger/medicaldetectiontoolkit/issues/2

wgs123 commented 5 years ago

I copied the code and made characteristics.csv. In preprocessing.py:

    df = pd.read_csv(os.path.join('C\', 'characteristics.csv'), sep=';')
    df = df[df.PatientID == pid]

but there is no PatientID column in characteristics.csv.

pfjaeger commented 5 years ago

What does your characteristics.csv look like?

wgs123 commented 5 years ago

The first row looks like this: 0078a;0;anonymous;00000000;-1;-1;-1;-1;-1;-1;-1;-1;-1

pfjaeger commented 5 years ago

0078a is the PatientID. Is there no header in your file?

wgs123 commented 5 years ago

Yes. I ran the code from #2, but there is no header in my file.

wgs123 commented 5 years ago

I would also like to know what the header should be.

pfjaeger commented 5 years ago

OK, thanks for reporting. I will check with my colleague who wrote the data conversion. As a quick fix, this is the header of characteristics.csv:

PatientID;SessionID;Radiologist;NoduleID;Subtlety;InternalStructure;Calcification;Sphericity;Margin;Lobulation;Spiculation;Texture;Malignancy
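Without a header, pandas consumes the first data row (`0078a;0;anonymous;...`) as column names, so `df.PatientID` does not exist. Instead of editing the file, the header can also be supplied at read time. A minimal pandas sketch, assuming the semicolon-separated layout and column order quoted above; `read_characteristics` is a hypothetical helper, not a function from preprocessing.py.

```python
import io

import pandas as pd

# Column names as quoted above for characteristics.csv.
COLUMNS = [
    "PatientID", "SessionID", "Radiologist", "NoduleID", "Subtlety",
    "InternalStructure", "Calcification", "Sphericity", "Margin",
    "Lobulation", "Spiculation", "Texture", "Malignancy",
]


def read_characteristics(path_or_buf):
    """Read a semicolon-separated characteristics.csv that has no header row.

    header=None stops pandas from treating the first data row as the header;
    names= supplies the expected column names instead.
    """
    # Keep PatientID as a string so IDs such as "0078" are not turned into 78.
    return pd.read_csv(path_or_buf, sep=";", header=None, names=COLUMNS,
                       dtype={"PatientID": str})


if __name__ == "__main__":
    row = "0078a;0;anonymous;00000000;-1;-1;-1;-1;-1;-1;-1;-1;-1\n"
    df = read_characteristics(io.StringIO(row))
    print(df[df.PatientID == "0078a"])
```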

pfjaeger commented 5 years ago

The data conversion tools have been updated and now write the header by default. Apologies for the confusion.

ivanwilliammd commented 5 years ago

Hi Sir Paul @pfjaeger, the updated data conversion tools from https://github.com/MIC-DKFZ/LIDC-IDRI-processing/tree/v1.0.1 actually generate a different header, repeated multiple times, when the LIDC dataset is not fully downloaded (in my case, I only downloaded 10 LIDC cases). The generated header is:

Patient_ID;Session_ID;Radiologist;Nodule_Str;subtlety;internalStructure;calcification;sphericity;margin;lobulation;spiculation;texture;malignancy
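If the converter output has to be used as-is, one option is to normalize it while reading: drop the repeated header lines and rename the columns to the PatientID;SessionID;... header that preprocessing.py looks up. A minimal pandas sketch, assuming "repeated multiple times" means the header line reappears inside the file, that the columns come in the same order as the expected header, and that `Nodule_Str` corresponds to `NoduleID`; `normalize_characteristics` is a hypothetical helper.

```python
import io

import pandas as pd

# Header variant reported above for a partially downloaded dataset.
ALT = ["Patient_ID", "Session_ID", "Radiologist", "Nodule_Str", "subtlety",
       "internalStructure", "calcification", "sphericity", "margin",
       "lobulation", "spiculation", "texture", "malignancy"]
# Header expected by preprocessing.py, as quoted earlier in this thread.
EXPECTED = ["PatientID", "SessionID", "Radiologist", "NoduleID", "Subtlety",
            "InternalStructure", "Calcification", "Sphericity", "Margin",
            "Lobulation", "Spiculation", "Texture", "Malignancy"]
RATINGS = EXPECTED[4:]  # Subtlety ... Malignancy


def normalize_characteristics(path_or_buf):
    # Read everything as strings so repeated header lines do not break parsing.
    df = pd.read_csv(path_or_buf, sep=";", dtype=str)
    # Drop rows that are copies of the header (first column equals its own name).
    first = df.columns[0]
    df = df[df[first] != first].reset_index(drop=True)
    # Map the variant names onto the expected ones; other names stay unchanged.
    df = df.rename(columns=dict(zip(ALT, EXPECTED)))
    # Convert the radiologist ratings back to numbers.
    df[RATINGS] = df[RATINGS].apply(pd.to_numeric)
    return df


if __name__ == "__main__":
    hdr = ";".join(ALT)
    row = "0078a;0;anonymous;00000000;-1;-1;-1;-1;-1;-1;-1;-1;-1"
    text = "\n".join([hdr, row, hdr, row]) + "\n"
    print(normalize_characteristics(io.StringIO(text)))
```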

renxx08 commented 5 years ago

@DevinCheung I ran into the same problem. When I executed 'python exec.py --mode train --exp_source experiments/my_experiment --exp_dir path/to/experiment/directory', I also got stuck at exec.py: batch = next(batch_gen['train']). The terminal shows 'starting training epoch 1'. Could you please tell me how to solve this problem?

pfjaeger commented 5 years ago

I assume you are stuck in dataloader_utils/get_class_balanced_patients. Are you sure there is at least one training instance of each class in your training data? I will add a warning in the next commit.
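A cheap way to answer that question before starting training is to count instances per class. A minimal sketch, assuming you can build a dict mapping each patient id to the class labels of its objects (e.g. nodules) and that foreground classes are labelled 1..n; both that input format and `missing_classes` are hypothetical, not the toolkit's own data structures.

```python
from collections import Counter


def missing_classes(patient_targets, n_classes):
    """Return foreground class labels that have no instance in the data.

    patient_targets: hypothetical dict, patient id -> list of class labels.
    n_classes: number of foreground classes, assumed labelled 1..n_classes.
    """
    counts = Counter(c for labels in patient_targets.values() for c in labels)
    return [c for c in range(1, n_classes + 1) if counts[c] == 0]


if __name__ == "__main__":
    # Toy training split in which class 2 never occurs.
    targets = {"0001": [1], "0002": [1, 1], "0003": []}
    missing = missing_classes(targets, n_classes=2)
    if missing:
        raise SystemExit(f"No training instances for classes {missing}")
```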

pfjaeger commented 5 years ago

Or do you have fewer than 2 foreground classes? If so, do not use this function.
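Class balancing only makes sense with at least two foreground classes, and a sampler that keeps searching for a class with no instances could plausibly hang, which would be consistent with the stall at batch = next(batch_gen['train']) reported above. The sketch below is a hypothetical simplification, not the toolkit's get_class_balanced_patients: it falls back to uniform random sampling for fewer than 2 foreground classes and raises an error instead of looping when a class is absent. The input dict format and class labels 1..n are assumptions.

```python
import random


def sample_batch_patients(patient_targets, batch_size, n_fg_classes, rng=random):
    """Pick patient ids for one batch (hypothetical, simplified sampler).

    patient_targets: dict, patient id -> list of class labels (assumed format).
    With fewer than 2 foreground classes there is nothing to balance, so
    patients are drawn uniformly without replacement. Otherwise the classes
    are cycled through and, for each slot, a patient containing that class
    is drawn.
    """
    pids = list(patient_targets)
    if n_fg_classes < 2:
        return rng.sample(pids, batch_size)  # needs batch_size <= len(pids)
    by_class = {c: [p for p in pids if c in patient_targets[p]]
                for c in range(1, n_fg_classes + 1)}
    empty = [c for c, ps in by_class.items() if not ps]
    if empty:
        # Fail loudly instead of searching forever for an absent class.
        raise ValueError(f"no training patients contain classes {empty}")
    return [rng.choice(by_class[1 + i % n_fg_classes]) for i in range(batch_size)]


if __name__ == "__main__":
    targets = {"a": [1], "b": [2], "c": [1, 2]}
    print(sample_batch_patients(targets, 4, n_fg_classes=2, rng=random.Random(0)))
```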