MIC-DKFZ / nnUNet


How to evaluate the test results? #31

Closed li-pengcheng closed 5 years ago

li-pengcheng commented 5 years ago

Hi @FabianIsensee, I've learned a lot from your work and been inspired by it! I trained on my own data using your code and got some promising results. But there are still some questions bothering me:

1) After the training and test phases, how can I get the quantitative metric results? I tried to use evaluator.py and collect_results_files.py, but they do not work.

2) I don't know whether data with different channels can be fed into the network for training simultaneously; namely, if I have data shaped like (328, 400, 400) and (290, 400, 400), can I train them in the same network?

3) By the way, it would be wonderful if there were a data augmentation implementation for NIfTI or other 3D medical data formats.

Forgive my poor English, have a good day!

FabianIsensee commented 5 years ago

Hi there, I am happy to hear that nnU-Net works well for your data!

> After the training and test phases, how can I get the quantitative metric results? I tried to use evaluator.py and collect_results_files.py, but they do not work.

What is it you are trying to do exactly? Do you want to know what the results on your cross-validation are? Do you want to know what models to use for test set prediction?

> I don't know whether data with different channels can be fed into the network for training simultaneously; namely, if I have data shaped like (328, 400, 400) and (290, 400, 400), can I train them in the same network?

This sounds like you have 3D data. In this case 328 and 290 are NOT channels (as in color channels) but spatial dimensions. You can process heterogeneous data sizes. That is no problem at all ;-)

> By the way, it would be wonderful if there were a data augmentation implementation for NIfTI or other 3D medical data formats.

Data loading and augmentation with NIfTI is a VERY bad idea. I use numpy arrays specifically because I can read them in a very particular way that is (as far as I know) not possible with NIfTIs.

Imagine your data is 512x512x512 voxels large (like an abdominal CT) and your patch size is 128x128x128. That means one patch covers 1/64 of a case. If you use NIfTI, you would need to read the entire case (and if it is compressed, also decompress it), then throw away 63/64 of that work because you only need 1/64. That is a lot of wasted I/O and CPU time. Numpy arrays can be read as memmaps, meaning that if I need 128x128x128 from a specific location, that is all I need to read, not the entire file. This problem becomes even more apparent when running 2D networks: imagine having to read multiple >2 GB files, one for each slice in the batch. Infeasible.

The data augmentation as it is is highly optimized. nnU-Net takes NIfTIs as input at the very start and produces NIfTIs at the very end. I don't see a good reason why using NIfTIs in the middle would be required :-)

Best, Fabian
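To make the memmap argument above concrete, here is a minimal sketch (not nnU-Net's actual data loader; the file name and patch origin are invented) of pulling a single 128x128x128 patch out of a large .npy file without reading the whole volume:

```python
import numpy as np

# Hypothetical preprocessed case stored as a plain .npy file (not nnU-Net's
# actual file layout); assume it holds a 512x512x512 float32 volume.
case = np.load("case_0001.npy", mmap_mode="r")  # opens a memmap, reads almost nothing

# Slicing a 128^3 patch only touches the bytes that belong to that patch
# (up to page granularity), not the whole 512^3 array.
x, y, z = 100, 200, 300  # arbitrary example patch origin
patch = np.asarray(case[x:x + 128, y:y + 128, z:z + 128])  # materializes the patch

# A compressed NIfTI offers no such random access: the whole volume would
# have to be decompressed and loaded before any slicing.
print(patch.shape)  # (128, 128, 128)
```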

li-pengcheng commented 5 years ago

Thanks for your quick reply!

> What is it you are trying to do exactly? Do you want to know what the results on your cross-validation are? Do you want to know what models to use for test set prediction?

I trained the 3d_fullres model with only 4-fold cross-validation because I only had 4 pairs of data, all of the same modality.

I first copied the training samples from /data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/imagesTr to /data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/imagesTs. Then I used the trained model in /data/nnUNet/nnUNet_trained_models/nnUNet/3d_fullres/Task01_MY_DATASET/nnUNetTrainer__nnUNetPlans/fold_4 to write segmentation results to /data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/results. Now I want to compute the metrics between the segmentation results and the labels, such as Dice, IoU, Hausdorff distance, etc., which you implemented in metrics.py. What should I do?

FabianIsensee commented 5 years ago

Hi, if I understand correctly, you only have 4 training examples, and you train with 4 folds where each fold uses 3 cases for training and 1 for validation? In that case you cannot use that data as test data, because you have already trained on it. All you can do is report a cross-validation score. For this, go into the /data/nnUNet/nnUNet_trained_models/nnUNet/3d_fullres/Task01_MY_DATASET/nnUNetTrainer__nnUNetPlans/fold_4/validation_raw folder and look at the summary.json file. There is one for each fold.

If you still want to compute the metrics between two folders, use evaluate_folder, located in evaluator.py. It takes two folders as input, and both folders need to contain files with the same names.

Best, Fabian
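As a quick way to collect the cross-validation numbers, something like the following sketch can loop over the folds and print each fold's summary.json. The "results" → "mean" key layout used here is an assumption about the file format and may differ between nnU-Net versions:

```python
import json
from pathlib import Path

# Hypothetical base path, matching the folder layout discussed above.
base = Path("/data/nnUNet/nnUNet_trained_models/nnUNet/3d_fullres"
            "/Task01_MY_DATASET/nnUNetTrainer__nnUNetPlans")

for fold_dir in sorted(base.glob("fold_*")):
    summary_file = fold_dir / "validation_raw" / "summary.json"
    if not summary_file.is_file():
        continue
    with open(summary_file) as f:
        summary = json.load(f)
    # Assumed layout: summary["results"]["mean"] maps each label to its
    # averaged metrics (Dice, Hausdorff distance, ...); adjust the keys if
    # your summary.json is structured differently.
    print(fold_dir.name, json.dumps(summary["results"]["mean"], indent=2))
```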

li-pengcheng commented 5 years ago

Yes, you are right, I shouldn't use that data for testing. I have another unseen case with a label, and now I want to evaluate all the metrics on it. I followed your advice and modified evaluator.py as follows:

```python
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser()
    parser.add_argument('-test', default='/data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/imagesTs')
    parser.add_argument('-ref', default='/data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/labelsTs')
    parser.add_argument('-evaluator', default='Evaluator.evaluator')
    parser.add_argument('-metric_kwargs', default='default_metrics')
    args = parser.parse_args()
    run_evaluation(args=args)
```

and things get worse:

```
Traceback (most recent call last):
  File "/data/nnUNet/nnunet/evaluation/evaluator.py", line 455, in <module>
    run_evaluation(args=args)
  File "/data/nnUNet/nnunet/evaluation/evaluator.py", line 309, in run_evaluation
    test, ref, evaluator, metric_kwargs = args
TypeError: 'Namespace' object is not iterable
```

So could you please tell me how to modify this code in detail? Thank you very much!

FabianIsensee commented 5 years ago

That's because you should use evaluate_folder, not run_evaluation.

Best, Fabian

li-pengcheng commented 5 years ago

OMG! I just found out that my current code doesn't include evaluate_folder, so I re-downloaded your latest code. But how should I define labels? :) I'm so stupid, thank you for your patience!

FabianIsensee commented 5 years ago

No worries. labels should be a list of the classes (int) to be evaluated. If you have background and three classes, it should be [1, 2, 3].
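Tying the thread together, a corrected `__main__` block might look like the following sketch. The folder paths are the ones from earlier in the thread, and the evaluate_folder signature shown (reference folder, prediction folder, labels) should be double-checked against the current evaluator.py:

```python
if __name__ == "__main__":
    from nnunet.evaluation.evaluator import evaluate_folder

    # Ground truth and predictions; both folders must contain files with
    # identical names (paths taken from earlier in this thread).
    ref_folder = '/data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/labelsTs'
    pred_folder = '/data/nnUNet/nnUNet_raw_splitted/Task01_MY_DATASET/results'

    # Classes to evaluate; background (0) is omitted, so with three
    # foreground classes this is [1, 2, 3].
    evaluate_folder(ref_folder, pred_folder, labels=[1, 2, 3])
```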


li-pengcheng commented 5 years ago

DONE! You are my lifesaver!

FabianIsensee commented 5 years ago

You're welcome =)