Poor results on my testing data

damiankucharski commented 1 year ago

Hello @YixingHuang, I think I finally managed to run your model and obtain results on my data. However, the results are quite poor, and I wonder whether that is due to data drift or whether I am missing something.

I had a problem running your preprocessing pipeline with my Slicer installation, so I read through your code and tried to replicate it programmatically.

I have prior and target brain MRI images, both contrast-enhanced T1.

I used 3D Slicer 4.10.2 to perform N4ITK MRI bias correction, as this built-in module works for me. However, I think that under the hood it uses this function from SimpleITK, https://simpleitk.readthedocs.io/en/master/link_N4BiasFieldCorrection_docs.html, so I could probably just use that directly.
My prior and target images are already registered. I used this software to do that: https://greedy.readthedocs.io/en/latest/
I resampled the images to 240 × 240 × 155 voxels with an isotropic voxel size of 1 mm, as you mentioned in https://github.com/YixingHuang/DeepMedicPlus/issues/4
I extracted the brain from the scans using HD-BET, so the N4 bias-corrected, resampled images contain only the brain, without the skull.
I z-score normalized the images, as that is what the https://simpleitk.org/doxygen/latest/html/classitk_1_1simple_1_1NormalizeImageFilter.html filter seems to do.
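
For reference, the SimpleITK part of the steps above could be sketched roughly as follows. This is only an illustration under assumptions, not the repository's actual pipeline: the function name `preprocess_volume`, the Otsu foreground mask used for N4, and the choice to keep the input origin and direction when resampling (which pads or crops rather than recentring the volume) are all hypothetical. Registration (Greedy) and skull stripping (HD-BET) are external tools and are left out.

```python
TARGET_SIZE = (240, 240, 155)      # voxels, as mentioned in issue #4
TARGET_SPACING = (1.0, 1.0, 1.0)   # mm, isotropic


def preprocess_volume(in_path, out_path):
    """N4 bias correction -> resampling -> z-score normalization (sketch)."""
    # Imported here so the module still loads where SimpleITK is not installed.
    import SimpleITK as sitk

    image = sitk.ReadImage(in_path, sitk.sitkFloat32)

    # Assumption: a rough Otsu foreground mask restricts the N4 fit.
    mask = sitk.OtsuThreshold(image, 0, 1, 200)
    corrected = sitk.N4BiasFieldCorrection(image, mask)

    # Resample onto a 240 x 240 x 155 grid with 1 mm spacing.
    # Assumption: the input origin and direction are kept as they are.
    resampled = sitk.Resample(
        corrected,
        TARGET_SIZE,
        sitk.Transform(),          # identity transform
        sitk.sitkLinear,
        corrected.GetOrigin(),
        TARGET_SPACING,
        corrected.GetDirection(),
        0.0,                       # value for voxels outside the input
        corrected.GetPixelID(),
    )

    # Zero mean, unit variance over the whole volume, background included.
    normalized = sitk.Normalize(resampled)
    sitk.WriteImage(normalized, out_path)
```

Because `sitk.Normalize` uses the statistics of the whole volume, the background of a skull-stripped image ends up at a negative constant, so the order of skull stripping and normalization changes the resulting intensity range.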

After performing these steps, I used your model to find lesions. It seems to find some lesions on my image:

But at the same time it also marks a lot of isolated pixels and areas that are certainly not lesions:

Did I miss something in the preprocessing pipeline? Or do you think the poor results may be caused by data drift?

Also, do you have any single example from your dataset that you could share so that I can test whether I can run your model on it and get good results?

Regards, Damian Kucharski

YixingHuang commented 1 year ago

Hi @damiankucharski, are you using the high-sensitivity model? If so, it is likely to produce several false positives. Try the high-precision model as well.

Our preprocessed MRI volumes have an intensity value of around -0.4125 for the empty background. Most of the brain voxels are between 1.3 and 3.0, while other high-contrast structures can be 4.0 to 7.0. Can you tell me yours, so I can quickly check how close your data is to ours?
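
A quick way to compare a preprocessed volume against these reference values is sketched below. It is a hypothetical helper, not part of DeepMedicPlus: it assumes the background is the single most frequent intensity in the volume and takes a flat sequence of voxel values, however they were loaded.

```python
from collections import Counter

REFERENCE_BACKGROUND = -0.4125       # background value quoted above
REFERENCE_BRAIN_RANGE = (1.3, 3.0)   # range quoted for most brain voxels


def intensity_summary(voxels, ndigits=4):
    """Summarize background and foreground intensities of a flat voxel list."""
    values = [round(float(v), ndigits) for v in voxels]
    # Assumption: the most common value is the empty background.
    background, _ = Counter(values).most_common(1)[0]
    foreground = [v for v in values if v != background]
    low, high = REFERENCE_BRAIN_RANGE
    in_range = sum(low <= v <= high for v in foreground)
    return {
        "background": background,
        "background_offset": background - REFERENCE_BACKGROUND,
        "foreground_min": min(foreground) if foreground else None,
        "foreground_max": max(foreground) if foreground else None,
        "fraction_in_reference_range": (
            in_range / len(foreground) if foreground else 0.0
        ),
    }


# Toy example: six background voxels and four brain voxels.
print(intensity_summary([-0.4125] * 6 + [1.5, 2.0, 2.9, 5.0]))
```

A background far from -0.4125, or a small fraction of foreground voxels inside the reference range, would suggest that the normalization differs from the one used for training.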

Due to data privacy regulations, we are not allowed to share the data. To improve the performance, some fine-tuning on your own training dataset is necessary.

damiankucharski commented 1 year ago

Hello @YixingHuang, please see the attached screenshot, where I marked some points in the image and their associated values.

YixingHuang commented 1 year ago

The intensity range and the appearance are similar to ours. On our test data, there are on average around 2 false positive (FP) detections per patient. Can you check the current performance on your dataset? Please tell me the sensitivity, precision, and number of FPs per patient, so I can tell whether the performance on your data is far worse than ours.
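
The three requested numbers can be computed from per-patient lesion counts, for example as in the sketch below. How a detection is matched to a ground-truth lesion (and therefore counted as a true positive) is not specified in this thread, so the `(tp, fp, fn)` tuples are assumed to come from whatever matching rule is used; the function name is hypothetical.

```python
def detection_metrics(per_patient_counts):
    """Aggregate lesion-wise metrics from (tp, fp, fn) tuples, one per patient."""
    counts = list(per_patient_counts)
    if not counts:
        raise ValueError("need counts for at least one patient")
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    return {
        # Fraction of true lesions that were detected.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Fraction of detections that are true lesions.
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        # Average number of false detections per patient.
        "fp_per_patient": fp / len(counts),
    }


# Two patients: (tp, fp, fn)
print(detection_metrics([(2, 1, 0), (1, 3, 1)]))
```

Here sensitivity and precision are pooled over all lesions in the cohort; averaging per-patient values instead would give different numbers when lesion counts vary between patients.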

damiankucharski commented 1 year ago

Okay, I will try to run all my data through your models. It may take a while, but I will get back to you with the results. For this single case I tested the high-precision model, and it does indeed work better. Apart from that, I am struggling to find a way to specify the model's output directory. Is it possible to configure it, or must the results be saved to DeepMedicPlus/DeepMedicPlus/examples/output/predictions/testSessionDm/predictions?

YixingHuang commented 1 year ago

In the testConfig.cfg file, you can change the sessionName from "testSessionDm" to another name. The output folder will then change accordingly.
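
When many runs are needed, the session name could also be changed from a script. The sketch below is a hypothetical helper and assumes the config contains a line of the form `sessionName = "testSessionDm"` with a double-quoted value; check it against the actual file before relying on it.

```python
import re

# Assumed line format: sessionName = "some name"
_SESSION_LINE = re.compile(r'^([ \t]*sessionName[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)


def set_session_name(config_text, new_name):
    """Return config_text with the sessionName value replaced by new_name."""
    updated, count = _SESSION_LINE.subn(
        lambda m: f'{m.group(1)}"{new_name}"', config_text
    )
    if count == 0:
        raise ValueError("no sessionName assignment found in the config text")
    return updated


print(set_session_name('sessionName = "testSessionDm"\n', "myCohort"))
```

The function works on the file's text rather than its path, so the result can be written to a copy of testConfig.cfg and the original left untouched.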

YixingHuang commented 1 year ago

And glad to hear that the high precision model works better.

YixingHuang commented 1 year ago

By the way, can you please tell me more about your dataset (such as the scanner vendor and the exact T1 MRI sequence), so we can get a better intuition for data drift across different scanners? Ours is from Siemens scanners with the contrast-enhanced T1 MPRAGE sequence.

damiankucharski commented 1 year ago

Thank you for the information about your acquisition procedure. Sadly, I cannot provide you with detailed information due to an NDA, but the data comes from multiple institutions and many different scanners. It is therefore quite probable that your model was trained on data from a narrower distribution and may not generalize perfectly to ours. However, when the results are published I will definitely share them with you.

YixingHuang commented 1 year ago

Thank you for your information.

damiankucharski commented 4 months ago

@YixingHuang if you are still interested in reading it, we have published the paper. Here is the link: https://www.sciencedirect.com/science/article/pii/S0895611124000788?via%3Dihub :)

YixingHuang commented 4 months ago

Thanks for sharing the information. I already got a notification about your article from ResearchGate. Nice work!
