Closed — Tranquillar closed this issue 2 years ago
Hi Marcel,
This is weird. I tested this script many times and it works well with the preprocessed OASIS dataset in https://github.com/adalca/medical-datasets/blob/master/neurite-oasis.md.
I notice that your Dice score at iteration 0 is different from mine. Where did you download your OASIS dataset? Could you print out the training and validation files you used in Train_sym_neurite_oasis.py? (Lines 87 and 148-152)
Regards, Tony
Hi Tony,
Thank you for the quick reply. I downloaded the dataset from the link https://github.com/adalca/medical-datasets/blob/master/neurite-oasis.md. There seems to be only one version with 3D images ("neurite-oasis.v1.0").
Output of training files (Line 87):

names = sorted(glob.glob(datapath + '/OASIS_OAS1_*_MR1/aligned_norm.nii.gz'))[0:255]
...
Output of validation files:
(Line 149) fixed_img = sorted(glob.glob(datapath + '/OASIS_OAS1_*_MR1/aligned_norm.nii.gz'))[255]
(Line 150) fixed_label = sorted(glob.glob(datapath + '/OASIS_OAS1_*_MR1/aligned_seg35.nii.gz'))[255]
(Line 151) imgs = sorted(glob.glob(datapath + '/OASIS_OAS1_*_MR1/aligned_norm.nii.gz'))[256:261]
(Line 152) labels = sorted(glob.glob(datapath + '/OASIS_OAS1_*_MR1/aligned_seg35.nii.gz'))[256:261]
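For reference, the split these lines implement can be sketched as follows. The directory pattern `OASIS_OAS1_*_MR1` and the empty stand-in files are assumptions made only to keep the snippet self-contained and runnable without the real dataset:

```python
import glob
import os
import tempfile

# Create 261 fake subject directories mimicking the assumed neurite-OASIS
# layout OASIS_OAS1_xxxx_MR1/aligned_norm.nii.gz (empty files as stand-ins).
datapath = tempfile.mkdtemp()
for i in range(261):
    subject = os.path.join(datapath, "OASIS_OAS1_%04d_MR1" % (i + 1))
    os.makedirs(subject)
    open(os.path.join(subject, "aligned_norm.nii.gz"), "w").close()

# Same glob-and-slice logic as in Train_sym_neurite_oasis.py:
names = sorted(glob.glob(datapath + "/OASIS_OAS1_*_MR1/aligned_norm.nii.gz"))
train = names[0:255]          # subjects 1-255: training images
fixed_img = names[255]        # subject 256: fixed validation image
imgs = names[256:261]         # subjects 257-261: moving validation images
print(len(train), len(imgs))  # 255 5
```

Because the subject index is zero-padded, lexicographic `sorted()` order matches numeric order, so the slices always pick the same subjects.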
You also mentioned the difference in Dice score at iteration 0. Isn't that somewhat expected, because the initialization of the network weights is random? (Correct me if I'm wrong here.) I get a different Dice score at iteration 0 every time I start the script; the values are usually between 0.4 and 0.6.
Thank you for your help. Best regards, Marcel
Quick additional information:
This warning comes up every time I run the scripts for training or inference. Does it have any significance for the model?
> You also mentioned the difference in Dice score at iteration 0. Isn't that somewhat expected, because the initialization of the network weights is random? (Correct me if I'm wrong here.)
Yes, you are correct. I just want to make sure we are using the same data source.
> This warning comes up every time I run the scripts for training or inference.
This is a minor user warning and will not make a big difference in the result.
I cannot reproduce your result. Could you send me the "Functions.py", "Models.py" and "Train_sym_neurite_oasis.py" you used to see whether I can reproduce the same error?
Here you go.
Hi @Tranquillar ,
I tried the code you provided, and it seems there is no problem at all.
Here is the log using your code:

Validation Dice log for SYMNet_neurite_oasis:
0: 0.5447214704388855
1000: 0.6589802244014441
2000: 0.6850653705911428
Here is the log using the source code on GitHub:

Validation Dice log for SYMNet_neurite_oasis:
0: 0.5297590205219744
1000: 0.6597511465193313
Yet, I spotted two discrepancies between your code and the original one.
Left: your modified code (Train_sym_neurite_oasis.py)
Right: the original code (Train_sym_neurite_oasis.py)
Since the original script uses tabs for indentation, the extra spaces in line 133 may cause an issue (an IndentationError or TabError) in the Python interpreter.
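A minimal illustration of that failure mode (generic Python, not the repo's code): Python 3 rejects indentation that mixes tabs and spaces inconsistently within one block.

```python
# Tab-indented line followed by a space-indented line at the same depth.
source = "def f():\n\tx = 1\n        y = 2\n"
try:
    compile(source, "<example>", "exec")
except TabError as err:
    print("TabError:", err)  # inconsistent use of tabs and spaces in indentation
```

This is why a file that looks fine in one editor can fail to run after a copy-paste that converts tabs to spaces.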
Try removing the extra spaces mentioned above, or re-download the training script and try again.
If the problem persists, it is likely a non-trivial environment problem, and you may try running the code on another machine, if possible.
Thank you very much for looking into it. I will try it on a different machine.
Do you mind telling me which CUDA and PyTorch versions you are using? I want to make sure there are as few differences as possible.
Thanks in advance 😊
Sure. I am using Ubuntu 16.04 LTS + PyTorch 1.9.0+cu111. The code was tested with an NVIDIA RTX 3080 GPU (driver version 460.84, CUDA version 11.2).
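For anyone comparing environments, the installed versions can be printed with a short snippet (a generic sketch; the `find_spec` guard is only there so it also runs on a machine without PyTorch installed):

```python
import importlib.util

# Record the exact PyTorch/CUDA build for a bug report.
if importlib.util.find_spec("torch") is None:
    print("torch not installed")
else:
    import torch
    print("PyTorch:", torch.__version__)           # e.g. 1.9.0+cu111
    print("built with CUDA:", torch.version.cuda)  # e.g. 11.1
    print("CUDA available:", torch.cuda.is_available())
```

Note that `torch.version.cuda` reports the CUDA toolkit PyTorch was built against, which can differ from the driver's CUDA version shown by `nvidia-smi`.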
Hi Tony, the situation is resolved. I ran the code in a Docker container with the exact PyTorch and CUDA versions you used:

Ubuntu 16.04 LTS + PyTorch 1.9.0+cu111.
Everything looks a lot better now. This is the validation score log after 55k iterations:

0: 0.4097486243230383
5000: 0.7047512729914296
10000: 0.7209373161013326
15000: 0.7476094422734847
20000: 0.7573324792065288
25000: 0.76341926686527
30000: 0.7683853719520911
35000: 0.7717567312637159
40000: 0.7773653384358569
45000: 0.7856832570252698
50000: 0.7811629084686469
55000: 0.783746416889304
Thanks again for your help. 👍 I will now start training the network with my own data.
Quick summary of the issue and solution in case anyone else experiences this:
My Setup
Symptoms:
- The validation Dice score barely changes across iterations instead of steadily improving.
- The overall loss eventually becomes negative infinity and then NaN, after which every validation fails with a division-by-zero error.

Solution:
- Run the code in the exact environment the authors tested: Ubuntu 16.04 LTS + PyTorch 1.9.0+cu111 (e.g. in a Docker container).
Good to hear that. Thanks for your summary. 👍
If you are looking for a state-of-the-art registration method, you may also check out our latest image registration framework for medical images at https://github.com/cwmok/Conditional_LapIRN.
Hi,
I really enjoyed reading your paper and I want to reproduce the results. I am currently trying to train the model with the OASIS data set using the example file you provided ("Train_sym_neurite_oasis.py").
However, I am hindered by two major problems:
Problem 1:
The validation scores are not even close to your results. Also, the value doesn't seem to change at all, which is really confusing to me.
My validation scores:
0: 0.554038160843976
5000: 0.585584165076553
10000: 0.585584165076553
15000: 0.585584165076553
20000: 0.585584165076553
25000: 0.585584165076553
30000: 0.585584165076553
35000: 0.585584165076553
Your validation scores:
0: 0.570166888906167
5000: 0.7349817233331957
10000: 0.7674420857250869
15000: 0.7831680992948633
20000: 0.7861187128159433
25000: 0.7941986866278279
30000: 0.792603563494231
35000: 0.7965674164395773
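As a generic debugging aid (not from the SYMNet code): a score that never moves often means the weights stopped updating, e.g. zero gradients or NaN parameters. A quick sanity check on a stand-in model, guarded so it also runs without PyTorch installed:

```python
import importlib.util

if importlib.util.find_spec("torch") is None:
    print("torch not installed")
else:
    import torch
    model = torch.nn.Linear(4, 2)  # stand-in for the registration network
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    # If this sum is 0.0, the optimizer has nothing to apply and the
    # validation score will stay frozen.
    grad_sum = sum(p.grad.abs().sum().item() for p in model.parameters())
    print("all-zero gradients:", grad_sum == 0.0)
    print("NaN parameters:", any(torch.isnan(p).any().item()
                                 for p in model.parameters()))
```

Running the same two checks on the real network right after a training step quickly narrows down whether the problem is in the data, the loss, or the environment.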
Problem 2:
At some point the overall loss becomes negative infinity and then NaN.
Afterwards, every subsequent validation causes a division-by-zero error.
This seems to happen every time. I am using the default parameters of the file.
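For context, one generic way such a division-by-zero can arise in a Dice-based validation (a sketch, not the repo's implementation): once the weights are NaN, the warped segmentation can come out empty, and a plain Dice formula then divides by zero.

```python
import numpy as np

def dice(a, b):
    # Plain Dice: 2*|A∩B| / (|A| + |B|); the denominator is zero
    # when both masks are empty.
    inter = int(np.logical_and(a, b).sum())
    total = int(a.sum()) + int(b.sum())
    return 2.0 * inter / total

pred = np.zeros((4, 4), dtype=bool)  # degenerate (all-zero) prediction
gt = np.zeros((4, 4), dtype=bool)
try:
    dice(pred, gt)
except ZeroDivisionError:
    print("division by zero for empty masks")
```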
Do you have any idea how to solve these problems? I would really appreciate your help with this.
Best regards Marcel