ZiyunLiang / MMCCD


I couldn't achieve the results mentioned in the paper #3

Closed 0102zyx closed 4 months ago

0102zyx commented 8 months ago

Hello, I trained the forward diffusion model with "python modality_cyclic_train.py --input flair --trans t1 --model_name diffusion" and the backward model with "python modality_cyclic_train.py --input t1 --trans flair --model_name unet". I then sampled with "python modality_cyclic_sample.py --experiment_name_forward diffusion_brats_flair_t1 --experiment_name_backward unet_brats_t1_flair --model_name diffusion --use_ddim True", but I couldn't achieve the results mentioned in the paper. Could you please help me identify where I might have gone wrong? Below are some experimental data:

Logging to /home/user/data3/zyx/anomaly_detection/MMCCD/model_save/diffusion_brats_flair_t1/score_train
creating loader...
creating model and diffusion...
sampling...
all the confidence maps from the testing set saved...
finding the best threshold...
computing the matrixs...
dice: 0.5488736795312843, auc: [0.93940544], jaccard: [0.40483961], assd: [7.20983738], sensitivity: [0.59028433], precision: [0.55902216]
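
In case it is useful context: as I understand it, the "finding the best threshold" step just sweeps a threshold over the saved confidence maps and scores the binarized maps against the ground-truth masks. A minimal sketch of my understanding (not the repository's actual evaluation code; `confidence_maps` and `gt_masks` are assumed to be lists of numpy arrays of matching shape):

```python
import numpy as np

def dice_score(pred, gt, eps=1e-8):
    # Dice coefficient between two binary masks.
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

def best_threshold_dice(confidence_maps, gt_masks, thresholds=np.linspace(0, 1, 101)):
    # Sweep thresholds over the anomaly confidence maps and keep the one
    # that maximizes the mean Dice over the test set.
    best_t, best_d = None, -1.0
    for t in thresholds:
        dices = [dice_score(cm > t, gt > 0) for cm, gt in zip(confidence_maps, gt_masks)]
        mean_d = float(np.mean(dices))
        if mean_d > best_d:
            best_t, best_d = t, mean_d
    return best_t, best_d
```

My assumption is that the dice value in the log above is the mean Dice at whatever threshold such a sweep selects.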

ZiyunLiang commented 8 months ago

Hi, thanks for your interest in our work. May I ask: 1) how you preprocessed the data, and 2) what result you got from 'Cyclic UNet' (where both the forward and backward models are UNets)? That will help me locate the problem. Thanks!

0102zyx commented 8 months ago

Thanks for your reply!

  1. I preprocessed the data by running "python ./datasets/brats_preprocess.py" and only changed the parameter "--data_dir" to my data path.
  2. I ran four experiments; the results are as follows:
     1.1 forward: flair_t1 (diffusion), backward: t1_flair (UNet)
         dice: 0.5488736795312843, auc: [0.93940544], jaccard: [0.40483961], assd: [7.20983738], sensitivity: [0.59028433], precision: [0.55902216]
     1.2 forward: flair_t2 (diffusion), backward: t2_flair (UNet)
         dice: 0.5714830977912472, auc: [0.94247438], jaccard: [0.42823727], assd: [6.77022021], sensitivity: [0.60666867], precision: [0.58230288]
     1.3 forward: flair_t1 (UNet), backward: t1_flair (UNet)
         dice: 0.5975252619532105, auc: [0.9412135], jaccard: [0.46528894], assd: [6.20557542], sensitivity: [0.58814729], precision: [0.68549571]
     1.4 forward: flair_t2 (UNet), backward: t2_flair (UNet)
         dice: 0.5747254194508612, auc: [0.93895176], jaccard: [0.43841998], assd: [6.69973454], sensitivity: [0.58506951], precision: [0.62794077]

ZiyunLiang commented 7 months ago

Hi, thanks for sharing the results. I just re-ran the code on my end and did not encounter this problem. If you are using the BraTS2021 dataset with the preprocessing outlined in the paper, the code should reproduce the results. I suspect the problem might be caused by the saved file names. Since you are running the code on two cyclic diffusion tasks (flair-t2-flair and flair-t1-flair), and the intermediate results for each run were saved under the same file name, the results of the two runs may have been mixed up. I have now updated the code so that the file name is different for each run; if you try running it again, this should solve the problem. If you have any further issues or additional questions, feel free to let me know. My email is ziyun.liang@eng.ox.ac.uk. Thanks!
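
For anyone who wants to patch this locally before pulling the update: the idea of the fix is simply to make the intermediate-results path unique per forward/backward pair. A rough sketch of that idea (illustrative only; the helper below is not the exact code in the repository):

```python
import os

def make_run_dir(save_root, experiment_name_forward, experiment_name_backward):
    # Build a results directory that is unique to this forward/backward pair,
    # so flair-t1-flair and flair-t2-flair runs no longer overwrite each other.
    run_dir = os.path.join(save_root, f"{experiment_name_forward}__{experiment_name_backward}")
    os.makedirs(run_dir, exist_ok=True)
    return run_dir

# e.g. make_run_dir("./model_save", "diffusion_brats_flair_t1", "unet_brats_t1_flair")
```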

0102zyx commented 7 months ago

Thank you for your kind answer! I will try running it again. Thanks!

LeeHaoRanRan commented 6 months ago

Hi, I also encountered similar problems.
  1. forward: flair_t2 (diffusion), backward: t2_flair (UNet)
     dice: 0.5705, auc: [0.9422], jaccard: [0.4274], assd: [6.9399], sensitivity: [0.6158], precision: [0.5753]
  2. forward: flair_t2 (UNet), backward: t2_flair (UNet)
     dice: 0.5989, auc: [0.9444], jaccard: [0.4647], assd: [6.1983], sensitivity: [0.5929], precision: [0.6669]

Adopting the diffusion model actually reduces the Dice value.

Kamnitsask commented 6 months ago

Hello and thank you very much for your interest in our work!

I had a look and thought I'd share a thought in case it helps explain your observation.

Training of deep learning models tends to show significant variance in results across RNG seeds. And since this method trains 2 models per experiment, some variance is expected.

It is very encouraging to see the results from the first core part of our method reproduced by different users (using translation as the basis for anomaly detection, via the cyclic UNet). As you'll see, even this simple method has variance: e.g. for Flair -> T2 -> Flair with UNet we reported 58% DSC as a representative number, 0102zyx reports 57.5%, and user LeeHao got 60% with one seed. Similar for Flair -> T1 -> Flair.

For the 2nd part of our work, the diffusion part, there is even more stochasticity (training with sampled masks, etc.), so variance is also expected. Before getting alarmed about whether there is a mistake in how you ran the code, or in the actual implementation, perhaps try rerunning the code in exactly the same way but with a different RNG seed (for both the forward and backward models) and check whether/how the results change? I would not be surprised if your observation is just the combination of a "lucky" seed for our 1st model (cyclic UNet) and an "unlucky" seed for our 2nd model (masked diffusion). Unfortunately, for deep learning methods, multiple experiments are often needed to find the representative trend.

Note that you will not find users reporting positive reproductions in the "issues" tab, i.e. experiments where seeds led to the expected/reported results; we may only observe the "unlucky" seeds here and conclude that there is an issue with the implementation when it may just be unlucky seeds. Let us know if you rerun the experiment and how things change. On our end we did not find an issue reproducing it, so it would be very valuable to collect some more information from users like yourself, if you manage to find the time to rerun an experiment. It would help us a lot!
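
If it helps, seeding a rerun only requires fixing the usual RNG sources before training each model. A minimal sketch of what I mean (illustrative; the training scripts may expose seeding differently, so treat `set_seed` below as a hypothetical helper):

```python
import random

import numpy as np
import torch

def set_seed(seed: int):
    # Seed the common RNG sources so a rerun is controlled only by this value.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

# e.g. call set_seed(0) before one run of forward + backward training,
# then set_seed(1) before a second run, and compare the resulting Dice scores.
```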

(Ziyun may be better placed to provide detailed guidance on double-checking that the commands/code were run correctly for your experiments; I sadly cannot comment on that. I just thought I'd share a thought in case it helps!)

Thanks a lot for the feedback, and sorry that I can only provide such high-level thoughts. I hope they are somewhat helpful! Konstantinos Kamnitsas

LeeHaoRanRan commented 6 months ago

OK, I'll try running it again. Thank you for your answer!