ge-xing / Diff-UNet

Diff-UNet: A Diffusion Embedded Network for Volumetric Segmentation. (using diffusion for 3D medical image segmentation)
Apache License 2.0

different implementation during testing? #13

Open jessie-chen99 opened 1 year ago

jessie-chen99 commented 1 year ago

Thanks for your great work, but I have run into a small problem:

Why is the implementation in this line different from the formula in the paper? https://github.com/ge-xing/Diff-UNet/blob/26990018c52b60a57a1ee8ebfb3e807897af1a1a/BraTS2020/test.py#L93

Should I change `sample_outputs[i]["all_samples"][index].cpu()` to `uncer_out` in https://github.com/ge-xing/Diff-UNet/blob/26990018c52b60a57a1ee8ebfb3e807897af1a1a/BraTS2020/test.py#L87?

920232796 commented 1 year ago

No.

As the figure shows, the sum of `sample_outputs[i]["all_samples"][index].cpu()` is equivalent to \bar{p_i}.

jessie-chen99 commented 1 year ago

> No.
>
> As the figure shows, the sum of `sample_outputs[i]["all_samples"][index].cpu()` is equivalent to \bar{p_i}.

😊 Thanks a lot for your reply! But I am still confused about this part.

1. As your code shows, the \bar{p_i} in the formula Y = ∑ w_i × \bar{p_i} is implemented as `sample_outputs[i]["all_samples"][index].cpu()`, not as "the sum of sample_outputs[i]["all_samples"][index].cpu()" you just said.

According to these nested for-loops, I reckon the formula actually computed is Y = ∑_{k=1}^{uncer_step} ∑_i w_i × p_{k,i}, i.e. the sum over all uncer_step sampling runs rather than their average \bar{p_i}, which equals uncer_step × ∑ w_i × \bar{p_i} and is therefore different from the paper.

So I am confused about which version is the correct one.
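The difference between the two readings can be sketched numerically. This is a toy reproduction of the nested-loop structure described in this thread, not the repo's actual code; the shapes and the names `uncer_step` and `w` are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
uncer_step = 4   # number of stochastic sampling runs (name assumed from the thread)
n_steps = 10     # number of fused step outputs per run
samples = rng.standard_normal((uncer_step, n_steps, 2, 2))  # p_{k,i}
w = rng.random((n_steps, 2, 2))                              # per-step weights w_i

# Code version: the nested loops sum over both runs and steps,
# Y_code = sum_k sum_i w_i * p_{k,i}
y_code = sum(w[i] * samples[k, i]
             for k in range(uncer_step)
             for i in range(n_steps))

# Paper version: Y = sum_i w_i * pbar_i, where pbar_i is the mean over
# the uncer_step runs -- by linearity this is y_code / uncer_step.
pbar = samples.mean(axis=0)
y_paper = sum(w[i] * pbar[i] for i in range(n_steps))

assert np.allclose(y_code / uncer_step, y_paper)
```

So the two versions differ only by the constant factor `uncer_step`, which is exactly what the proposed one-line change compensates for.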

2. In your implementation, `["all_model_outputs"][index]` is x_0 (also called x_start) without process_xstart(), i.e. the raw output of the diffusion U-Net, while `["all_samples"][index]` is x_0 after process_xstart(). They have different value ranges: `["all_samples"][index]` lies in [-1, 1], but `["all_model_outputs"][index]` ranges over e.g. [-17, 18]. Why do you need to use both `["all_model_outputs"]` and `["all_samples"]`?

Hope you can help me, please.

920232796 commented 1 year ago

Uncertainty cannot be calculated from `["all_samples"]`, because it has been clipped to [-1, 1].
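A toy illustration of this point (not the repo's code; the value range is taken from the numbers mentioned earlier in the thread):

```python
import numpy as np

# Raw network outputs span a wide range (the thread mentions roughly
# [-17, 18] for all_model_outputs), while all_samples is clipped to [-1, 1].
raw = np.array([-17.0, 5.0, 18.0])
clipped = np.clip(raw, -1.0, 1.0)   # most values saturate at +/-1

# Variance across samples is a common uncertainty proxy; clipping
# collapses the spread that the uncertainty estimate depends on.
assert clipped.var() < raw.var()
```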

jessie-chen99 commented 1 year ago

> Uncertainty cannot be calculated from `["all_samples"]`, because it has been clipped to [-1, 1].

Thank you for your explanation, I understand now!

As for my first question, yesterday I tested both your official code version (first row) and the paper version (second row; I just changed `return sample_return` to `return sample_return / uncer_step`).

btw, the training settings are the same:

  • dataset: BraTS2020 (5-fold)
  • input_size == 96
  • batch_size per GPU == 2
  • num_of_gpu == 4
  • training_epochs == 300

The Dice score in the second row is better, so this change may be helpful. Perhaps you could consider making it as well, but it's entirely up to you.
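The proposed change could look like this minimal sketch. The function and variable names (`sample_return`, `uncer_step`) follow this thread; the real `test.py` loop body differs:

```python
import numpy as np

def fuse(samples, w, average_runs=True):
    """Weighted fusion of per-run, per-step diffusion outputs.

    samples: array of shape (uncer_step, n_steps, ...).
    average_runs=True corresponds to the proposed
    `return sample_return / uncer_step`; False matches the original code.
    """
    uncer_step = samples.shape[0]
    sample_return = np.zeros(samples.shape[2:])
    for k in range(uncer_step):
        for i in range(samples.shape[1]):
            sample_return += w[i] * samples[k, i]
    return sample_return / uncer_step if average_runs else sample_return
```

Dividing by `uncer_step` only rescales the fused map by a constant, but if the final mask is produced by thresholding this map at a fixed value, the rescaling can change which voxels pass the threshold, which would explain a Dice difference.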


If I made any mistake in the above analysis, please let me know. Thanks again!

920232796 commented 1 year ago

Wow, thank you. I will modify this section.

gary-wang55 commented 1 year ago

> > Uncertainty cannot be calculated from `["all_samples"]`, because it has been clipped to [-1, 1].
>
> Thank you for your explanation, I understand now!
>
> As for my first question, yesterday I tested both your official code version (first row) and the paper version (second row; I just changed `return sample_return` to `return sample_return / uncer_step`).
>
> btw, the training settings are the same:
>
> • dataset: BraTS2020 (5-fold)
> • input_size == 96
> • batch_size per GPU == 2
> • num_of_gpu == 4
> • training_epochs == 300
>
> The Dice score in the second row is better, so this change may be helpful. Perhaps you could consider making it as well, but it's entirely up to you.
>
> If I made any mistake in the above analysis, please let me know. Thanks again!

Hi Jessie, thanks for your valuable comments. I also ran the training code on the BraTS 2020 dataset, but my training results are quite strange:

  • wt: 0.8498128652572632
  • tc: 0.4872548282146454
  • et: 0.41504526138305664
  • mean_dice: 0.5840376615524292

This is the final result shown in the log after 300 epochs (validation results), and my settings are:

  • env = "DDP"
  • max_epoch = 300
  • batch_size = 2
  • num_gpus = 4
  • GPU type: A100

I think my settings are similar to yours, since I did not change anything and kept the defaults. Thus I suspect the cause could be different package versions. Could you please share your package versions, if you don't mind? Here are mine:

  • Python 3.8.10
  • monai 1.1.0
  • numpy 1.22.2
  • SimpleITK 2.2.1
  • torch 1.13.0a0+936e930

My results are also similar to https://github.com/ge-xing/Diff-UNet/issues/18#issue-1739893968.