ge-xing / Diff-UNet

Diff-UNet: A Diffusion Embedded Network for Volumetric Segmentation. (using diffusion for 3D medical image segmentation)
Apache License 2.0

About The Testing #5

Open YonghanLU opened 1 year ago

YonghanLU commented 1 year ago

Figure 1 of your paper shows that the method predicts x_0 directly during training. Why, then, is the result inferred step by step at test time? I do not understand.

920232796 commented 1 year ago

It helps to look at how diffusion models work: training needs only one step per sample, while testing needs multiple steps.
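The "one step in training" part can be sketched roughly as follows. This is a minimal NumPy illustration, not the code from this repo: the linear beta schedule, the MSE loss, and the names `make_alpha_bar`, `q_sample`, `training_loss`, and `model(x_t, t, image)` are all assumptions made for the example.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for an assumed linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def q_sample(x0, t, alpha_bar, rng):
    """Closed-form forward diffusion: jump from x_0 straight to x_t."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

def training_loss(model, x0, image, alpha_bar, rng):
    """One training step: one random timestep and one network call."""
    t = int(rng.integers(0, len(alpha_bar)))
    x_t = q_sample(x0, t, alpha_bar, rng)
    # Hypothetical network: predicts x_0 (the clean label map), not the noise.
    x0_pred = model(x_t, t, image)
    # MSE is an assumption here; the actual loss may differ.
    return float(np.mean((x0_pred - x0) ** 2))
```

Because x_t can be drawn directly from x_0 for any t, training never has to walk through the chain; only sampling at test time does.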

YonghanLU commented 1 year ago

I know a little about diffusion, but I still do not understand. If you predict x_0 directly during training, how do you perform multi-step inference at test time? Where are the parameters for the Gaussian noise? Do you still predict the noise, but instead of starting from independent Gaussian noise, start from some step t and predict the segmentation map from there (then predict x_0 step by step)?

920232796 commented 1 year ago

In this task, Diff-UNet starts from independent Gaussian noise but predicts the segmentation map (x_0) instead of the noise. You can see the DDIM update formula in this repo.
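A rough sketch of what a deterministic DDIM loop looks like when the network outputs x_0 instead of the noise is given here. It is not copied from this repo: the linear schedule, the number of steps, and the names `ddim_sample_x0` and `model(x_t, t, image)` are assumptions for illustration only.

```python
import numpy as np

def ddim_sample_x0(model, image, shape, num_train_steps=1000, num_steps=10, seed=0):
    """Deterministic DDIM sampling (eta = 0) with an x_0-predicting network."""
    # Assumed linear beta schedule and its cumulative alpha product.
    betas = np.linspace(1e-4, 0.02, num_train_steps)
    alpha_bar = np.cumprod(1.0 - betas)

    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # x_T: independent Gaussian noise

    # Evenly spaced timesteps from T-1 down to 0.
    steps = np.linspace(num_train_steps - 1, 0, num_steps).round().astype(int)
    for i, t in enumerate(steps):
        x0_pred = model(x, int(t), image)  # hypothetical network call
        a_t = alpha_bar[t]
        a_prev = alpha_bar[steps[i + 1]] if i + 1 < len(steps) else 1.0
        # Noise implied by the current x_t and the predicted x_0.
        eps = (x - np.sqrt(a_t) * x0_pred) / np.sqrt(1.0 - a_t)
        # Re-noise the prediction to the next (lower) noise level.
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x
```

Each iteration predicts x_0, converts that prediction into an implied noise estimate, and mixes the two at the next noise level, which is how a model trained to output x_0 can still be sampled over several steps.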

YonghanLU commented 1 year ago

Why predict x_0 directly? To reduce memory consumption to accommodate 3D data?

Fivethousand5k commented 1 year ago

I am also confused by this part, and I don't think it should be considered a standard setting for diffusion models.

> Why predict x_0 directly? To reduce memory consumption to accommodate 3D data?

Fivethousand5k commented 1 year ago

Based on my experiments, the trained Diff-UNet can yield satisfactory results in the first few steps of inference (even the very first step). However, during training, such a level of noise only occurs when t is relatively small (the early steps of forward diffusion: x0->x1->x2...). This means there is a gap in noise level between training and inference.

Moreover, since Diff-UNet is always optimized toward x0 rather than the noise, I am not sure whether it can still be considered a diffusion model. Maybe it is more appropriate to categorize it as a kind of recurrent model?
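For reference, the two prediction targets are tied together by the standard forward process. The notation below ($\bar\alpha_t$ for the cumulative noise schedule, hats for network estimates) is assumed for this note and does not come from the thread:

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon
\quad\Longrightarrow\quad
\hat\epsilon = \frac{x_t - \sqrt{\bar\alpha_t}\, \hat{x}_0}{\sqrt{1 - \bar\alpha_t}},
\qquad
\hat{x}_0 = \frac{x_t - \sqrt{1 - \bar\alpha_t}\, \hat\epsilon}{\sqrt{\bar\alpha_t}}
```

Given x_t and t, an estimate of x_0 therefore determines an estimate of the noise and vice versa, although the two choices weight the training loss differently across timesteps.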

Anyway, I mean no offense to your work and just want to share some of my ideas ^-^. I wish you good luck with your submissions.

920232796 commented 1 year ago

> I am also confused by this part, and I don't think it should be considered a standard setting for diffusion models.
>
> Why predict x_0 directly? To reduce memory consumption to accommodate 3D data?

You can see this paper co-authored by Hinton: "A Generalist Framework for Panoptic Segmentation of Images and Videos".

I think the target of a segmentation task is simple (the model only predicts discrete labels such as 0, 1, 2, ..., not continuous data), so it can predict x_0 directly.
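One way to picture a discrete target as x_0 is to one-hot encode the label volume, let the network regress that encoding, and take an argmax at the end. The sketch below assumes that setup; the helper names `to_one_hot` and `to_labels` are hypothetical and not taken from this repo.

```python
import numpy as np

def to_one_hot(labels, num_classes):
    """Integer label volume (D, H, W) -> one-hot float volume (C, D, H, W)."""
    one_hot = np.eye(num_classes, dtype=np.float32)[labels]  # (D, H, W, C)
    return np.moveaxis(one_hot, -1, 0)

def to_labels(x0_pred):
    """Predicted x_0 of shape (C, D, H, W) -> integer label volume (D, H, W)."""
    return np.argmax(x0_pred, axis=0)
```

Since only the argmax matters in the end, a prediction of x_0 does not need to be numerically exact to give the right label at each voxel.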

920232796 commented 1 year ago

> Based on my experiments, the trained Diff-UNet can yield satisfactory results in the first few steps of inference (even the very first step). However, during training, such a level of noise only occurs when t is relatively small (the early steps of forward diffusion: x0->x1->x2...). This means there is a gap in noise level between training and inference.
>
> Moreover, since Diff-UNet is always optimized toward x0 rather than the noise, I am not sure whether it can still be considered a diffusion model. Maybe it is more appropriate to categorize it as a kind of recurrent model?
>
> Anyway, I mean no offense to your work and just want to share some of my ideas ^-^. I wish you good luck with your submissions.

My work also builds on other excellent work. We can discuss diffusion models further if you are interested in this part.