Hello, I just woke up and saw this question.
1. “I0+Y” in the code is a linear combination of the noisy image y and the latent image z. In the code, mean = en_net(noise_im_torch) and i0 = ((1/sigma**2)*mean.detach() + rho*(i0_til_torch - Y)) / ((1/sigma**2) + rho). Here "mean" is the z encoded by the VAE and Y is the noisy image. Therefore "i0" is a linear combination of the noisy image y and the latent image z, and “I0+Y” is also a linear combination of the noisy image y and the latent image z.
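In equation form (just restating the line of code above, with $z$ = mean and $\tilde{i}_0$ = i0_til_torch):

$$i_0 = \frac{\frac{1}{\sigma^2}\,z + \rho\,(\tilde{i}_0 - Y)}{\frac{1}{\sigma^2} + \rho}$$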
2. The latent image should be used for denoising. The "VAE" here is a kind of noise transformation method: the encoder transforms the noisy image into a latent image, in which the noise of the noisy image becomes Gaussian, and the decoder transforms the latent image back to the original image, where the Gaussian noise should turn back into the original real noise. Therefore, denoising should be applied to the latent image.
"(I0+Y), has already been denoised by the Gaussian denoiser (BM3D in the code) in the previous step" is because Gaussian denoiser is employ as a optimization step in ADMM.
3. We didn't directly employ the latent image. We only employ “I0+Y”, the combination of the noisy image y and the latent image z, as the input to the diffusion model, following NN. And we didn't try to use the latent image directly as the input to the diffusion model.
4. How many steps: we didn't fix the number of steps in advance but employ SURE as our stopping criterion. I have reviewed the experimental records; different datasets require different numbers of steps. For the CC dataset, the first images seem to take only 2,500 or 3,500 steps, while the last images may take 15,500~20,500 steps. On average, it seems to require around 4,500 steps.
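For reference, a minimal sketch of the kind of Monte-Carlo SURE criterion meant in point 4 (a generic MC-SURE estimator with a hypothetical `denoiser` callable, not the exact implementation in this repo):

```python
import torch

def mc_sure(denoiser, y, sigma, eps=1e-3):
    """Monte-Carlo SURE estimate of the MSE of denoiser(y) under Gaussian noise
    of level sigma. `denoiser` is any callable mapping a noisy tensor to a
    denoised tensor of the same shape (hypothetical helper, for illustration)."""
    n = y.numel()
    fy = denoiser(y)
    # Monte-Carlo estimate of the divergence term
    b = torch.randn_like(y)
    div = (b * (denoiser(y + eps * b) - fy)).sum() / eps
    # SURE = fidelity term - sigma^2 + (2 * sigma^2 / n) * divergence
    return ((fy - y) ** 2).mean() - sigma ** 2 + 2 * sigma ** 2 * div / n
```

Tracking this estimate over iterations and stopping once it no longer decreases is one way to realize such a criterion.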
Actually, we believe that the idea and framework of noise transformation are quite important, and the specific methods can be improved. The NN method takes a relatively long time and is challenging to improve further. We are also working on faster self-supervised methods and new noise transformation techniques.
If you find it helpful, feel free to give a star.
Thanks for your quick response, it is very helpful. However, I still have some questions and would appreciate your help with them. 1) The NN method used i0_til_torch as its final output, which is not the linear combination of the latent image and the noisy image. 2) Y is probably not the noisy image but is the q in the ADMM algorithm in the code.
Thanks in advance!
1. The NN method used i0_til_torch as its final denoised output, not the latent image after noise transformation. Their noise transformation result is “I0+Y”, and they denoise I0+Y with BM3D to obtain the final denoised output.
2. Yes, q in the ADMM algorithm is Y in the code. Y cannot directly be called the noisy image, but I0+Y is a combination of the latent image and the noisy image, since the noisy image is introduced in the first iteration:
i0 = ((1/sigma**2)*mean + rho*(i0_til_torch - Y)) / ((1/sigma**2) + rho)
i0_til_np = bm3d.bm3d_rgb(i0_np.transpose(1, 2, 0) + Y_np.transpose(1, 2, 0), sig).transpose(2, 0, 1)
i0_til_torch = np_to_torch(i0_til_np).to(device)
Y = Y + eta * (i0 - i0_til_torch)
i0_til_torch is initialized as the noisy image, and Y is zero. The noisy image is therefore introduced in the first iteration, and Y is roughly the residual noise between the denoised image and the noisy image. Therefore, I0+Y is a combination of the latent image and the noisy image, which is also how the NN method treats it.
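For concreteness, here is how that loop reads when written out end to end. This is a minimal, self-contained sketch: the function name, arguments, and the fixed `encode` callable are illustrative, and in the actual code the encoder network is trained jointly inside the loop.

```python
import numpy as np
import bm3d

def pnp_admm(noisy_np, encode, sigma, rho, eta, sig, n_iters):
    """noisy_np: (C, H, W) noisy image; encode: callable returning the latent
    image z (same shape). Sketch of the ADMM iteration discussed above."""
    i0_til = noisy_np.copy()        # i0_til_torch starts as the noisy image
    Y = np.zeros_like(noisy_np)     # dual variable Y starts at zero
    for _ in range(n_iters):
        z = encode(noisy_np)        # "mean": latent image from the encoder
        # In the first iteration (i0_til = y, Y = 0) this gives
        #   i0 + Y = ((1/sigma^2) * z + rho * y) / (1/sigma^2 + rho),
        # i.e. a convex combination of the latent image z and the noisy image y.
        i0 = ((1 / sigma**2) * z + rho * (i0_til - Y)) / ((1 / sigma**2) + rho)
        # Gaussian denoiser (BM3D) as the ADMM optimization step, applied to
        # i0 + Y, the same quantity that is passed on afterwards.
        i0_til = bm3d.bm3d_rgb((i0 + Y).transpose(1, 2, 0), sig).transpose(2, 0, 1)
        # Dual update: Y accumulates the residual between i0 and its denoised version.
        Y = Y + eta * (i0 - i0_til)
    return i0, Y, i0_til
```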
Ok, thanks for your explanation!
You are welcome, feel free to give a star.
You mentioned: "Instead of directly using the latent image z = Gθ1(y), NN employs a linear combination of the noisy image y and the latent image z as the final result after transformation."
However, Denoiser(I0+Y) is their final target, while in your paper you use (I0+Y) as the input to the diffusion model. I am very confused about not using z but using I0+Y. Could you explain the potential reason behind it? I think the aim of using the VAE is to transform the noise pattern to Gaussian rather than to apply denoising on the latent image. The variable you use as the final output of the VAE part, (I0+Y), has already been denoised by the Gaussian denoiser (BM3D in the code) in the previous step, so why can it still be handled correctly by the diffusion model?
I am wondering whether you tried to use the latent image directly as the input to the diffusion model. My experiments showed that the performance is not good when the latent image is used directly.
Besides, I am wondering how many steps it usually takes to converge on one image in the VAE part.
I would appreciate it if you could help me with this and please correct me if the above description is not correct or clear.