Confusion about training

Cavalry-Control commented 1 year ago

Dear Julian, thank you very much for open-sourcing such creative code, which has helped me a lot! After studying your code carefully, I am still confused about how the model is trained. If you have time, could you help me?

First, how many epochs did you train for on the leather dataset? After training for 3000 epochs, I added noise up to timestep 150, but the model could not reconstruct the original input (on the training set).
Second, I want to check whether my understanding is correct: we set the total number of timesteps T to 1000 during training, but it is sufficient to use only 150 or 250 timesteps during reconstruction and sampling. Is that right?

I would be very grateful if you could take time out of your busy schedule to answer me. Wish you all the best!

Julian-Wyatt commented 1 year ago

This sounds fishy. 3000 epochs is WAY more than you need to get even a reasonable output, so I guess there's something else going on there. When you output your images, make sure they are being written out correctly (i.e. the values are in the correct range).
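As a minimal sketch of such a range check (not taken from the repository): assuming the model works on images scaled to [-1, 1], which is a common convention but an assumption here, the values need clipping and rescaling to [0, 255] before saving. The helper names `value_range` and `to_uint8` are hypothetical, and plain lists stand in for image tensors.

```python
def value_range(pixels):
    """Return (min, max) so an unexpected range is easy to spot in logs."""
    return min(pixels), max(pixels)

def to_uint8(pixels, low=-1.0, high=1.0):
    """Map floats in [low, high] to integers in [0, 255], clipping outliers."""
    scale = 255.0 / (high - low)
    out = []
    for p in pixels:
        p = min(max(p, low), high)                 # clip values outside the range
        out.append(int((p - low) * scale + 0.5))   # rescale, round to nearest
    return out

sample = [-1.2, -1.0, 0.0, 1.0, 1.7]  # made-up values, slightly out of range
print(value_range(sample))
print(to_uint8(sample))
```

If the printed range is far from the one the saving code expects (for example [0, 1] instead of [-1, 1]), the saved images can look washed out or black even when the model itself is fine.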
If you want to speed up training and you're only sampling up to a T of 150 to 250, then you don't need to set your training T quite so high. If you want to reconstruct to reduce anomalies, you'd diffuse from t=0 up to t=250 and then denoise back down to t=0. If you just want to generate images, you would need to sample random noise and denoise from there; more recent papers massively improve this process. I'm generally unsure of your specific question here: if you train your model over 1000 total timesteps, where 1000 is the final timestep of your noise schedule, then 150-250 is definitely feasible for our partial diffusion strategy.
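To put rough numbers on this, the sketch below computes how much of the input survives at t = 150, 250 and 1000. It assumes a linear beta schedule from 1e-4 to 0.02 over T = 1000 steps (the defaults from the original DDPM paper); the schedule actually used in this repository may differ, and the function names are illustrative.

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule (assumed here, not read from the repository)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + step * i for i in range(T)]

def cumulative_alphas(betas):
    """abar[t-1] = product of (1 - beta_s) for s = 1..t."""
    abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abar.append(prod)
    return abar

T = 1000
abar = cumulative_alphas(linear_betas(T))
for t in (150, 250, 1000):
    signal = math.sqrt(abar[t - 1])        # coefficient on x_0 in x_t
    noise = math.sqrt(1.0 - abar[t - 1])   # coefficient on the Gaussian noise
    print(f"t={t}: signal={signal:.3f}, noise={noise:.3f}")
```

Under these assumptions the signal coefficient is still large at t = 150 and close to zero at t = 1000, which is why noising to only 150-250 steps keeps most of the image structure.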

Cavalry-Control commented 1 year ago

Wow, thank you so much for the quick response! 😊😊😊😊😊 I made the following modifications to the code that outputs the training results, in order to examine the model's output at timestep 149:

Here are some of the results. Could you kindly take a look and let me know if they appear to be correct?

I'm relatively new to this, so some of these questions may seem a bit naive. Please bear with me! (┬┬﹏┬┬)

Cavalry-Control commented 1 year ago

I am really happy to get your reply and help. I don’t have any classmates who are learning this thing, so I don’t know if I’m doing it right or not a lot of the time. I want to express my gratitude to you again, from the bottom of my heart.💖

Julian-Wyatt commented 1 year ago

They seem alright at a glance. To get a better idea, it's worth adding more than 150 timesteps of noise, as the result is still very near the input $x_0$. Further, it's worth highlighting that the bottom image here isn't the reconstructed image. It's a mathematical approximation of $x_0$ computed straight from $x_t$. So you're outputting $x_0$, $x_t$ and $\hat{x}_0$. But don't mistake $\hat{x}_0$ for a true diffusion-based reconstruction.
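The distinction can be made concrete with the standard DDPM closed form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. Inverting it with the network's predicted noise gives a one-step estimate of $x_0$, which is the kind of image shown at the bottom. A minimal sketch with hypothetical helper names, using plain lists in place of image tensors and the true noise in place of a network output:

```python
import math
import random

def q_sample(x0, abar_t, eps):
    """Closed-form forward noising: build x_t from x_0 and Gaussian noise eps."""
    return [math.sqrt(abar_t) * v + math.sqrt(1.0 - abar_t) * e
            for v, e in zip(x0, eps)]

def predict_x0(xt, abar_t, eps_hat):
    """One-step estimate of x_0 from x_t and a noise prediction eps_hat."""
    return [(v - math.sqrt(1.0 - abar_t) * e) / math.sqrt(abar_t)
            for v, e in zip(xt, eps_hat)]

rng = random.Random(0)
x0 = [0.5, -0.25, 1.0]                    # made-up "pixels"
eps = [rng.gauss(0.0, 1.0) for _ in x0]
abar_t = 0.79                             # illustrative value, not from a real schedule
xt = q_sample(x0, abar_t, eps)
x0_est = predict_x0(xt, abar_t, eps)      # eps stands in for the network's prediction
print(x0_est)
```

With the true noise the estimate recovers $x_0$ up to floating-point error; with a network's imperfect prediction it is only approximate, and no iterative denoising has taken place, so it should not be read as a reconstruction.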

Don't worry about it - everyone's gotta start somewhere. This was only my 3rd large deep learning project - for my master's - and I spent ages going over the maths at various stages to debug what was going wrong as, like you, my classmates were on different projects.

Cavalry-Control commented 1 year ago

I feel really lucky to have your help; it means a lot to me! 😭 Is there a way to get the diffusion-based reconstructed image (on the leather dataset)? I sincerely wish you all the best! 😊

Julian-Wyatt commented 1 year ago

I used the forward-backward method in GaussianDiffusion.py to generate my reconstructions. There are some parameters for outputting each frame, which you can use to make a video for logging, or for starting from halfway through the sequence.
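For readers who want the general shape of such a forward-backward pass, here is a minimal sketch of partial diffusion with standard DDPM ancestral sampling. It is not the code from GaussianDiffusion.py: the function names, the linear schedule and the `eps_model(x, t)` callable are all assumptions for illustration, and an oracle that knows $x_0$ stands in for a trained network.

```python
import math
import random

def cumulative_alphas(betas):
    """abar[t-1] = product of (1 - beta_s) for s = 1..t."""
    abar, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        abar.append(prod)
    return abar

def reconstruct(x0, t_start, betas, eps_model, rng):
    """Noise x0 up to step t_start, then denoise step by step back to 0."""
    abar = cumulative_alphas(betas)
    # Forward: jump straight to step t_start using the closed form.
    a = abar[t_start - 1]
    x = [math.sqrt(a) * v + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for v in x0]
    # Backward: one ancestral sampling step per timestep, t_start down to 1.
    for t in range(t_start, 0, -1):
        beta, a_bar = betas[t - 1], abar[t - 1]
        eps_hat = eps_model(x, t)
        coef = beta / math.sqrt(1.0 - a_bar)
        mean = [(xi - coef * e) / math.sqrt(1.0 - beta)
                for xi, e in zip(x, eps_hat)]
        sigma = math.sqrt(beta) if t > 1 else 0.0   # no noise on the last step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x

T = 1000
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]  # assumed schedule
abar = cumulative_alphas(betas)
x0 = [0.5, -0.25, 1.0]  # made-up "pixels"

def oracle(x, t):
    """Stand-in for a trained network: returns the exact noise given x0."""
    a = abar[t - 1]
    return [(xi - math.sqrt(a) * v) / math.sqrt(1.0 - a) for xi, v in zip(x, x0)]

recon = reconstruct(x0, 250, betas, oracle, random.Random(0))
print(recon)
```

With the oracle the loop lands back on the input; with a real network trained on healthy data, the expectation is that the result is pulled towards the healthy distribution, so the difference between input and reconstruction can highlight anomalies.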

Cavalry-Control commented 1 year ago

Thank you again, and best wishes. I think I understand the process now. I may not have grasped the meaning of the various variables and parameters before, but with your help I think I should be able to reproduce it. It would be so cool if I could do such creative work as you did. Thank you, from the bottom of my heart.💖💖💖💖💖💖💖💖💖💖

jamdodot commented 1 year ago

Hi, I'm currently having some issues training on the MVTec dataset. Could I get your contact details so that I can ask you about them? My email is jamdodot@whut.edu.cn

Julian-Wyatt commented 1 year ago

I have emailed you - please get back to me if you would like any advice.

Julian-Wyatt / AnoDDPM

Confusion about training #14