Question about self-diffusion and depth encoder

Hi, thanks for your excellent work. I have a few questions.

I couldn't figure out what self-diffusion is. Is the refined depth map mentioned in the paper a true depth map or a latent map? Is it dense or sparse?

Do you mean that the model adds noise to the latent map and then denoises it, using the DDIM loss to train the denoising part? If so, how do I get the initial latent map? Is it initialized as Gaussian noise, or should I take the refined depth map from the output of the depth encoder?

It seems that apart from the DDIM loss, you only apply L1 and L2 losses between the predicted depth map and the GT depth map under a mask. If so, how is the depth encoder trained?

Besides, I don't understand the role of the sparse GT latent code.
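To make my question concrete, here is a minimal sketch of the training losses as I currently understand them. It is only my assumption, not your implementation: the function names are hypothetical, "DDIM loss" is assumed to be the standard noise-prediction MSE used to train DDPM/DDIM denoisers, `alpha_bar_t` stands for the cumulative noise-schedule value at step t, and depth/latent maps are flattened to plain lists.

```python
import math
import random


def add_noise(x0, alpha_bar_t, rng):
    """Hypothetical forward diffusion step q(x_t | x_0).

    Assumed form: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps,
    with eps drawn from a standard Gaussian.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar_t) * x + math.sqrt(1.0 - alpha_bar_t) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


def noise_prediction_loss(eps_pred, eps):
    """Assumed 'DDIM loss': mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def masked_l1_l2(pred, gt, mask):
    """Mean L1 and L2 depth errors over pixels where mask is True.

    The mask is assumed to mark pixels that have valid (sparse) GT depth.
    """
    diffs = [p - g for p, g, m in zip(pred, gt, mask) if m]
    if not diffs:
        return 0.0, 0.0
    n = len(diffs)
    l1 = sum(abs(d) for d in diffs) / n
    l2 = sum(d * d for d in diffs) / n
    return l1, l2
```

For example, with `pred = [1.0, 2.0, 3.0]`, `gt = [1.5, 2.0, 0.0]` and `mask = [True, True, False]`, the masked L1 and L2 losses are 0.25 and 0.125, because the third pixel has no valid GT and is ignored.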

duanyiqun / DiffusionDepth

Question about self-diffusion and depth encoder #37