Closed: HencyChen closed this issue 1 year ago
Hi Chen, in our follow-up experiments, using the latent loss gives only a small improvement but can cause severe training collapse (the weights crash) without pre-training. So we keep only the pixel loss and the DDIM loss in the public release. Within the scope of our experiments, the pixel loss is enough to supervise the latent space. Best regards
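For readers trying to map this onto code, here is a minimal sketch of how the two released terms could be combined. The function and argument names (`total_loss`, `pixel_weight`, `ddim_weight`, `eps_pred`, `eps_target`) are illustrative assumptions, not the repository's actual API:

```python
import torch.nn.functional as F


def total_loss(pred_depth, gt_depth, eps_pred, eps_target,
               pixel_weight=1.0, ddim_weight=1.0):
    # Pixel loss: L1 between the decoded depth map and the ground truth,
    # computed only on valid (non-zero) ground-truth pixels.
    valid = (gt_depth > 0).float()
    pixel_loss = (F.l1_loss(pred_depth, gt_depth, reduction='none') * valid).sum() \
                 / valid.sum().clamp(min=1.0)

    # Diffusion (DDIM) loss: MSE between the predicted noise and the
    # target noise at the sampled diffusion timestep.
    ddim_loss = F.mse_loss(eps_pred, eps_target)

    return pixel_weight * pixel_loss + ddim_weight * ddim_loss
```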
Hi @duanyiqun,
Thanks for the reply. In this case, what do you mean by "without pre-training"? Do you use any model as the pre-trained weights?
We use Swin Transformer Large 384 as the pre-trained backbone. But here pre-training refers to depth pre-training using the pixel loss.
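If it helps, a minimal sketch of obtaining such a pre-trained backbone via `timm` is shown below; the exact model name and loading path are assumptions, and the repository may initialise Swin-L through its own backbone wrapper instead:

```python
import timm

# ImageNet-pretrained Swin-Large with 384x384 input, used only to initialise
# the encoder; num_classes=0 strips the classification head. This is an
# illustrative assumption, not the repository's actual loading code.
backbone = timm.create_model('swin_large_patch4_window12_384',
                             pretrained=True, num_classes=0)
```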
We've seen Swin-Large used as the pre-trained weights in the code. But what do you mean by "depth pre-training" using the pixel loss? Based on the description in the paper, I didn't find any mention of depth pre-training, or did I miss something?
Thanks!
It just means training this model with the current losses first, and then adding the latent loss after the diffusion depth model has been roughly trained.
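A hedged sketch of that two-stage schedule, with purely hypothetical names (`training_loss`, `latent_loss_start_step`, `latent_weight`) and simple L1/MSE placeholders for the individual losses:

```python
import torch.nn.functional as F


def training_loss(out, batch, step,
                  latent_loss_start_step=100_000, latent_weight=0.1):
    # Stage 1: pixel loss + DDIM loss are used from step 0.
    loss = F.l1_loss(out['depth'], batch['gt_depth']) \
         + F.mse_loss(out['eps_pred'], out['eps_target'])

    # Stage 2: add a down-weighted latent loss only once the diffusion
    # depth model is roughly trained; enabling it from scratch was
    # reported above to crash training.
    if step >= latent_loss_start_step:
        loss = loss + latent_weight * F.mse_loss(out['latent_pred'],
                                                 out['latent_target'])
    return loss
```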
Got it. Thanks for the quick reply^^
Hi @erjanmx and @duanyiqun,
Thanks for the great work.
Based on the paper, the loss function is composed of a pixel loss, a latent loss, and the regular DDIM loss. However, in the code I can only find the DDIM loss and an L1/L2 loss between the predicted depth and the ground truth. I'm wondering whether I'm misunderstanding something or just failing to find the corresponding loss function.
Thanks again !