Closed: VitorGuizilini-TRI closed this issue 1 year ago
Hi, thank you for open-sourcing the code, this paper is very interesting! Could you explain in a little more detail how you trained the depth auto-encoder? Did you repurpose the original image auto-encoder from Stable Diffusion, or did you train your own? What is its exact architecture and training protocol (datasets, losses, parameters, etc.)? It's probably in this repository somewhere, but I am having trouble finding it, so some pointers would be greatly appreciated!

Hi Vitor, thank you very much for your interest. The depth encoder-decoder is implemented in https://github.com/duanyiqun/DiffusionDepth/blob/main/src/model/ops/depth_transform.py. Basically, the encoder-decoder is trained end-to-end with the rest of the pipeline. As mentioned in another issue, I also tried pre-training it on the KITTI dataset itself in an unsupervised way, but I did not observe a clear difference. I suspect that using a depth latent space with more prior knowledge, such as VA-depth, would improve performance (mostly generalization ability), but I haven't tried that.
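For anyone reading along: the actual architecture is in `depth_transform.py` linked above. As a rough illustration of what "trained end-to-end" means here (the encoder-decoder receives gradients from the task loss rather than being pre-trained separately), below is a minimal PyTorch sketch. The class names, layer widths, latent dimension, and the L1 reconstruction loss are all illustrative assumptions, not the repository's implementation.

```python
import torch
import torch.nn as nn

class DepthEncoder(nn.Module):
    """Hypothetical sketch: projects a 1-channel depth map into a latent space."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, latent_dim, 3, padding=1),
        )
    def forward(self, depth):
        return self.net(depth)

class DepthDecoder(nn.Module):
    """Hypothetical sketch: maps the latent back to a 1-channel depth map."""
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )
    def forward(self, latent):
        return self.net(latent)

# End-to-end training: both networks are optimized jointly from the task loss,
# so no separate auto-encoder pre-training stage is required.
encoder, decoder = DepthEncoder(), DepthDecoder()
optimizer = torch.optim.AdamW(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4
)

depth_gt = torch.rand(2, 1, 64, 64)   # dummy ground-truth depth batch
latent = encoder(depth_gt)            # depth -> latent
# ... in DiffusionDepth the diffusion model would refine `latent` here ...
depth_pred = decoder(latent)          # latent -> depth

optimizer.zero_grad()
loss = nn.functional.l1_loss(depth_pred, depth_gt)  # assumed loss, for illustration
loss.backward()
optimizer.step()
```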