cc-ai / climategan

Code and pre-trained model for the algorithm generating visualisations of 3 climate change related events: floods, wildfires and smog.
https://thisclimatedoesnotexist.com
GNU General Public License v3.0

Improve depth learning + evaluation #133

Closed vict0rsch closed 3 years ago

vict0rsch commented 3 years ago

https://arxiv.org/pdf/2003.06620.pdf

Let's use this issue's comments to debrief: interesting stuff, questions, etc.

melisandeteng commented 3 years ago

What we can keep in mind from some of the models presented in this overview: the authors explicitly discourage using the depth prediction model at a higher resolution than the one it was trained on. Also, it seems a lot of them just benchmark on the NYU Depth or KITTI datasets, but in our case it's very important to have a model that's robust across different types of images. Hence, we'll focus on MiDaS for the moment.

melisandeteng commented 3 years ago

We implemented a loss with gradient matching (GM) + scale-invariant MSE terms, inspired by the MiDaS paper. Actually, MiDaS uses 4 scale levels for the scale-invariant GM, halving the image resolution at each level. Maybe I should implement that, @vict0rsch? The thing is that the original ground-truth depth maps were computed on a resized input image and then upscaled. I wonder if that can have any impact on a multi-scale GM loss.
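For reference, a minimal sketch of what a 4-level gradient matching term could look like. This is not the repo's implementation: the function names are hypothetical, downsampling by keeping every 2nd pixel is an assumption, and the prediction and target are assumed to be already aligned in scale and shift. Plain Python lists are used only to keep the example dependency-free; real code would operate on tensors.

```python
def gradient_matching(pred, target):
    """Mean absolute x/y gradient of the residual pred - target.

    pred and target are 2D lists of floats with the same shape.
    """
    h, w = len(pred), len(pred[0])
    res = [[pred[i][j] - target[i][j] for j in range(w)] for i in range(h)]
    # Horizontal and vertical finite differences of the residual
    gx = sum(abs(res[i][j + 1] - res[i][j]) for i in range(h) for j in range(w - 1))
    gy = sum(abs(res[i + 1][j] - res[i][j]) for i in range(h - 1) for j in range(w))
    return (gx + gy) / (h * w)


def multiscale_gradient_matching(pred, target, scales=4):
    """Sum the gradient matching term over `scales` levels.

    Assumption: each level halves the resolution by keeping every
    2**k-th pixel (strided subsampling, no interpolation).
    """
    total = 0.0
    for k in range(scales):
        step = 2 ** k
        p = [row[::step] for row in pred[::step]]
        t = [row[::step] for row in target[::step]]
        total += gradient_matching(p, t)
    return total


# Usage: a horizontal ramp against a flat target on an 8x8 map
ramp = [[float(j) for j in range(8)] for _ in range(8)]
flat = [[0.0] * 8 for _ in range(8)]
loss = multiscale_gradient_matching(ramp, flat)
```

Because the term only looks at gradients of the residual, a constant offset between prediction and target contributes nothing at any scale.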

Just to keep this reference somewhere: DADA uses the reverse Huber (berHu) loss.
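As a reminder of what the reverse Huber loss computes, here is a small sketch: it is linear (L1) for small residuals and quadratic above a threshold `c`. The function name is hypothetical, and defaulting `c` to 20% of the largest absolute residual is a common convention assumed here, not something taken from the DADA code.

```python
def berhu(residuals, c=None):
    """Mean reverse Huber (berHu) loss over a flat list of residuals.

    |r|                   if |r| <= c
    (r**2 + c**2) / (2c)  otherwise
    """
    abs_r = [abs(r) for r in residuals]
    if c is None:
        # Assumption: threshold set to 20% of the largest absolute residual
        c = 0.2 * max(abs_r)
    if c == 0:
        # All residuals are zero, so the loss is zero
        return 0.0
    total = sum(a if a <= c else (a * a + c * c) / (2 * c) for a in abs_r)
    return total / len(abs_r)


# Usage: one residual below the threshold, one above it
value = berhu([0.5, -2.0], c=1.0)
```

The two branches meet at `|r| = c`, so the loss stays continuous while penalising large depth errors more heavily than plain L1.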

melisandeteng commented 3 years ago

Architectures:
Megadepth: Hourglass network from this paper, composed of modified inception modules.
MiDaS: resnet-based architecture from this paper.
DADA: residual auxiliary block - encoded features (before the depth pooling) are decoded by a convolutional layer and fused with the backbone features.