Open command-z-z opened 2 months ago
I want to ask about this as well, because I fine-tuned with my sparse GT labels and the results were horrible; I'm not sure if I missed something important. I tried lowering the pre-trained model's learning rate from 5e-06 to 1e-06. Don't worry about my dataset size: I tried amounts from 10,000 to 100,000 samples, but the loss was still hard to reduce. (I suspect it might have something to do with my sparse labels: I simply set valid_mask as gt > 0 to ignore pixels without depth values, but the pixels where gt > 0 cover perhaps only 10-20% of an image, so I'm worried it might be too sparse.)
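For reference, the masked-loss setup described above can be sketched like this. This is a minimal illustration in PyTorch, not the repository's actual training code; the function name and shapes are mine:

```python
import torch
import torch.nn.functional as F

def masked_l1_loss(pred, gt):
    """L1 loss computed only over pixels that have ground-truth depth (gt > 0)."""
    valid_mask = gt > 0            # sparse GT: only ~10-20% of pixels are valid
    if valid_mask.sum() == 0:      # guard against images with no labeled pixels
        return pred.sum() * 0.0
    return F.l1_loss(pred[valid_mask], gt[valid_mask])

# example: a prediction against a very sparse ground-truth map
pred = torch.rand(1, 1, 8, 8)
gt = torch.zeros(1, 1, 8, 8)
gt[0, 0, 2, 3] = 1.5               # only two labeled pixels in the whole image
gt[0, 0, 5, 6] = 2.0
loss = masked_l1_loss(pred, gt)    # averaged over the two valid pixels only
```

With this kind of masking the gradient only flows through the few valid pixels, so per-batch loss can be noisy when the labels are this sparse.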
The answer is in the ZoeDepth paper. They train the heads from scratch: "Our model first learns from a large variety of datasets in pre-training which leads to good generalization. In the second stage, we add heads for metric depth estimation to the encoder-decoder architecture and fine-tune them on metric depth datasets".
So, does it just load the encoder's pre-trained weights and randomly initialize the head network for fine-tuning?
Thank you for the awesome project! I have a question: when you fine-tune the metric depth model, do you load the pre-trained Depth Anything V2 DINOv2 backbone and randomly initialize the DPT head (or use some other initialization), or do you do any other special processing?
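The "load the backbone, leave the head randomly initialized" pattern being asked about can be sketched as below. This is a toy stand-in, not the repo's actual code: the module names and shapes are placeholders (the real model uses a DINOv2 encoder and a DPT head), and the point is just the `strict=False` partial load:

```python
import torch
import torch.nn as nn

class MetricDepthModel(nn.Module):
    """Toy model: pre-trained encoder plus a freshly initialized depth head."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(16, 8)   # placeholder for the DINOv2 backbone
        self.head = nn.Linear(8, 1)       # placeholder for the DPT head

    def forward(self, x):
        return self.head(self.encoder(x))

model = MetricDepthModel()

# Pretend this checkpoint holds only relative-depth pre-training weights
# for the encoder (a real checkpoint would come from torch.load).
ckpt = {"encoder.weight": torch.ones(8, 16), "encoder.bias": torch.zeros(8)}

# strict=False loads the keys that match (the encoder) and reports the
# head parameters as missing, so the head keeps its random initialization.
missing, unexpected = model.load_state_dict(ckpt, strict=False)
```

After this, `missing` lists the head parameters (confirming they were not overwritten), and only the encoder carries pre-trained weights into fine-tuning.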