Open zaoruis opened 1 month ago
There is only one minor difference: in V1, we unintentionally used features from the last four layers of DINOv2 for decoding, whereas in V2 we use intermediate features instead. This modification originates from this issue, but we found that it in fact does not affect the model's performance on our task.
Is there any difference in model structure between Depth Anything V2 and the original Depth Anything? In downstream tasks, can the two be used as contrasting methods for comparison?
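For concreteness, here is a minimal sketch of the layer-tapping difference described above, using DINOv2's `get_intermediate_layers` method (part of the official `facebookresearch/dinov2` model loaded via `torch.hub`). The model variant, input size, and the index list `[2, 5, 8, 11]` are illustrative assumptions for a 12-block ViT-S backbone, not a verbatim excerpt of the Depth Anything code:

```python
import torch

# Load a DINOv2 ViT-S/14 backbone (12 transformer blocks) via torch.hub.
backbone = torch.hub.load('facebookresearch/dinov2', 'dinov2_vits14')
backbone.eval()

# Input spatial size must be a multiple of the 14-pixel patch size.
x = torch.randn(1, 3, 518, 518)

with torch.no_grad():
    # V1-style tapping: n=4 takes the last four blocks (indices 8-11).
    last_four = backbone.get_intermediate_layers(x, n=4, reshape=True)

    # V2-style tapping: evenly spaced intermediate blocks; [2, 5, 8, 11]
    # is an assumed index set for a 12-block ViT-S.
    intermediate = backbone.get_intermediate_layers(
        x, n=[2, 5, 8, 11], reshape=True
    )

for feat in intermediate:
    # With reshape=True each feature map is (B, C, H/14, W/14),
    # e.g. torch.Size([1, 384, 37, 37]) for this input.
    print(feat.shape)
```

Either way the decoder receives four feature maps of identical shape, which is consistent with the observation above that the change does not affect performance; only which transformer blocks are tapped differs.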