LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

Feature alignment loss #224

Closed schatto02 closed 2 months ago

schatto02 commented 3 months ago

Hi, thanks for such great work! I was wondering which features are used for this loss: intermediate features, or the final encoder features?

Also, if the student and teacher feature dimensions are different, what kind of projection is used to bring them to a compatible feature space?

LiheYoung commented 2 months ago
  1. We use the final encoder features.
  2. We use the same student-teacher structure (e.g., both ViT-Large) for alignment, so the dimensions are the same. If the dimensions are different, we recommend adding a linear projection layer on top of the student features.
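The answer can be sketched in code. Below is a minimal, hypothetical PyTorch sketch (not the repository's actual implementation) of a feature alignment loss on final encoder features: it penalizes low cosine similarity between student and teacher tokens, and adds a linear projection on top of the student features only when the dimensions differ, as suggested above. The class name `FeatureAlignmentLoss` and the dimension arguments are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FeatureAlignmentLoss(nn.Module):
    """Hypothetical sketch: align student features to (frozen) teacher features.

    A linear projection is added on top of the student features only when
    the student and teacher feature dimensions differ.
    """

    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = (nn.Identity() if student_dim == teacher_dim
                     else nn.Linear(student_dim, teacher_dim))

    def forward(self, student_feat: torch.Tensor,
                teacher_feat: torch.Tensor) -> torch.Tensor:
        # student_feat: (B, N, C_s) final encoder tokens from the student
        # teacher_feat: (B, N, C_t) final encoder tokens from the teacher
        s = self.proj(student_feat)
        # Per-token cosine similarity, then 1 - cos as the alignment penalty.
        cos = F.cosine_similarity(s, teacher_feat.detach(), dim=-1)
        return (1.0 - cos).mean()


# Usage: same structure (e.g., both ViT-L) means identical dims, so the
# projection reduces to an identity; mismatched dims get a learned Linear.
loss_fn = FeatureAlignmentLoss(student_dim=384, teacher_dim=1024)
```

Note that the teacher features are detached so gradients flow only through the student (and the projection head); the actual paper additionally applies a tolerance margin so that already-well-aligned tokens are not penalized, which is omitted here for brevity.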