adizhol-str opened this issue 2 months ago
I also have a question about this, why is it not a linear regression layer?
A ReLU or sigmoid should be there to keep the output positive (for either depth or disparity).
Hi @adizhol-str and @ZYX-MLer, in metric depth estimation it's common practice to use a sigmoid function to map the output to the 0-1 range, because the metric depth values fall within the range of 0 to `max_depth`. We use a sigmoid to map the output to 0-1 and then multiply it by a pre-defined `max_depth`. However, for relative depth estimation, inverse depth values can range from 0 to infinity, so we use a ReLU function instead.
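To make the two conventions concrete, here is a minimal sketch of an output head (not the repository's actual code; `DepthHead` and its parameters are hypothetical names used only for illustration):

```python
import torch
import torch.nn as nn

class DepthHead(nn.Module):
    """Illustrative output head: sigmoid * max_depth for metric depth,
    plain ReLU for relative (inverse) depth."""

    def __init__(self, in_channels: int, metric: bool = True, max_depth: float = 80.0):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)
        self.metric = metric
        self.max_depth = max_depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv(x)
        if self.metric:
            # Metric depth: squash to 0-1, then scale to 0..max_depth (meters).
            return torch.sigmoid(out) * self.max_depth
        # Relative depth: inverse depth is only constrained to be non-negative.
        return torch.relu(out)
```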
@LiheYoung Thank you for clarifying. It is confusing, since both the Depth Anything paper and MiDaS mention that the inverse depth is scaled to [0...1] for relative depth training.
I want to ask a question about the following text:
"Concretely, the depth value is first transformed into the disparity space by d=1/t and then normalized to 0∼1 on each depth map. To enable multi-dataset joint training, we adopt the affine-invariant loss to ignore the unknown scale and shift of each sample"
Are the outputs of the relative depth models d = 1/t? That is, are they normalized to 0~1 only for computing the loss function?
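For reference, a minimal sketch of the pipeline the quoted passage describes, assuming a MiDaS-style median / mean-absolute-deviation alignment for the affine-invariant loss (`relative_depth_loss` is a hypothetical helper, not the repository's implementation):

```python
import torch

def relative_depth_loss(pred_disp: torch.Tensor, gt_depth: torch.Tensor) -> torch.Tensor:
    """Sketch of the quoted pipeline for one sample: depth -> disparity (d = 1/t),
    per-map 0~1 normalization of the ground truth, then an affine-invariant loss
    that ignores the unknown scale and shift of each sample."""
    # Convert ground-truth metric depth t to disparity d = 1 / t.
    gt_disp = 1.0 / gt_depth.clamp(min=1e-6)

    # Normalize the ground-truth disparity to 0~1 on this depth map.
    gt_disp = (gt_disp - gt_disp.min()) / (gt_disp.max() - gt_disp.min() + 1e-6)

    # Affine-invariant alignment (assumed MiDaS-style: subtract the median,
    # divide by the mean absolute deviation), applied to both maps.
    def align(d: torch.Tensor) -> torch.Tensor:
        shift = d.median()
        scale = (d - shift).abs().mean().clamp(min=1e-6)
        return (d - shift) / scale

    return (align(pred_disp) - align(gt_disp)).abs().mean()
```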
https://github.com/DepthAnything/Depth-Anything-V2/blob/31dc97708961675ce6b3a8d8ffa729170a4aa273/metric_depth/depth_anything_v2/dpt.py#L113
A Sigmoid() layer is used in the metric depth architecture. This doesn't make sense to me, can you explain? (The relative depth architecture, which should return inverse depth in [0...1], doesn't have a sigmoid in the output.)
Thank you