DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

There is a large additive bias in the metric depth outputs #127

Open jbrownkramer opened 3 months ago

jbrownkramer commented 3 months ago

Hello,

I have been experimenting with the metric depth models that were fine-tuned on the Hypersim dataset.

I am running on data coming from an Azure Kinect. I have reprojected the true depth into the color camera's perspective and undistorted both the depth and RGB images to the Hypersim intrinsics (intWidth, intHeight, fltFocal = 1024, 768, 886.81). Comparing true depth to estimated metric depth, I notice that the closer the object, the larger its relative depth error.
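For reference, here is roughly how I resample onto the Hypersim pinhole model. This is a minimal sketch: `K_src` and `dist_src` stand in for the Azure Kinect color camera calibration, and I assume a centered principal point for the target intrinsics.

```python
import cv2
import numpy as np

# Hypersim pinhole target used for the metric fine-tuning.
intWidth, intHeight, fltFocal = 1024, 768, 886.81
K_dst = np.array([[fltFocal, 0.0, intWidth / 2.0],
                  [0.0, fltFocal, intHeight / 2.0],
                  [0.0, 0.0, 1.0]])

def to_hypersim_intrinsics(img, K_src, dist_src, interpolation):
    """Undistort img and resample it onto the Hypersim pinhole camera.

    K_src / dist_src are placeholders for the Kinect color calibration.
    Use cv2.INTER_NEAREST for depth so values are not blended across
    object boundaries, and cv2.INTER_LINEAR for RGB.
    """
    map1, map2 = cv2.initUndistortRectifyMap(
        K_src, dist_src, None, K_dst, (intWidth, intHeight), cv2.CV_32FC1)
    return cv2.remap(img, map1, map2, interpolation=interpolation)
```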

In the picture below, white indicates correct depth, red indicates that the estimate is too large, and blue indicates that it is too small.

[Image: depth error map (white = correct, red = estimate too large, blue = estimate too small)]

One way this can happen is if there is a constant additive bias in the estimated depth. Indeed, here is a histogram of the difference (true depth - estimated depth):

[Image: histogram of (true depth - estimated depth)]

The median difference was -866 mm. When I subtract 866 mm from the estimated depth, I get this:

[Image: depth error map after subtracting the 866 mm bias from the estimate]
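For completeness, here is a minimal sketch of how I measured and removed the offset. `true_depth` and `est_depth` are assumed to be aligned HxW arrays in millimeters, with invalid Kinect pixels reported as 0.

```python
import numpy as np
import matplotlib.pyplot as plt

valid = true_depth > 0  # Kinect reports 0 where there is no depth return
diff = true_depth[valid].astype(np.float64) - est_depth[valid]

# Histogram of (true depth - estimated depth), as shown above.
plt.hist(diff, bins=200)
plt.xlabel("true depth - estimated depth (mm)")
plt.ylabel("pixel count")
plt.show()

bias = np.median(diff)        # roughly -866 mm on my captures
corrected = est_depth + bias  # i.e. subtract 866 mm from the estimate
```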

Furthermore, this constant correction continues to hold as I point the camera at different scenes at different distances.
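Concretely, I check this by computing the per-frame median offset over several captures and confirming the values barely move; `frames` here is an assumed list of aligned (true, estimated) depth pairs.

```python
import numpy as np

offsets = []
for true_depth, est_depth in frames:  # assumed list of aligned depth pairs
    valid = true_depth > 0
    offsets.append(np.median(
        true_depth[valid].astype(np.float64) - est_depth[valid]))
# The per-frame medians stay close to -866 mm across scenes and distances.
print("per-frame median offsets (mm):", np.round(offsets, 1))
```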

I cannot find the source of this bias in the training or inference code. I thought I would let you know in case you have any insight.