Closed bryan-dailabs closed 1 month ago
The camera intrinsics used in Hypersim are documented extensively in our README.
Oh goodness, I'm not sure how I even missed that. Thanks, and apologies for the noise.
Seems like the FOV is hard-coded to pi/3 to match the DIODE dataset, in case anyone else is looking for this: https://github.com/apple/ml-hypersim/blob/20f398f4387aeca73175494d6a2568f37f372150/code/python/tools/scene_generate_camera_trajectories_random_walk.py#L79
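For anyone who wants the focal length in pixels rather than an angular FOV, the standard pinhole relation converts between the two. The sketch below assumes pi/3 is the horizontal FOV and the usual 1024x768 Hypersim image resolution; per the maintainer's note, the exact per-scene intrinsics can differ, so treat this as an approximation.

```python
import math

def focal_from_fov(fov_rad, image_size_px):
    """Pinhole model: focal length in pixels from an angular FOV
    measured across image_size_px pixels."""
    return (image_size_px / 2.0) / math.tan(fov_rad / 2.0)

# Assumed values: horizontal FOV of pi/3 over a 1024-px-wide image.
fx = focal_from_fov(math.pi / 3.0, 1024)
print(fx)  # ~886.8 px, i.e. 512 * sqrt(3)
```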
Indeed, but for any future readers that might find this thread, the exact camera intrinsics can vary slightly per scene. See here for details.
Is this dataset created with a fixed, or variable focal length? If fixed, what is the value or values?
I am using some models trained on this dataset (Depth Anything V2), and the metric depth estimates are not matching real-world measurements. One possible issue is that our camera has a different focal length / depth of field than the Hypersim data.
It seems that if the camera intrinsics at inference time don't match those of the training dataset, the model will not be able to predict accurate metric depth, but perhaps my reasoning is off.
Thanks for any pointers!
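One common heuristic for the mismatch described above: under a pinhole model, metric depth predictions scale roughly linearly with focal length, so predictions from a model trained at focal length `f_train` can be rescaled for a camera with focal length `f_actual`. This is a hypothetical correction, not something Hypersim or Depth Anything V2 prescribes, and it ignores other domain gaps (depth of field, distortion, scene statistics).

```python
import numpy as np

def rescale_depth(depth_pred, f_train, f_actual):
    """Heuristic: rescale predicted metric depth for a camera whose
    focal length (in pixels) differs from the training camera's.
    Assumes depth scales linearly with focal length."""
    return depth_pred * (f_actual / f_train)

# Example: a camera with half the training focal length should see
# depths halved under this heuristic.
depth = np.array([2.0, 4.0])
print(rescale_depth(depth, f_train=886.8, f_actual=443.4))  # [1.0, 2.0]
```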