legenda971 opened 1 month ago
Depth Anything is one of the SOTA works to that end (I haven't tried combining it with FoundationPose, though).
I was also thinking about using Depth Anything for this purpose. But as far as I understand the FoundationPose paper, you need absolute (metric) depth values for the RGB-D input, right? Depth Anything only gives you relative depth, which you would somehow need to convert to absolute depth.
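One common workaround (not something the FoundationPose authors prescribe, just a standard trick) is to align the relative depth map to a handful of known metric depths, e.g. from a few sparse sensor readings or known object points, by fitting a per-image scale and shift in a least-squares sense. A minimal sketch, assuming the relative prediction is related to metric depth by an affine map `abs ≈ s * rel + t`:

```python
import numpy as np

def align_relative_depth(rel, abs_vals, mask):
    """Fit scale s and shift t so that s * rel + t best matches the known
    absolute depths (least squares over the pixels where mask is True),
    then apply that affine correction to the whole relative depth map."""
    d = rel[mask].ravel()          # relative predictions at known pixels
    g = abs_vals[mask].ravel()     # metric ground-truth at those pixels
    A = np.stack([d, np.ones_like(d)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    return s * rel + t

# Hypothetical usage: a 10x10 metric depth map and a synthetic
# relative prediction that differs by an unknown scale and shift.
gt = np.linspace(0.5, 3.0, 100).reshape(10, 10)   # metres
rel = 0.2 * gt + 0.1                              # what a relative model might output
mask = np.zeros_like(gt, dtype=bool)
mask[::3, ::3] = True                             # sparse known-depth pixels
metric = align_relative_depth(rel, gt, mask)
```

This only works if the relative prediction really is affine in metric depth (some models predict affine-invariant *inverse* depth, in which case you fit the same two parameters in inverse-depth space instead), and the recovered map will only be as good as that assumption.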
Is it possible to use a predictive model to generate the depth maps? If so, what are some recommended approaches or models that can effectively predict depth from images or video sequences?