Hi, thanks for your attention. We use two datasets: EndoNeRF and SCARED.
For EndoNeRF, the authors use a da Vinci robot to capture camera poses and a stereo-depth-estimation network to predict depth maps, which serve as ground truth. More details about the dataset can be found in this repo: https://github.com/med-air/EndoNeRF. For SCARED, the authors also use a da Vinci robot to capture poses, and a projector (RGB-D) to obtain precise depth maps. Details of this dataset can be found here: https://endovissub2019-scared.grand-challenge.org/About/
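If it helps, the usual way a stereo network's disparity output becomes a metric depth map is the standard rectified-stereo relation depth = fx * baseline / disparity. Here is a minimal NumPy sketch; `fx`, `baseline`, and the numbers in the usage comment are placeholders I made up, not values from either dataset:

```python
import numpy as np

def disparity_to_depth(disparity, fx, baseline):
    """Convert a rectified-stereo disparity map (in pixels) to metric depth.

    Standard relation: depth = fx * baseline / disparity,
    where fx is the focal length in pixels and baseline is the distance
    between the two cameras (same unit as the returned depth).
    """
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0                 # zero disparity means no match
    depth[valid] = fx * baseline / disparity[valid]
    return depth

# Usage with placeholder calibration values (not from either dataset):
# depth = disparity_to_depth(disparity, fx=569.5, baseline=0.0054)
```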
To get ground-truth depth at a given camera pose, I think the most accurate way is to place the depth sensor at that exact pose when capturing the data.
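If placing the sensor at that pose is not possible, the only geometric alternative I can think of is to reproject a depth map captured at some other known pose into the target view. Below is a minimal NumPy sketch under standard pinhole assumptions; `K`, `T_wa`, and `T_wb` are my own placeholder names for the intrinsics and the camera-to-world poses, not anything from the datasets. Note that this only holds for a static scene, which is exactly what breaks down for dynamic surgical video:

```python
import numpy as np

def reproject_depth(depth_a, K, T_wa, T_wb):
    """Warp a depth map captured at camera pose A into the view of pose B.

    depth_a : (H, W) depth map at camera A, in metres (0 = invalid)
    K       : (3, 3) pinhole intrinsics, assumed shared by both views
    T_wa    : (4, 4) camera-A-to-world pose
    T_wb    : (4, 4) camera-B-to-world pose
    Returns an (H, W) depth map as seen from camera B (0 where nothing
    projects). Assumes a static scene; last-written pixel wins, so
    occlusions are not handled properly in this sketch.
    """
    H, W = depth_a.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, N)
    d = depth_a.reshape(-1)

    # Back-project A's pixels to 3D points in camera-A coordinates.
    pts_a = np.linalg.inv(K) @ pix * d[None, :]

    # Camera A -> world -> camera B.
    pts_a_h = np.vstack([pts_a, np.ones((1, pts_a.shape[1]))])
    pts_b = (np.linalg.inv(T_wb) @ T_wa @ pts_a_h)[:3]

    # Project into camera B and splat the new depth values.
    proj = K @ pts_b
    z = proj[2]
    valid = (d > 0) & (z > 1e-6)
    ub = np.round(proj[0, valid] / z[valid]).astype(int)
    vb = np.round(proj[1, valid] / z[valid]).astype(int)
    inside = (ub >= 0) & (ub < W) & (vb >= 0) & (vb < H)
    depth_b = np.zeros((H, W), dtype=np.float32)
    depth_b[vb[inside], ub[inside]] = z[valid][inside]
    return depth_b
```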
As far as I know, the ways to obtain globally consistent ground-truth depth and poses are deep-learning methods such as RCVD, or RGB-D cameras (though I am also not clear on how they yield precise depth and poses). However, we cannot directly infer depth from a dynamic monocular video at given camera poses. Do the authors have any other guidance?