Hi, thanks for your attention.
For binocular inputs, the depths in the EndoNeRF dataset are predicted by the stereo depth estimation model STTR, so we directly use them as the predicted values. For the SCARED dataset, we use the ground-truth depth for simplicity. We have also tested monocular inputs, where we use the Depth Anything model to predict relative depth maps, and the rendering performance is fine. (We use the default parameters of the depth estimation models.)
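For the monocular case, here is a minimal sketch of getting a relative depth map, assuming the Hugging Face transformers depth-estimation pipeline and a Depth Anything checkpoint such as LiheYoung/depth-anything-small-hf (the checkpoint id and the input frame path are illustrative, not something taken from our code):

```python
# Sketch: predict a relative depth map with a Depth Anything checkpoint via the
# Hugging Face transformers depth-estimation pipeline (model id is an assumption).
from PIL import Image
from transformers import pipeline

depth_estimator = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

image = Image.open("frame_000000.png")      # hypothetical input frame
result = depth_estimator(image)

relative_depth = result["predicted_depth"]  # torch.Tensor, relative (non-metric) depth
# Relative depth has an unknown scale/shift, so normalize it to [0, 1] before
# plugging it into the Gaussian initialization in place of the dataset depth.
relative_depth = (relative_depth - relative_depth.min()) / (
    relative_depth.max() - relative_depth.min() + 1e-8
)
```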
If you have a binocular input sequence, you can use STTR to predict disparity values and then convert them into metric depth values via depth = f * b / d, where f is the focal length, b is the baseline between the two cameras, and d is the disparity predicted by STTR.
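A minimal sketch of that conversion is below; the focal length and baseline come from your stereo calibration, and the numbers in the usage comment are placeholders only:

```python
import numpy as np

def disparity_to_depth(disp, focal_length, baseline, eps=1e-6):
    """Convert a disparity map (in pixels) to metric depth via depth = f * b / d.

    focal_length is in pixels; baseline is in the metric unit you want the
    depth in (e.g. millimetres). Both come from the stereo calibration.
    """
    disp = np.asarray(disp, dtype=np.float32)
    depth = focal_length * baseline / np.maximum(disp, eps)
    depth[disp <= 0] = 0.0  # mark invalid / occluded pixels with zero depth
    return depth

# Example usage with hypothetical calibration values:
# depth = disparity_to_depth(disp, focal_length=569.5, baseline=4.14)
```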
For random initialization, you can simply replace the depth-based point cloud with randomly sampled points, e.g. drawn uniformly within a rough bounding box of the scene, as in the sketch below.
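Here is a rough sketch, under the assumption that you initialize the Gaussian centers from a uniformly sampled point cloud inside a hand-picked bounding box (random_init_points is a hypothetical helper, not part of our repository):

```python
import numpy as np

def random_init_points(num_points, scene_bounds, seed=0):
    """Sample random 3D points and colors to replace the depth-based initialization.

    scene_bounds = (min_xyz, max_xyz) is a rough bounding box of the scene that
    you have to choose yourself (e.g. from the camera frustum).
    """
    rng = np.random.default_rng(seed)
    min_xyz = np.asarray(scene_bounds[0], dtype=np.float32)
    max_xyz = np.asarray(scene_bounds[1], dtype=np.float32)
    xyz = rng.uniform(min_xyz, max_xyz, size=(num_points, 3)).astype(np.float32)
    rgb = rng.uniform(0.0, 1.0, size=(num_points, 3)).astype(np.float32)
    return xyz, rgb

# Example usage with a hypothetical bounding box:
# xyz, rgb = random_init_points(100_000, scene_bounds=([-1, -1, 0], [1, 1, 2]))
```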
Besides, for the binocular setting of the EndoNeRF dataset (the common setting), I have tried predicting the depth maps with STTR myself (predicting disparity with STTR and then converting it to metric depth), and found performance similar to that obtained with the depth maps provided by the EndoNeRF dataset, which indicates that predicting the depth is feasible.
Hello, thank you for your outstanding contribution. I have a few questions:

1. For the Gaussian initialization of the HGI module, I found that your code does not use the depth estimation model mentioned in the paper, but directly uses the depth, mask, etc. from the dataset. If I were to use the depth estimation model from the paper and estimate depth while running 3D GS, would this result in poor real-time performance?
2. Do you plan to release the parameters of your depth estimation model?
3. If I want to disable the Gaussian initialization in the HGI module and use random initialization for comparison, do you have a corresponding implementation in your code?