autonomousvision / sdfstudio

A Unified Framework for Surface Reconstruction
Apache License 2.0

Comparison on public benchmarks against SOTA #14

Closed: feixh closed this issue 1 year ago

feixh commented 1 year ago

Is your feature request related to a problem? Please describe.

I've been playing with SDFStudio (truly great work), but I found that the performance (e.g., Chamfer distance) of each individual method lags behind what is reported in the corresponding paper. For instance, on DTU scan 65, the original VolSDF paper reports a Chamfer distance of 0.7, but the vanilla VolSDF model in SDFStudio produces a Chamfer distance of about 1.0 on the same scan. Other methods behave similarly; e.g., NeuS-facto also gives a Chamfer distance of about 1.0.

Describe the solution you'd like

Could you provide configurations/instructions to reproduce the SOTA results? Could you also share any benchmarking you have done of SDFStudio on public datasets such as DTU?

Describe alternatives you've considered

A systematic document on how to tune the hyperparameters and which components to use to achieve the performance reported in the literature would be very useful.


niujinshuchong commented 1 year ago

Hi, I think the visual quality is very similar to the original papers, but I actually haven't run a detailed benchmark evaluation. How do you evaluate the results? Do you use masks to filter out parts of the mesh surface during evaluation?

feixh commented 1 year ago

I used the evaluation code from the monosdf repo; it has an eval_dtu.py script that evaluates the generated mesh against the ground truth. And yes, I culled the meshes using the object masks.
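
For anyone else reading along, here is a minimal sketch of the mask-based culling idea (not the exact monosdf code): project each mesh vertex into every view and drop geometry that never lands inside an object mask. `poses`, `K`, and `masks` are assumptions for illustration (world-to-camera matrices, intrinsics, and boolean object masks), and the exact keep/drop convention varies between implementations.

```python
# Sketch of object-mask culling before DTU evaluation.
# Assumptions: `mesh` is a trimesh.Trimesh in world coordinates, `poses` are
# 4x4 world-to-camera matrices, `K` is a 3x3 intrinsic matrix, and `masks`
# are HxW boolean object masks. The monosdf dtu_eval culling differs in details.
import numpy as np
import trimesh

def cull_mesh_with_masks(mesh, poses, K, masks):
    verts = np.asarray(mesh.vertices)          # (V, 3)
    keep = np.zeros(len(verts), dtype=bool)
    for w2c, mask in zip(poses, masks):
        cam = w2c[:3, :3] @ verts.T + w2c[:3, 3:4]   # (3, V) in camera frame
        uv = K @ cam
        u, v = uv[0] / uv[2], uv[1] / uv[2]
        h, w = mask.shape
        visible = (cam[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.where(visible)[0]
        # One common convention: keep a vertex if it falls inside the object
        # mask in at least one view (some implementations require all views).
        keep[idx] |= mask[v[idx].astype(int), u[idx].astype(int)]
    # Remove faces that touch any never-inside-mask vertex, then drop the
    # vertices that are no longer referenced by any face.
    face_keep = keep[mesh.faces].all(axis=1)
    mesh.update_faces(face_keep)
    mesh.remove_unreferenced_vertices()
    return mesh
```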

niujinshuchong commented 1 year ago

Hi, we use low-resolution images (384x384) for the DTU dataset by default, while the numbers in the papers use high-resolution images (1200x1600). I tested NeuS with multi-res grids (60K iterations) on DTU scan 65 and got 0.78 (low-res) vs. 0.71 (high-res). I also tested NeuS and VolSDF (MLP, 100K iterations) on DTU scan 65 with high-resolution images and got 0.76 (NeuS) and 0.75 (VolSDF). I think the results are close to the papers and within training variance (the GT point clouds in the DTU dataset are not perfect and contain wrong points).

feixh commented 1 year ago

Cool, knowing your evaluation results definitely helps. I have a couple of remaining questions. (1) Which library/scripts are you using for the evaluation? (2) Did you extract the mesh using the script provided by SDFStudio? I noticed each code base provides slightly different mesh extraction code. (3) Are you using the default configuration to train the model?

niujinshuchong commented 1 year ago

Hi,

(1) I use the evaluation script from MonoSDF.

(2) I use ns-extract-mesh provided by SDFStudio, which is the same as MonoSDF's. The mesh extraction script differs from other papers because we use sliding windows (if the resolution is > 512) and multi-scale extraction (similar to occupancy networks, but with an SDF here; see the sketch below the command). I don't think this makes a large difference to the results.

(3) There is no default configuration in SDFStudio, because different configurations should be used for indoor scenes and for the DTU dataset (e.g. --pipeline.model.sdf-field.inside-outside, --pipeline.model.sdf-field.bias, --pipeline.model.sdf-field.use-grid-feature, or --pipeline.model.sdf-field.beta-init). I use the following command to train the model, where HIGHRES_DATA can be found in the MonoSDF repo:

ns-train volsdf --pipeline.model.sdf-field.beta-init 0.1 --pipeline.model.sdf-field.bias 0.5 --pipeline.model.sdf-field.inside-outside False --pipeline.model.sdf-field.use-grid-feature False --pipeline.model.sdf-field.hidden-dim 256 --pipeline.model.sdf-field.num-layers 8 --pipeline.model.sdf-field.num-layers-color 2 --pipeline.model.background-model none --pipeline.model.near-plane 0.5 --pipeline.model.far-plane 4.5 --pipeline.model.overwrite-near-far-plane True --experiment-name volsdf-mlp-dtu65 monosdf-data --data HIGHRES_DATA/scan65 --center-crop-type no_crop
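
To make point (2) concrete, here is a rough sketch of the sliding-window idea only, not the actual ns-extract-mesh implementation: evaluate the SDF over the dense grid in chunks so the network never sees too many query points at once, then run marching cubes on the assembled volume. `sdf_fn`, the resolution, and the bound are placeholders, and the multi-scale (coarse-to-fine) part is omitted.

```python
# Rough sketch of sliding-window SDF evaluation for mesh extraction.
# Assumption: `sdf_fn` maps an (N, 3) float32 array of world-space points to
# an (N,) array of SDF values. The real ns-extract-mesh also performs
# multi-scale (coarse-to-fine) extraction, which is not shown here.
import numpy as np
import trimesh
from skimage import measure

def extract_mesh_sliding_window(sdf_fn, resolution=512, bound=1.0, slab=32):
    xs = np.linspace(-bound, bound, resolution, dtype=np.float32)
    sdf = np.empty((resolution,) * 3, dtype=np.float32)
    # Evaluate one slab of x-planes at a time to keep peak memory bounded.
    for i in range(0, resolution, slab):
        gx, gy, gz = np.meshgrid(xs[i:i + slab], xs, xs, indexing="ij")
        pts = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)
        sdf[i:i + slab] = sdf_fn(pts).reshape(gx.shape)
    # Marching cubes at the zero level set, then shift back to world coordinates.
    spacing = (xs[1] - xs[0],) * 3
    verts, faces, _, _ = measure.marching_cubes(sdf, level=0.0, spacing=spacing)
    verts = verts - bound
    return trimesh.Trimesh(vertices=verts, faces=faces)
```
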
feixh commented 1 year ago

Awesome! Thanks so much!

nlml commented 1 year ago

Hey @feixh and @niujinshuchong

I also want to run evaluations on the DTU dataset and came across this issue. I really can't figure out what the "standard" way to evaluate on this dataset is. For instance, I want to reproduce the metrics in the Neuralangelo paper, but it's unclear what procedure they used for evaluation.

Could either of you explain to me how exactly we are meant to calculate PSNR and Chamfer distance on DTU scenes to compare with other methods?

Specifically:

  1. For PSNR, are there specific validation/test images that we are meant to hold out during training? Also, are we meant to train and evaluate at the full resolution (1200x1600)?
  2. For Chamfer distance, how do we calculate it? Am I supposed to extract meshes and then run something like the code [here](https://github.com/jzhangbs/DTUeval-python)? This seems like it would work poorly for SDF methods without first isolating the foreground.

Any info either of you could share here would be really appreciated! Thanks!

niujinshuchong commented 1 year ago

@nlml For PSNR, I am not sure how the data is split, but we should evaluate on the test images. For Chamfer distance, you could refer to https://github.com/autonomousvision/monosdf/tree/main/dtu_eval, where the extracted mesh is filtered by the object mask before calling the eval_dtu script.
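
To make the Chamfer part concrete, here is a simplified sketch of the DTU-style metric: the mean of accuracy (predicted surface to GT) and completeness (GT to predicted surface). The official evaluation additionally downsamples points, applies observability masks, and handles outliers differently, so numbers from this sketch will not match the benchmark exactly; `mesh_path`, `gt_points`, and the thresholds are placeholders.

```python
# Simplified sketch of a DTU-style Chamfer distance between a predicted mesh
# and the ground-truth point cloud. Assumptions: the mesh has already been
# mask-culled and is in the DTU reference frame (mm); `gt_points` is (M, 3).
import numpy as np
import trimesh
from scipy.spatial import cKDTree

def chamfer_dtu(mesh_path, gt_points, n_samples=1_000_000, max_dist=20.0):
    mesh = trimesh.load(mesh_path)
    pred_points, _ = trimesh.sample.sample_surface(mesh, n_samples)

    # Accuracy: predicted surface -> ground truth.
    d_pred_to_gt, _ = cKDTree(gt_points).query(pred_points)
    # Completeness: ground truth -> predicted surface.
    d_gt_to_pred, _ = cKDTree(pred_points).query(gt_points)

    # Clip large distances as a rough stand-in for the official outlier handling.
    accuracy = np.clip(d_pred_to_gt, 0, max_dist).mean()
    completeness = np.clip(d_gt_to_pred, 0, max_dist).mean()
    return (accuracy + completeness) / 2.0
```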

TriptSharma commented 1 year ago

> (quoting @nlml's questions above about the standard way to evaluate PSNR and Chamfer distance on DTU)

Hi, I used the script mentioned in the link. However, I am getting a mean Chamfer distance of 6.2, while the PSNR values are comparable. The reconstructed mesh looks decent, so I was wondering what the issue might be. Is there a metric conversion or something I am missing? I used the ns-export tsdf command to generate the mesh.
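
One thing that might be worth double-checking (just a guess, not a confirmed cause): whether the exported mesh and the DTU ground-truth point cloud are in the same coordinate frame, since a scale or offset mismatch would inflate the Chamfer distance even for a good reconstruction. A quick sanity check, with `compare_frames` as a hypothetical helper:

```python
# Sanity check (hypothetical helper): compare the bounding boxes of the
# exported mesh and the DTU ground-truth point cloud. A large mismatch in
# center or extent suggests a missing scale/offset transform rather than a
# genuinely bad reconstruction.
import numpy as np
import trimesh

def compare_frames(mesh_path, gt_points):
    mesh = trimesh.load(mesh_path)
    for name, pts in [("mesh", np.asarray(mesh.vertices)),
                      ("gt", np.asarray(gt_points))]:
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        print(f"{name}: center={(lo + hi) / 2}, extent={hi - lo}")
```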

iszihan commented 1 month ago

I'm also getting a really large Chamfer distance. It looks like the extracted mesh is offset a bit from the ground-truth point cloud. Has anyone solved this problem? @TriptSharma @niujinshuchong