donydchen / mvsplat

🌊 [ECCV'24 Oral] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
https://donydchen.github.io/mvsplat
MIT License

about test on co3d with pre-trained model #33

Open · boxuLibrary opened this issue 4 months ago

boxuLibrary commented 4 months ago

Hi, I tested the pre-trained model on the CO3D dataset, but the results look very bad. I have checked: (1) the intrinsic and extrinsic parameters of the inputs against the epipolar model; (2) the images reshaped to 256 × 256; (3) the near and far depth bounds, which I tuned carefully. Could this be due to the limited generalizability of the pre-trained model? Thank you so much.

235_24641_51707_05 000017_render
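For reference, this is roughly how I prepare the frames for step (2): a minimal sketch, assuming a pixel-space 3×3 intrinsic matrix `K` (the helper name is my own; the repo itself may expect normalized intrinsics, so fx/cx and fy/cy would then need dividing by the image width and height).

```python
# Rough sanity-check sketch (not the repo's own preprocessing): center-crop a
# CO3D frame to a square, resize to 256x256, and update pixel-space intrinsics.
import numpy as np
from PIL import Image

def resize_frame(image_path, K, size=256):
    img = Image.open(image_path)
    w, h = img.size
    # Center-crop to a square so the aspect ratio is not distorted.
    side = min(w, h)
    left, top = (w - side) // 2, (h - side) // 2
    img = img.crop((left, top, left + side, top + side)).resize((size, size), Image.BICUBIC)

    K = K.copy().astype(np.float64)
    # Shift the principal point for the crop, then rescale focal length and
    # principal point for the uniform resize.
    K[0, 2] -= left
    K[1, 2] -= top
    K[:2] *= size / side
    return np.asarray(img), K
```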

boxuLibrary commented 4 months ago

More rendered pictures can be found below: 000001_render 000013_render

donydchen commented 4 months ago

Hi @boxuLibrary, I think it's probably because the baseline between the input views is too wide. When training on RE10K, we assume there is enough overlap between the input source views, which we enforce by constraining the frame distance.

Below is a typical example of the overlap between the RE10K input views. The second column shows the regions that overlap with the other input view.

You can test with two other input views that have a larger overlap, similar to the ones we chose for the DTU testing. That should work better.
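As a rough illustration of what "constraining the frame distance" means (this is not the exact sampler used in training, and the gap bounds below are placeholders, not the real values):

```python
# Hypothetical helper: enumerate context pairs from an ordered video sequence
# whose frame-index gap stays within a bound, so the two views keep overlapping.
def candidate_context_pairs(num_frames, min_gap=10, max_gap=45):
    pairs = []
    for i in range(num_frames):
        for j in range(i + min_gap, min(i + max_gap + 1, num_frames)):
            pairs.append((i, j))
    return pairs
```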

You can also visualize the overlap between the inputs using the code snippet from the pixelSplat project linked here.
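If you prefer a self-contained check instead of the pixelSplat snippet, something along these lines gives a rough overlap mask. This is only a sketch under my own assumptions: pixel-space intrinsics `K_a`/`K_b`, 4×4 world-to-camera extrinsics `T_a`/`T_b`, a (ground-truth or estimated) depth map for the first view, and both views at the same resolution.

```python
import numpy as np

def overlap_mask(depth_a, K_a, K_b, T_a, T_b):
    """Mark pixels of view A that reproject inside view B's image bounds."""
    h, w = depth_a.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N

    # Back-project view A pixels to camera-A coordinates, then to world space.
    cam_a = np.linalg.inv(K_a) @ pix * depth_a.reshape(1, -1)
    world = np.linalg.inv(T_a) @ np.vstack([cam_a, np.ones((1, cam_a.shape[1]))])

    # Project the world points into view B.
    cam_b = (T_b @ world)[:3]
    proj = K_b @ cam_b
    uv_b = proj[:2] / np.clip(proj[2:], 1e-6, None)

    in_front = cam_b[2] > 0
    in_bounds = (uv_b[0] >= 0) & (uv_b[0] < w) & (uv_b[1] >= 0) & (uv_b[1] < h)
    return (in_front & in_bounds).reshape(h, w)
```

Taking the mean of the returned mask gives a rough overlap ratio to compare against the RE10K example shown above; a much lower ratio on your CO3D pairs would explain the degraded renders.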

Still, these are interesting findings, though I do expect the released model may not perform well on object-centric scenes, since it was trained only on RE10K. I might also find time to look into the CO3D dataset, but no guarantee...