chenqi13814529300 opened this issue 1 week ago
Hi @chenqi13814529300, most feed-forward approaches focus on a small number of input images, mainly because of framework design and computational limitations. We are also actively exploring how to extend feed-forward approaches to handle a larger number of input views. Currently, the best we can achieve is up to 12 input views, using Haofei's recent work DepthSplat.
We also plan to release a video diffusion-enhanced model that can handle 360-degree scene synthesis in a feed-forward manner, using only 5 widely spaced input images. It was just accepted to NeurIPS 2024 and will be available online within the next two weeks. A quick video preview of this new model (termed MVSplat360) can be found here. Stay tuned!
Thank you for your answer. Looking forward to your new work!
I have found that these studies often reconstruct a scene from only a few images, but in practice, hundreds or thousands of images are used to reconstruct a full closed scene, rather than just a 180-degree view. May I ask whether support for a large number of input images will be added for scene reconstruction in the future?