Closed YuxuanSnow closed 3 months ago
How the model performs at other resolutions is actually unpredictable; it is interesting to see that the model generates a 3x5 grid in this case. And yes, we use 320x320 as the per-view resolution for reconstruction.
Interesting!!
Dear Authors,
Thanks for the great effort and for open-sourcing the model.
I have a question regarding the inference resolution of the image. Basically, the model diffuses a single image that contains 2x3 sub-images for the 6 views. I see that a resolution of 640x960 is used at inference time, which means each view is 320x320. Are those also the images you used in One-2-3-45++ to construct the feature volume?
I also tried inference at a higher resolution (512x2, 512x3), and it generated the following image, which has 3x5 views. Is this expected? The middle column, as well as the second and fourth rows, look a bit like interpolated camera poses compared to the (320x2, 320x3) output, which has 2x3 views:
![output_](https://github.com/SUDO-AI-3D/zero123plus/assets/50771152/7d68bc3d-872d-492e-929b-e321e9bb0e66)
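For reference, the tiling arithmetic in the question can be sketched as below. This is only an illustration of the numbers discussed in this thread, not the authors' code: `tile_boxes` is a hypothetical helper, and the boxes use the `(left, upper, right, lower)` convention so they could be passed to Pillow's `Image.crop`.

```python
def tile_boxes(width, height, tile):
    """Return (left, upper, right, lower) crop boxes, row by row.

    Hypothetical helper: assumes the views are laid out on a regular
    grid of square tiles with no padding between them.
    """
    if width % tile or height % tile:
        raise ValueError(f"{width}x{height} is not a whole number of {tile}px tiles")
    cols, rows = width // tile, height // tile
    return [
        (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
        for r in range(rows)
        for c in range(cols)
    ]


# Default inference size from the thread: 640x960 with 320x320 views.
boxes = tile_boxes(640, 960, 320)
print(len(boxes))  # 6 views

# The higher-resolution attempt: 1024x1536 is not a whole number of 320px tiles.
print(1024 / 320, 1536 / 320)  # 3.2 4.8
```

The ratios 3.2 and 4.8 are at least numerically close to the 3x5 grid seen in the image, but that reading is a guess; the reply in this thread describes behaviour at other resolutions as unpredictable.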