Closed TLB-MISS closed 1 year ago
SJC optimizes a Voxel NeRF given a text prompt. Our framework uses an NVS network (zero123) to train a Voxel NeRF with techniques similar to those used in SJC. In addition, the repo provides a function to convert the trained Voxel NeRF to a mesh and export it. Hope this answers your question.
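For anyone curious what the voxel-to-mesh step looks like conceptually, here is a minimal sketch using marching cubes on a dense density grid. This is not the repo's actual implementation; the function name, threshold, and sphere-shaped toy density field are all assumptions for illustration:

```python
import numpy as np
from skimage import measure

def export_voxel_mesh(density, threshold, path):
    # Run marching cubes on the dense density grid, then write a minimal
    # Wavefront OBJ file (vertices + triangular faces) by hand.
    verts, faces, _, _ = measure.marching_cubes(density, level=threshold)
    with open(path, "w") as f:
        for v in verts:
            f.write(f"v {v[0]} {v[1]} {v[2]}\n")
        for a, b, c in faces:
            f.write(f"f {a + 1} {b + 1} {c + 1}\n")  # OBJ indices are 1-based
    return verts, faces

# Toy stand-in for a trained Voxel NeRF: a sphere-shaped density field.
axis = np.linspace(-1.0, 1.0, 64)
x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
density = 50.0 * (np.sqrt(x**2 + y**2 + z**2) < 0.5).astype(np.float32)
verts, faces = export_voxel_mesh(density, threshold=25.0, path="toy_mesh.obj")
```

In practice you would query the trained Voxel NeRF's density on a regular grid instead of the toy sphere, but the isosurface extraction step is the same idea.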
Hi, thanks for sharing this cool work.
I have a question about the difference between zero123 and SJC with respect to 3D reconstruction. It seems that the novel view images generated by zero123 are not used during the reconstruction stage, since the index is 0 in the run_zero123.py file.
The original image is only used to update the model embeddings (model.clip_emb and model.vae_emb), and everything else is essentially the same as the SJC method. Since the novel view images are not used, why is it better than SJC?
Hi @sjtuzq , it's a good question. First of all, SJC is not a model for 3D reconstruction, so it's not directly comparable. We did convert SJC into SJC-I, which basically replaces the text-conditioned stable diffusion with an image-conditioned stable diffusion (see the paper for more details). In comparison to SJC-I, we further replace the image-conditioned stable diffusion with an image-pose-conditioned novel view synthesis stable diffusion, which is zero123.
In my opinion, this is better than SJC-I in 3D reconstruction for multiple reasons. Here are two I can think of now:
@ruoshiliu
Oh, I misunderstood this issue. Thank you for kindly answering my question. I have another question: do you have any plans to support textured mesh export?
Thanks.
It's kindly implemented by @ashawkey in Stable-Dreamfusion!
@ruoshiliu
Thanks! I'll check it!
One last question: if I try zero123 on my own dataset, does transforms_train.json significantly affect the result? Currently I am using the transforms from one of the given datasets (pikachu). Will that cause any problems?
The only input required by the model, in addition to an image, is an elevation angle -- sorry if this is a little confusing. So probably the easiest thing to do here is to find an image in nerf_wild whose camera elevation angle w.r.t. the object looks similar to your image's, and replace that image with yours. I think the assumed angle for pikachu is around 15 degrees.
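For reference, here is a minimal numpy sketch of how a camera pose could be derived from an elevation (and azimuth) angle for a camera on a sphere looking at the origin. The axis convention (z-up), radius, and function name are assumptions for illustration, not necessarily the convention used in the repo's transforms_train.json:

```python
import numpy as np

def camera_pose(elevation_deg, azimuth_deg, radius=1.5):
    # Build a 4x4 camera-to-world matrix for a camera placed on a sphere of
    # the given radius, looking at the origin. Assumes z-up world axes and
    # elevation strictly between -90 and 90 degrees.
    elev = np.deg2rad(elevation_deg)
    azim = np.deg2rad(azimuth_deg)
    pos = radius * np.array([
        np.cos(elev) * np.cos(azim),
        np.cos(elev) * np.sin(azim),
        np.sin(elev),
    ])
    forward = -pos / np.linalg.norm(pos)          # unit vector toward origin
    right = np.cross(forward, np.array([0.0, 0.0, 1.0]))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    c2w = np.eye(4)
    c2w[:3, 0], c2w[:3, 1], c2w[:3, 2] = right, up, -forward
    c2w[:3, 3] = pos
    return c2w

pose = camera_pose(15.0, 0.0)  # roughly the elevation assumed for pikachu
```

The rotation columns (right, up, back) follow the common NeRF/OpenGL convention, which is the main thing to double-check against whatever dataset format you are matching.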
@ruoshiliu
I got it! Thank you so much for your kind answers to so many questions!
Hi! First of all, thank you so much for releasing such wonderful work. I've checked this issue but still have a question. The paper says that 3D reconstruction was performed with SJC, not NeRF. Even the 3D reconstruction section of the README shows run_zero123.py with SJC as an example. However, there is no part of the SJC code that produces a 3D output. Can you tell me why? Thanks.