cvlab-columbia / zero123

Zero-1-to-3: Zero-shot One Image to 3D Object (ICCV 2023)
https://zero123.cs.columbia.edu/
MIT License

Could you please give me a brief explanation of the log images? #83

Open yumeko717 opened 11 months ago

yumeko717 commented 11 months ago

Hi author. We used main.py to train and obtained five log images (input, condition, reconstruction, samples, samples_cfg_scale_3.00). I understand input and condition, but not the other three. What are samples and samples_cfg_scale_3.00? How can I use these two images to assess how well the training is going?

ys830 commented 10 months ago

I have the same confusion.

yanjk3 commented 9 months ago

The "reconstruction" image is the output of the VAE. Because the VAE is an autoencoder, it usually looks the same as the "input". The "samples" and "samples_cfg_scale_3.00" images are the results generated under the guidance of the "condition" image and the camera RT. The difference between them is that the former does not use unconditional guidance, while the latter uses unconditional guidance with a guidance scale of 3.0. Ideally, both "samples" and "samples_cfg_scale_3.00" should show the same object as the "input", seen from a viewpoint different from that of the "input".
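To make the difference between the two sample images concrete, here is a minimal sketch of how a guidance scale is commonly applied at each denoising step. It assumes the standard classifier-free guidance formula and is not copied from this repo's sampler. The function name `guided_noise` and the toy values are hypothetical, chosen only for illustration.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Blend unconditional and conditional noise predictions.

    Assumed formula (standard classifier-free guidance):
        eps = eps_uncond + scale * (eps_cond - eps_uncond)
    """
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy noise predictions standing in for the denoiser outputs (hypothetical values).
eps_uncond = [0.0, 1.0, -1.0]  # prediction with the condition dropped
eps_cond = [1.0, 1.0, 0.0]     # prediction given the condition image and camera RT

# scale = 1.0 reduces to the purely conditional prediction (no extra guidance).
plain = guided_noise(eps_uncond, eps_cond, 1.0)

# scale = 3.0 pushes the prediction further along the conditional direction.
guided = guided_noise(eps_uncond, eps_cond, 3.0)

print(plain)
print(guided)
```

Under this assumption, a scale of 1.0 corresponds to the plain "samples" case, and a scale of 3.0 corresponds to "samples_cfg_scale_3.00", where the conditional signal is amplified relative to the unconditional prediction.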

yumeko717 commented 9 months ago

Thank you for your kind answer!