Dear Author,
I am a graduate student from Mainland China, and I have a basic question regarding your paper. I noticed that the title of your paper is "Text-to-3D Generation." However, in your training framework diagram, I did not find any input or processing part related to text. Could you please clarify whether the text input plays a role only during the inference phase and not during the training phase?
Thank you for your assistance.
Thanks for your interest in our work. The text prompt is encoded by a CLIP encoder and fed into both diffusion processes, during training as well as inference.
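To make the data flow concrete, here is a minimal toy sketch of that conditioning pattern. It is not the paper's code: `encode_text` is a hypothetical stand-in for the CLIP text encoder (it just hashes the prompt into a small vector), `denoise` is a toy noise predictor, and the names "process A" and "process B" are placeholders for the two diffusion processes. The only point is that the same text embedding reaches both processes in the training step and in the sampling step.

```python
import hashlib
import random

EMBED_DIM = 8  # toy embedding size, chosen only for illustration


def encode_text(prompt):
    """Hypothetical stand-in for a CLIP text encoder.

    Returns a deterministic pseudo-embedding derived from the prompt.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]


def denoise(noisy, text_emb, weight):
    """Toy text-conditioned noise predictor (assumed form, not a real network)."""
    return [weight * (x + c) for x, c in zip(noisy, text_emb)]


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def training_loss(prompt, noisy_a, noisy_b, noise_a, noise_b, w_a=0.5, w_b=0.5):
    """Training: the text embedding conditions both diffusion processes."""
    text_emb = encode_text(prompt)
    pred_a = denoise(noisy_a, text_emb, w_a)  # process A sees the text
    pred_b = denoise(noisy_b, text_emb, w_b)  # process B sees the text
    return mse(pred_a, noise_a) + mse(pred_b, noise_b)


def sample_step(prompt, x_a, x_b, w_a=0.5, w_b=0.5, step=0.1):
    """Inference: the same embedding conditions both processes again."""
    text_emb = encode_text(prompt)
    pred_a = denoise(x_a, text_emb, w_a)
    pred_b = denoise(x_b, text_emb, w_b)
    next_a = [x - step * p for x, p in zip(x_a, pred_a)]
    next_b = [x - step * p for x, p in zip(x_b, pred_b)]
    return next_a, next_b
```

In this sketch, changing the prompt changes the loss in `training_loss` as well as the update in `sample_step`, which is what it means for the text to play a role in both phases.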