Inquiry about Text Input in Text-to-3D Generation

BiDiff / bidiff

[CVPR'24] Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors

Apache License 2.0

Inquiry about Text Input in Text-to-3D Generation #7

Open RobinLiuZX opened 5 months ago

RobinLiuZX commented 5 months ago

Dear Author, I am a graduate student from Mainland China, and I have a basic question regarding your paper. I noticed that the title of your paper is "Text-to-3D Generation." However, in your training framework diagram, I did not find any input or processing part related to text. Could you please clarify whether the text input plays a role only during the inference phase and not during the training phase? Thank you for your assistance.

DingLihe commented 4 months ago

Thanks for your interest in our work. The text prompt is encoded by a CLIP encoder and fed into both diffusion processes, during training as well as inference.
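
As a rough illustration of that answer (not taken from the BiDiff code), the sketch below shows one way a single text embedding can condition two diffusion branches in both a training step and a sampling loop. Everything in it is an assumption made for illustration: `encode_text` is a hypothetical stand-in for a frozen CLIP text encoder (it hashes the prompt so the example runs without model weights), and `toy_denoiser`, `training_losses`, and `sample` are toy placeholders rather than the actual 2D and 3D networks.

```python
import hashlib
import math
import random
from typing import Dict, List, Tuple

EMBED_DIM = 8  # toy size; real CLIP text embeddings are much larger


def encode_text(prompt: str) -> List[float]:
    """Hypothetical stand-in for a frozen CLIP text encoder.

    Produces a deterministic pseudo-embedding from a hash of the prompt.
    It carries no semantic meaning; it only lets the example run offline.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    return [b / 255.0 - 0.5 for b in digest[:EMBED_DIM]]


def toy_denoiser(noisy: List[float], t: float,
                 text_emb: List[float], weight: float) -> List[float]:
    """Toy noise predictor conditioned on the timestep and the text embedding."""
    bias = weight * sum(text_emb) / len(text_emb)
    return [x * t + bias for x in noisy]


def add_noise(clean: List[float], noise: List[float], t: float) -> List[float]:
    """Mix clean data and noise: sqrt(1 - t) * x0 + sqrt(t) * eps."""
    a, b = math.sqrt(1.0 - t), math.sqrt(t)
    return [a * c + b * n for c, n in zip(clean, noise)]


def training_losses(prompt: str, image_2d: List[float],
                    shape_3d: List[float], rng: random.Random) -> Dict[str, float]:
    """One toy training step: both branches receive the same text embedding."""
    text_emb = encode_text(prompt)  # encoded once, shared by both branches
    t = rng.uniform(0.1, 0.9)
    losses = {}
    for name, clean, weight in (("2d", image_2d, 0.5), ("3d", shape_3d, -0.5)):
        noise = [rng.gauss(0.0, 1.0) for _ in clean]
        noisy = add_noise(clean, noise, t)
        pred = toy_denoiser(noisy, t, text_emb, weight)
        # Mean squared error between predicted and true noise.
        losses[name] = sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(noise)
    return losses


def sample(prompt: str, size: int, steps: int,
           rng: random.Random) -> Tuple[List[float], List[float]]:
    """Toy sampling loop: the same text embedding conditions every step."""
    text_emb = encode_text(prompt)
    x2d = [rng.gauss(0.0, 1.0) for _ in range(size)]
    x3d = [rng.gauss(0.0, 1.0) for _ in range(size)]
    for i in range(steps, 0, -1):
        t = i / steps
        eps2d = toy_denoiser(x2d, t, text_emb, 0.5)
        eps3d = toy_denoiser(x3d, t, text_emb, -0.5)
        x2d = [x - e / steps for x, e in zip(x2d, eps2d)]
        x3d = [x - e / steps for x, e in zip(x3d, eps3d)]
    return x2d, x3d
```

The point of the sketch is only the data flow: the prompt is encoded once per call and the resulting embedding is passed to both branches, in `training_losses` as well as in `sample`. How the real model injects the embedding (for example through cross-attention) is not stated in this thread.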