Closed: Crane-YU closed this issue 1 year ago
Hi @Crane-YU, thank you for your interest in our work. The results presented in the paper and on the project page were generated end-to-end with a fixed seed, without manually selecting images. Adding a prefix like "a front view of" to the user prompt, or using an unCLIP-based diffusion model such as Karlo, mitigates this issue. Additionally, in Point-E the condition image only needs to be roughly aligned, since Point-E takes the CLIP feature of the image, which is different from point cloud reconstruction models like MCC.
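For illustration only, here is a rough sketch of those two mitigations, assuming the Hugging Face diffusers `UnCLIPPipeline` with the `kakaobrain/karlo-v1-alpha` checkpoint; the helper name and prompt wiring are hypothetical and not taken from this repository:

```python
import torch
from diffusers import UnCLIPPipeline

def sample_front_view(user_prompt: str):
    """Hypothetical helper: prefix the prompt and sample with Karlo (unCLIP)."""
    # Mitigation 1: prepend a view hint so the generated image is more likely
    # to show a complete, front-facing object.
    prompt = "a front view of " + user_prompt

    # Mitigation 2: sample with an unCLIP-based diffusion model (Karlo).
    pipe = UnCLIPPipeline.from_pretrained(
        "kakaobrain/karlo-v1-alpha", torch_dtype=torch.float16
    ).to("cuda")
    return pipe(prompt).images[0]

# e.g. image = sample_front_view("a wooden rocking chair")
```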
In the Gradio demo, we implemented a step-by-step process that lets users confirm the desired shape of the point cloud before generating the final 3D output, in order to emphasize the controllability aspect.
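As a rough illustration of such a confirm-before-generate flow (not the actual demo code), here is a minimal Gradio Blocks sketch, where `generate_point_cloud` and `lift_to_3d` are hypothetical placeholders for the two stages:

```python
import gradio as gr

def generate_point_cloud(prompt):
    # Hypothetical stage 1: produce a coarse point cloud preview for the prompt.
    ...

def lift_to_3d(prompt, point_cloud):
    # Hypothetical stage 2: produce the final 3D output from the confirmed point cloud.
    ...

with gr.Blocks() as demo:
    prompt = gr.Textbox(label="Prompt")
    pc_preview = gr.Model3D(label="Coarse point cloud (confirm before continuing)")
    result = gr.Model3D(label="Final 3D output")

    gr.Button("1. Generate point cloud").click(
        generate_point_cloud, inputs=prompt, outputs=pc_preview
    )
    gr.Button("2. Generate 3D output").click(
        lift_to_3d, inputs=[prompt, pc_preview], outputs=result
    )

demo.launch()
```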
Thank you for your reply.
Hi @j0seo, nice repo and thanks for sharing your code. Just one question about the semantic code sampling. As shown in the pipeline, the image generated during semantic code sampling is used directly for coarse 3D point cloud generation. What if the generated object is shadowed or incomplete (e.g., the image contains only the upper body)? Do you have to manually pick the images for concept learning and coarse 3D generation?