Closed: YueWuHKUST closed this issue 1 year ago
Hi Yue, thank you for your interest.
For finetuning the Point-E model, we only finetune the stage-1 diffusion model (text -> 1024 x 6, base40M-textvec); see https://github.com/crockwell/Cap3D/blob/3025a085abc19fe7532ae8d8d34e6689fa8b3847/text-to-3D/finetune_pointE.py#L119-L124. During inference, we use the fine-tuned first-stage diffusion model together with the pre-trained second-stage upsampling model (1024 -> 4096).
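For reference, this two-stage composition can be sketched with the public `point_e` sampler API (as in the upstream `text2pointcloud` example). The fine-tuned checkpoint path `finetuned_base.pt` below is a placeholder, not a file shipped with either repo; this is a sketch of the setup, not the repo's exact inference script.

```python
import torch

from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config
from point_e.diffusion.sampler import PointCloudSampler
from point_e.models.configs import MODEL_CONFIGS, model_from_config
from point_e.models.download import load_checkpoint

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Stage 1: text-conditioned base model (the part that is fine-tuned).
base_model = model_from_config(MODEL_CONFIGS['base40M-textvec'], device)
base_model.eval()
base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['base40M-textvec'])
# Load fine-tuned weights instead of the released checkpoint.
# 'finetuned_base.pt' is a hypothetical path for illustration.
base_model.load_state_dict(torch.load('finetuned_base.pt', map_location=device))

# Stage 2: pre-trained upsampler, used as released (1024 -> 4096 points).
upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)
upsampler_model.eval()
upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])
upsampler_model.load_state_dict(load_checkpoint('upsample', device))

sampler = PointCloudSampler(
    device=device,
    models=[base_model, upsampler_model],
    diffusions=[base_diffusion, upsampler_diffusion],
    num_points=[1024, 4096 - 1024],  # stage 1 emits 1024; stage 2 adds the rest
    aux_channels=['R', 'G', 'B'],
    guidance_scale=[3.0, 0.0],
    model_kwargs_key_filter=('texts', ''),  # only stage 1 sees the text prompt
)
```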
To convert 16,384 points into 1,024, as described in the paper we randomly sample 1,024 of the 16,384 points before training (the sampled 1,024 points are kept fixed throughout training): `[:, torch.randperm(16384)[:1024]]`. A better approach would be farthest-point sampling over the 16,384 points.
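A minimal sketch of both options, assuming a single (16384, 6) cloud with xyz in the first three channels (the `farthest_point_sample` helper is an illustrative greedy implementation, not code from the repo):

```python
import torch

def farthest_point_sample(points: torch.Tensor, k: int) -> torch.Tensor:
    """Greedy farthest-point sampling over xyz; points: (N, C), returns (k, C)."""
    xyz = points[:, :3]
    n = xyz.shape[0]
    chosen = torch.zeros(k, dtype=torch.long)
    chosen[0] = 0  # start from an arbitrary point
    # distance from each point to its nearest already-chosen point
    dist = torch.full((n,), float('inf'))
    for i in range(1, k):
        d = torch.sum((xyz - xyz[chosen[i - 1]]) ** 2, dim=1)
        dist = torch.minimum(dist, d)
        chosen[i] = torch.argmax(dist)  # pick the point farthest from the set
    return points[chosen]

torch.manual_seed(0)
pc = torch.randn(16384, 6)          # stand-in for one Cap3D point cloud

# Option 1 (as in the paper): one random permutation drawn before training,
# and the same 1024 indices reused for every epoch.
idx = torch.randperm(16384)[:1024]  # sampled once, kept fixed
subset_random = pc[idx]             # (1024, 6)

# Option 2: farthest-point sampling for better spatial coverage.
subset_fps = farthest_point_sample(pc, 1024)
print(subset_random.shape, subset_fps.shape)
```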
Thanks for your quick reply! That answers my question.
Thanks for your wonderful work! I notice that you finetune the Point-E model. The Cap3D dataset provides point clouds of shape 16384 x 6, but Point-E requires 1024 x 6 for stage-1 training and 4096 x 6 for stage-2 upsampling. How do you convert 16,384 points to 1,024 or 4,096? Could you provide more details?