Open · xiyichen opened this issue 1 day ago
Hi, thanks for your interest in our work! Yes, it's quite easy to reproduce the training code from that of DynamiCrafter; the only modifications needed are to the training data and some of the data loader scripts.
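Concretely, the loader change might look something like this minimal sketch, where each sample pairs a ground-truth clip with its precomputed point cloud renders. The file layout, field names, and tensor format below are hypothetical stand-ins, not taken from our actual code:

```python
import os
import glob
import torch
from torch.utils.data import Dataset


class PointCloudRenderDataset(Dataset):
    """Hypothetical replacement for DynamiCrafter's video dataset:
    each item pairs a ground-truth clip with its point cloud renders."""

    def __init__(self, root, num_frames=25):
        # Assumed layout: <root>/<clip_id>/frames/*.pt and <root>/<clip_id>/pc_renders/*.pt
        self.clip_dirs = sorted(glob.glob(os.path.join(root, "*")))
        self.num_frames = num_frames

    def __len__(self):
        return len(self.clip_dirs)

    def _load_stack(self, pattern):
        # Load the per-frame tensors and stack them into (T, C, H, W).
        paths = sorted(glob.glob(pattern))[: self.num_frames]
        return torch.stack([torch.load(p) for p in paths])

    def __getitem__(self, idx):
        clip = self.clip_dirs[idx]
        frames = self._load_stack(os.path.join(clip, "frames", "*.pt"))
        conds = self._load_stack(os.path.join(clip, "pc_renders", "*.pt"))
        # DynamiCrafter-style sample dict: target video plus per-frame conditioning.
        return {"video": frames, "cond_frames": conds}
```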
Thanks for your quick reply! There's one part that I don't quite understand:
> We then randomly select one or more frames from the point cloud, dropping the others, and render it using the camera poses previously generated by DUSt3R to obtain a point cloud render result.
If I understand correctly, DUSt3R gives 25 point clouds, one per frame, all globally aligned in world space with the recovered camera poses. In each training clip, do you always pick a single random one of the 25 point clouds and render view-dependent point cloud images for all 25 views, or do you sample a few of the 25 point clouds and render those?
If you use the point cloud from only one view, the renders could be poor when the viewpoint deviation is large. Have you tried rendering a point cloud merged from all frames?
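To make the question concrete, here is a rough sketch of the two strategies I'm asking about, assuming DUSt3R has already produced per-frame world-space point clouds with colors and 25 world-to-camera poses sharing one intrinsic matrix. The naive z-buffer splatting here is just a stand-in for whatever rasterizer you actually use:

```python
import numpy as np


def render_pointcloud(xyz_world, rgb, w2c, K, hw=(256, 256)):
    """Project a world-space point cloud into one camera with a z-buffer."""
    h, w = hw
    pts_cam = (w2c[:3, :3] @ xyz_world.T + w2c[:3, 3:4]).T   # (N, 3) in camera space
    front = pts_cam[:, 2] > 1e-6                             # keep points in front of camera
    pts_cam, rgb = pts_cam[front], rgb[front]
    uvw = (K @ pts_cam.T).T
    u = (uvw[:, 0] / uvw[:, 2]).astype(int)
    v = (uvw[:, 1] / uvw[:, 2]).astype(int)
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    u, v, z, c = u[valid], v[valid], pts_cam[valid, 2], rgb[valid]
    img = np.zeros((h, w, 3), dtype=np.float32)
    zbuf = np.full((h, w), np.inf, dtype=np.float32)
    for ui, vi, zi, ci in zip(u, v, z, c):
        if zi < zbuf[vi, ui]:        # nearest point wins each pixel
            zbuf[vi, ui] = zi
            img[vi, ui] = ci
    return img


def render_clip(pts, cols, poses, K, merge_all_frames=False):
    """Strategy A: keep one random frame's cloud, drop the rest.
    Strategy B: merge the clouds from all frames before rendering."""
    if merge_all_frames:
        xyz, rgb = np.concatenate(pts), np.concatenate(cols)
    else:
        i = np.random.randint(len(pts))   # random single frame, others dropped
        xyz, rgb = pts[i], cols[i]
    # Render the chosen cloud from every recovered camera pose (e.g., all 25).
    return [render_pointcloud(xyz, rgb, w2c, K) for w2c in poses]
```

With strategy A, views far from the selected frame see large holes; strategy B fills them at the cost of accumulating alignment noise across frames, which is exactly the trade-off I'm asking about.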
Nice work! I'm trying to reproduce your training code by modifying the DynamiCrafter codebase.
If I understand correctly, the camera parameters are not passed to the diffusion model as additional conditioning; the only viewpoint information the network receives is the point cloud renderings. Please correct me if I've got that wrong.
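Under that reading, I'd implement the conditioning roughly as frame-wise channel concatenation, mirroring how DynamiCrafter injects its conditioning image. This is only a sketch of my assumption; the diffusers-style VAE handle and the 0.18215 scaling factor are my guesses, not something from your paper:

```python
import torch


def prepare_unet_input(noisy_latents, pc_renders, vae):
    """Concatenate VAE-encoded point cloud renders with the noisy video
    latents along the channel axis, so viewpoint information enters the
    UNet only through the renders (no raw camera parameters).

    noisy_latents: (B, C, T, h, w)   pc_renders: (B, 3, T, H, W) in [-1, 1]
    """
    b, _, t, _, _ = pc_renders.shape
    frames = pc_renders.permute(0, 2, 1, 3, 4).flatten(0, 1)   # (B*T, 3, H, W)
    cond = vae.encode(frames).latent_dist.mode() * 0.18215     # per-frame latents
    cond = cond.unflatten(0, (b, t)).permute(0, 2, 1, 3, 4)    # (B, C', T, h, w)
    return torch.cat([noisy_latents, cond], dim=1)             # channel-wise concat
```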
I still have a few questions about training:
Looking forward to your response!