everythoughthelps opened this issue 1 week ago
Another question: the Next Best View strategy you designed is a good way to render the next view step by step. I was wondering, is it possible to obtain the same result by simply using the adjacent cameras and frames along the timeline, i.e., the frames right after the reference images? Would that also work?
This is great work! You said "Then, we randomly select the constructed point cloud of the video frames and render it along the estimated camera trajectory using Pytorch3D." in Sec. 4.1. From my understanding, you want to build frame pairs in this step, right? But one thing that confuses me is that the whole set of frames is rendered from the point cloud, which is constructed from all the frames, so shouldn't the rendered frames and the real frames have very little gap between them? Or do you split the frames into two groups: say we have 25 frames, 10 frames are used to build the point cloud with DUSt3R, then you render the remaining 15 frames and combine them with the 15 GT frames to construct the video pairs?
Thanks! We use all 25 frames to build the point cloud with DUSt3R, and the rendered frames typically differ from the ground-truth frames used for supervision. This allows the model to learn how to refine and correct the rendered frames.
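For anyone trying to reproduce this data-construction step, here is a minimal sketch of how a single DUSt3R-style point cloud could be rendered along a camera trajectory with PyTorch3D to produce (rendered, ground-truth) training pairs. This is not the authors' actual code: the tensor names, placeholder geometry, focal length, and splat radius are all illustrative assumptions; only the PyTorch3D point-rendering API calls are real.

```python
import torch
from pytorch3d.structures import Pointclouds
from pytorch3d.renderer import (
    PerspectiveCameras,
    PointsRasterizationSettings,
    PointsRasterizer,
    PointsRenderer,
    AlphaCompositor,
)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_views = 25
image_size = 256

# Hypothetical inputs: `points` (P, 3) and `colors` (P, 3) would come from the
# DUSt3R reconstruction of all 25 frames; `Rs` (N, 3, 3) and `Ts` (N, 3) would
# come from the estimated camera trajectory. Placeholders are used here.
points = torch.rand(10000, 3, device=device)                    # placeholder geometry
colors = torch.rand(10000, 3, device=device)                    # placeholder RGB
Rs = torch.eye(3, device=device).unsqueeze(0).repeat(num_views, 1, 1)
Ts = torch.zeros(num_views, 3, device=device)
Ts[:, 2] = 3.0                                                  # move cameras back so the cloud is in view

cameras = PerspectiveCameras(R=Rs, T=Ts, focal_length=1.0, device=device)

raster_settings = PointsRasterizationSettings(
    image_size=image_size,
    radius=0.01,            # splat radius in NDC units (assumed value)
    points_per_pixel=10,
)

renderer = PointsRenderer(
    rasterizer=PointsRasterizer(cameras=cameras, raster_settings=raster_settings),
    compositor=AlphaCompositor(background_color=(0.0, 0.0, 0.0)),
)

# One shared point cloud, rendered from each of the 25 trajectory cameras.
point_cloud = Pointclouds(points=[points] * num_views, features=[colors] * num_views)
rendered = renderer(point_cloud)   # (25, H, W, 3) renderings with holes/splat artifacts

# Pair each rendering with the corresponding ground-truth frame for supervision.
# `gt_frames` would be the original video frames, shape (25, H, W, 3).
# training_pairs = list(zip(rendered, gt_frames))
```

Even though the cloud is built from all 25 frames, the splatted renderings still show holes, depth errors, and blur, which is presumably the gap the model learns to close when supervised with the real frames.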
OK, I had thought the rendered images would have only a small gap from the GT images. Thank you.