ActiveVisionLab / nope-nerf

(CVPR 2023) NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
https://nope-nerf.active.vision/
MIT License

how to deal with indoor datasets like scannet? #3

Open zjulabwjt opened 1 year ago

zjulabwjt commented 1 year ago

I see in your paper that you ran experiments on ScanNet, but I can't find the ScanNet dataset in your GitHub repo. How do you handle the ScanNet dataset in your code? Do you load indoor complex trajectories the same way as the nerf_llff dataset in the dataloading folder's code?

bianwenjing commented 1 year ago

Hi, you can take the intrinsics and poses provided in the txt files of the ScanNet dataset and convert them into npz files to load with this line and this line. Also, for some ScanNet images with black borders, you can crop the images before training by setting cfg['dataloading']['crop_size'].
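For anyone following the same step, here is a minimal sketch of the txt-to-npz conversion, assuming the standard ScanNet export layout (`pose/<idx>.txt` holding a 4x4 camera-to-world matrix, `intrinsic/intrinsic_color.txt` holding the intrinsic matrix). The scene path and npz key names are placeholders; match the keys to whatever the nope-nerf dataloader actually reads.

```python
# Minimal sketch: bundle ScanNet txt intrinsics/poses into one npz.
# Assumes the standard ScanNet export layout; the scene path and the
# npz key names ("poses", "intrinsics") are placeholders.
import os
import numpy as np

scene_dir = "scannet/scene0000_00"  # hypothetical scene path
K = np.loadtxt(os.path.join(scene_dir, "intrinsic", "intrinsic_color.txt"))

pose_dir = os.path.join(scene_dir, "pose")
frame_ids = sorted(int(f[:-4]) for f in os.listdir(pose_dir) if f.endswith(".txt"))
poses = np.stack([np.loadtxt(os.path.join(pose_dir, f"{i}.txt")) for i in frame_ids])

np.savez(os.path.join(scene_dir, "gt_poses.npz"), poses=poses, intrinsics=K)
```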

zjulabwjt commented 1 year ago


Thanks for your quick reply! I used COLMAP to get poses for my indoor dataset and generated poses_bounds.npz with NeRF's code. I have some questions about your experiments. In your paper the pose trajectories are simple (e.g. straight-forward motion or a 360° object-centric view). Have you tested complex trajectories like those of indoor SLAM datasets, and does pose refinement without any pose prior still work there? I used COLMAP to estimate poses for my indoor dataset and got 96 images, and I find the result is bad.

[four attached screenshots, including a pose-error plot]

Can you give me some advice on this result? Thanks for your great work and your help!

bianwenjing commented 1 year ago

Hi, the algorithm can fail when the input views are sparse because the point cloud loss is based on dense matching between views. In our ScanNet scenes, we took image sequences directly from the dataset without subsampling, and the sequences are ~100 frames each, which is probably a smaller scale than SLAM. In your experiment, the initial RPE_rot shown on your plot is quite large, so I suspect this is a challenging scene where the point cloud loss may not work well. I recommend sampling images more densely. You can also try to improve the algorithm by replacing the dense matching with a sparse one.
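As a concrete illustration of the denser-sampling advice, here is a short sketch that copies a consecutive window of frames out of a long ScanNet sequence; the start index, window length, stride, and paths below are illustrative, not the values used in the paper.

```python
# Illustrative sketch: take a short, densely sampled window from a long
# ScanNet sequence before training. Assumes frames were exported as
# color/<idx>.jpg; start/length/stride are placeholders.
import os
import shutil

src = "scannet/scene0000_00/color"        # hypothetical exported frames
dst = "data/scene0000_00_subset/images"
os.makedirs(dst, exist_ok=True)

start, length, stride = 0, 100, 1         # ~100 consecutive frames
for out_idx, i in enumerate(range(start, start + length * stride, stride)):
    shutil.copy(os.path.join(src, f"{i}.jpg"),
                os.path.join(dst, f"{out_idx:04d}.jpg"))
```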

zjulabwjt commented 1 year ago


Thanks for your quick reply! What do you mean by replacing the dense matching with a sparse one? Does it mean using fewer points to compute the point cloud loss?

bianwenjing commented 1 year ago

You can try to generate sparse correspondences first and use only the points at those correspondences to compute the point cloud loss, which is likely to be more robust for sparse views.
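A rough sketch of that idea, using ORB from OpenCV to generate sparse correspondences and evaluating a point cloud residual only at the matched pixels. `backproject` and the pose inputs here are illustrative stand-ins, not the repo's actual loss implementation.

```python
# Sketch: sparse-correspondence variant of the point cloud loss.
# ORB keypoint matching replaces dense matching; the loss is then
# evaluated only at matched pixels. Images are expected as grayscale
# uint8 arrays; depth maps and intrinsics K are per-view inputs.
import cv2
import numpy as np

def sparse_matches(img_a, img_b, n_features=2000):
    """Return (N, 2) matched pixel coordinates in each image."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp_a, des_a = orb.detectAndCompute(img_a, None)
    kp_b, des_b = orb.detectAndCompute(img_b, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_a, des_b), key=lambda m: m.distance)
    pts_a = np.float32([kp_a[m.queryIdx].pt for m in matches])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in matches])
    return pts_a, pts_b

def backproject(pts, depth, K):
    """Lift (N, 2) pixels to camera-space 3D points with per-pixel depth."""
    u, v = pts[:, 0], pts[:, 1]
    z = depth[v.astype(int), u.astype(int)]
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=-1)

def sparse_pc_loss(pts_a, pts_b, depth_a, depth_b, K, R, t):
    """Residual between view-A points mapped into view B and view-B points."""
    P_a_in_b = backproject(pts_a, depth_a, K) @ R.T + t  # (R, t): A -> B
    P_b = backproject(pts_b, depth_b, K)
    return np.mean(np.linalg.norm(P_a_in_b - P_b, axis=-1))
```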

z619850002 commented 1 year ago

Could you please tell me which frames you used in your experimental results on the ScanNet dataset? From the paper I only know that about 80-100 frames were chosen for each sequence, while each sequence in the original dataset contains thousands of frames. I'm curious about the start and end indices of the frames in the four ScanNet sequences. Thanks!