NVlabs / InstantSplat

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds
https://instantsplat.github.io/

Dataset question #11

Open kk6398 opened 3 weeks ago

kk6398 commented 3 weeks ago

Hi, thanks for your excellent work. For the TT dataset, when there are 12 training views, the test views are the remaining 12. When the number of training views is n=3 or n=6, are all of the remaining 21 or 18 views used as test views?

kk6398 commented 3 weeks ago

Sorry, I have figured it out as described in the paper. However, the test images are uniformly sampled from the 22 images excluding the first and last ones, which makes 11 images, right? In addition, how can I handle this in code when n=3 or 6? I found that there are only 3 training views in "...\InstantSplat\data\TT\Family\3_views\images". How can I get the corresponding test views?

kairunwen commented 2 weeks ago

Hi, for training views n=3/6/12, the number of test views is always 12. You can get the initial test view poses here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L85-L141

Take training view number = 3 as an example:

(1) input 3 train imgs to dust3r --> get 3 point clouds (defined as train_pcd)
(2) input 3 train imgs and 12 test imgs to dust3r --> get 15 point clouds (3 pcd1 & 12 pcd2) and 15 poses (3 pose1 & 12 pose2)
(3) use 3 train_pcd & 3 pcd1 to apply point cloud registration and calculate transform_matrix --> get transform_matrix M
(4) use transform_matrix M to transform 12 pose2 into 12 test_pose --> get initial test_pose = 12 test_pose
(5) optimize test_pose to achieve a more precise alignment for evaluation: https://github.com/NVlabs/InstantSplat/blob/main/render.py#L45
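For readers following along, here is a minimal sketch of steps (3) and (4) on synthetic data. It is not the repository's code: it assumes point-to-point correspondences between pcd1 and train_pcd are known, swaps in a closed-form Umeyama similarity fit for whatever registration init_test_pose.py actually performs, and assumes poses are 4x4 camera-to-world matrices. The function names `umeyama_alignment` and `transform_poses` are hypothetical.

```python
import numpy as np


def umeyama_alignment(src, dst):
    """Estimate scale s, rotation R, translation t so that dst ~ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points (assumed correspondences).
    """
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / len(src)           # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # avoid returning a reflection
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = float((D * np.diag(S)).sum() / var_src)
    t = mu_dst - s * R @ mu_src
    return s, R, t


def transform_poses(poses_c2w, s, R, t):
    """Map (N, 4, 4) camera-to-world poses into the aligned frame."""
    out = poses_c2w.copy()
    out[:, :3, :3] = R @ poses_c2w[:, :3, :3]             # rotate orientations
    out[:, :3, 3] = s * (poses_c2w[:, :3, 3] @ R.T) + t   # move camera centers
    return out


# Synthetic stand-ins: pcd1 plays the train points from the joint 15-image run,
# train_pcd the same points as seen in the 3-image run (different scale and frame).
rng = np.random.default_rng(0)
pcd1 = rng.normal(size=(200, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 2.0, np.array([1.0, -2.0, 0.5])
train_pcd = s_true * pcd1 @ R_true.T + t_true

# Step (3): register pcd1 onto train_pcd to get the transform M = (s, R, t).
s, R, t = umeyama_alignment(pcd1, train_pcd)

# Step (4): carry the 12 test poses from the joint run into the training frame.
pose2 = np.tile(np.eye(4), (12, 1, 1))
pose2[:, :3, 3] = rng.normal(size=(12, 3))
test_pose = transform_poses(pose2, s, R, t)
print(np.allclose(R, R_true), np.isclose(s, s_true), np.allclose(t, t_true))
```

The scale is applied only to the camera centers so that the rotation block of each pose stays orthonormal. Step (5) would then refine `test_pose` against the rendered images, which this sketch does not attempt.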

kk6398 commented 2 weeks ago

Thank you for your reply. Do we need to concretely split all images into train and test folders first, so that we can proceed with "(1) input 3 train imgs to dust3r"? So how are the 12 test views selected? The test images are uniformly sampled from 22 images excluding the first and last ones (described in the paper), which makes 11 images, right?

kairunwen commented 2 weeks ago

> Do we need to concretely split all images into train and test folders first, so that we can proceed with "(1) input 3 train imgs to dust3r"?

No, we split train_imgs here: https://github.com/NVlabs/InstantSplat/blob/main/coarse_init_eval.py#L56-L64 and split test_imgs for evaluation here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72

> So how are the 12 test views selected? The test images are uniformly sampled from 22 images excluding the first and last ones (described in the paper), which makes 11 images, right?

The test images are uniformly sampled from the 22 images that remain after excluding the first and last ones, which gives 12 images.

kk6398 commented 2 weeks ago

So, do we need to change "llffhold" when we change the number of training views? For example, llffhold=4 when n_views=6, and llffhold=8 when n_views=3. As for "The test image is uniformly sampled from 22 images excluding the first and last one, which makes 12 images": specifically, the frames are numbered 0-23 (24 frames in total). After excluding the first and last ones, we select frames 1,3,5,7,9,11,13,15,17,19,21, which is 11 frames in total, right? In addition, [init_test_pose.py#L62-L72](https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72) indicates that the dataset is split into training views (0,2,4,6,8,10,12,14,16,18,20,22) and test views (1,3,5,7,9,11,13,15,17,19,21,23).

kairunwen commented 1 day ago

> So, do we need to change "llffhold" when we change the number of training views? For example, llffhold=4 when n_views=6, and llffhold=8 when n_views=3.

No.

> As for "The test image is uniformly sampled from 22 images excluding the first and last one, which makes 12 images": specifically, the frames are numbered 0-23 (24 frames in total). After excluding the first and last ones, we select frames 1,3,5,7,9,11,13,15,17,19,21, which is 11 frames in total, right? In addition, [init_test_pose.py#L62-L72](https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72) indicates that the dataset is split into training views (0,2,4,6,8,10,12,14,16,18,20,22) and test views (1,3,5,7,9,11,13,15,17,19,21,23).

Train view idx = (0 3 5 7 9 11 13 15 17 19 21 23)
Test view idx = (1 2 4 6 8 10 12 14 16 18 20 22)
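As a sanity check on these index lists, the short sketch below shows one sampling rule that happens to reproduce them: take 12 evenly spaced positions between frame 1 and frame 22 and truncate each to an integer. This is an observation about the numbers above, not a confirmed description of what init_test_pose.py does, and the helper name `uniform_indices` is hypothetical.

```python
def uniform_indices(start, stop, num):
    """Pick `num` evenly spaced positions in [start, stop], truncated to int."""
    return [int(start + k * (stop - start) / (num - 1)) for k in range(num)]


n_frames = 24  # frames are numbered 0..23 in the thread

# Sample 12 test frames from frames 1..22 (first and last frame excluded).
test_idx = uniform_indices(1, n_frames - 2, 12)

# Everything not picked for testing is treated as a training candidate.
train_idx = [i for i in range(n_frames) if i not in test_idx]

print("test :", test_idx)
print("train:", train_idx)
```

Because the spacing between samples is 21/11 (not exactly 2), truncation yields 1, 2, 4, ... instead of only odd frames, which is why the count comes out to 12 and not 11.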