NVlabs / InstantSplat

InstantSplat: Sparse-view SfM-free Gaussian Splatting in Seconds
https://instantsplat.github.io/

Dataset question #11

Open kk6398 opened 3 months ago

kk6398 commented 3 months ago

Hi, thanks for your excellent work. For the TT dataset, when the training view is 12, the test view is the remaining 12. When the training view n=3 or n=6, are all remaining views 21 or 18 test perspectives?

kk6398 commented 3 months ago

Sorry, I have figured it out as described in the paper. However, the test images are uniformly sampled from the 22 images excluding the first and last one, which makes 11 images, right? In addition, how can I handle this in code when n=3 or 6? I found that there are only 3 training views in "...\InstantSplat\data\TT\Family\3_views\images". How can I get the corresponding test views?

kairunwen commented 3 months ago

Hi, when the number of training views is n=3/6/12, the number of test views is 12. You can get the initial test view poses here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L85-L141

Take training view number = 3 as an example:

(1) input 3 train imgs to dust3r --> get 3 point clouds (defined as train_pcd)
(2) input 3 train imgs and 12 test imgs to dust3r --> get 15 point clouds (3 pcd1 & 12 pcd2) and 15 poses (3 pose1 & 12 pose2)
(3) use 3 train_pcd & 3 pcd1 to apply point cloud registration and calculate transform_matrix --> get transform_matrix M
(4) use transform_matrix M to transform 12 pose2 into 12 test_pose --> get initial test_pose = 12 test_pose
(5) optimize test_pose to achieve a more precise alignment for evaluation: https://github.com/NVlabs/InstantSplat/blob/main/render.py#L45
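Steps (3) and (4) can be illustrated with a minimal sketch. This is not the code from init_test_pose.py: it swaps in the Kabsch algorithm for the registration step, and it assumes that the two point clouds have known point-to-point correspondences and the same scale, and that poses are 4x4 camera-to-world matrices. The function names `estimate_rigid_transform` and `transform_poses` are hypothetical, and the data is synthetic.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Kabsch algorithm: least-squares rigid transform mapping src onto dst.

    src, dst: (N, 3) arrays of corresponding points (assumed same scale).
    Returns a 4x4 homogeneous matrix M such that dst ~= R @ src + t.
    """
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection so that det(R) = +1
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    M = np.eye(4)
    M[:3, :3] = R
    M[:3, 3] = t
    return M

def transform_poses(M, poses_c2w):
    """Apply M to a stack of (n, 4, 4) camera-to-world poses (assumed convention)."""
    return M @ poses_c2w

# Synthetic stand-ins for the quantities named in the steps above.
rng = np.random.default_rng(0)
train_pcd = rng.normal(size=(100, 3))      # points in the frame of run (1)
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -1.0, 2.0])
# Same geometry expressed in the frame of run (2): train_pcd = R_true @ pcd1 + t_true
pcd1 = (train_pcd - t_true) @ R_true

M = estimate_rigid_transform(pcd1, train_pcd)   # step (3)
pose2 = np.stack([np.eye(4), np.eye(4)])        # placeholder test poses from run (2)
test_pose = transform_poses(M, pose2)           # step (4)
```

If the two DUSt3R runs differ in scale, a similarity transform (rotation, translation, and scale) would be needed instead of a purely rigid one.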

kk6398 commented 3 months ago

Thank you for your reply. Do we need to concretely split all images into train and test folders first, so that we can proceed with "(1) input 3 train imgs to dust3r"? So how are the 12 test views selected? The test images are uniformly sampled from the 22 images excluding the first and last one (as described in the paper), which makes 11 images, right?

kairunwen commented 3 months ago

> Do we need to concretely split all images into train and test folders first, so that we can proceed with "(1) input 3 train imgs to dust3r"?

No, we split train_imgs here: https://github.com/NVlabs/InstantSplat/blob/main/coarse_init_eval.py#L56-L64 and split test_imgs for evaluation here: https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72

> So how are the 12 images in the test view selected? The test image is uniformly sampled from 22 images excluding the first and last one (described in the paper), which makes 11 images, right?

The test images are uniformly sampled from the 22 images excluding the first and last one, which makes 12 images.

kk6398 commented 3 months ago

So, do we need to change "llffhold" when we change the number of training views? For example, llffhold=4 when n_views=6, and llffhold=8 when n_views=3. As for "The test image is uniformly sampled from 22 images excluding the first and last one, which makes 12 images": specifically, the frames are numbered 0-23 (24 frames in total). When excluding the first and last one, we select numbers 1,3,5,7,9,11,13,15,17,19,21, which is 11 frames in total, right? In addition, [init_test_pose.py#L62-L72](https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72) indicates that the dataset is split into training views (0,2,4,6,8,10,12,14,16,18,20,22) and test views (1,3,5,7,9,11,13,15,17,19,21,23).

kairunwen commented 2 months ago

> So, do we need to change "llffhold" when we change the number of training views? For example, llffhold=4 when n_views=6, and llffhold=8 when n_views=3.

No.

> As for "The test image is uniformly sampled from 22 images excluding the first and last one, which makes 12 images": specifically, the frames are numbered 0-23 (24 frames in total). When excluding the first and last one, we select numbers 1,3,5,7,9,11,13,15,17,19,21, which is 11 frames in total, right? In addition, [init_test_pose.py#L62-L72](https://github.com/NVlabs/InstantSplat/blob/main/init_test_pose.py#L62-L72) indicates that the dataset is split into training views (0,2,4,6,8,10,12,14,16,18,20,22) and test views (1,3,5,7,9,11,13,15,17,19,21,23).

Train view idx = (0 3 5 7 9 11 13 15 17 19 21 23)
Test view idx = (1 2 4 6 8 10 12 14 16 18 20 22)
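The index lists above can be sanity-checked with a few lines of Python. This is only a check of the numbers as stated in this reply, assuming the 24 frames (indices 0-23) mentioned earlier in the thread; it says nothing about what the repository code actually produces.

```python
# Index lists copied from the reply above.
train_idx = [0, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23]
test_idx = [1, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22]

n_frames = 24  # assumption: frames are numbered 0..23, as stated in the thread

# The two lists should not overlap and should cover every frame together.
disjoint = set(train_idx).isdisjoint(test_idx)
covers_all = (set(train_idx) | set(test_idx)) == set(range(n_frames))

# Every test view should lie strictly between the first and last frame.
test_inside = all(0 < i < n_frames - 1 for i in test_idx)

# Spacing between consecutive test indices.
test_gaps = [b - a for a, b in zip(test_idx, test_idx[1:])]

print(len(train_idx), len(test_idx), disjoint, covers_all, test_inside)
print(test_gaps)
```

With these lists, both sets contain 12 views and every test view falls inside the 22 interior frames, which matches the "12 images" count. The spacing is not perfectly even, though: test indices 1 and 2 are adjacent, while all later ones are 2 apart.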

Master-cai commented 2 months ago

@kairunwen Hi! I think there is a discrepancy between the code and your description. I added some code to init_test_pose.py to print the index and image name for the training and test views:

    # ---------------- (1) Prepare Train & Test images list ---------------- 
    all_img_list = sorted(os.listdir(os.path.join(img_base_path, "images")))
    if args.llffhold > 0:
        train_img_list = [c for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold != 0]
        train_img_idx = [idx for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold != 0]
        test_img_list = [c for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold == 0]
        test_img_idx = [idx for idx, c in enumerate(all_img_list) if (idx+1) % args.llffhold == 0]
    # sample sparse view
    indices = np.linspace(0, len(train_img_list) - 1, n_views, dtype=int)
    print(indices)
    print(f"trn idx {train_img_idx}, name {train_img_list}")
    print(f"tst idx {test_img_idx}, name {test_img_list}")

And the result is:

    trn idx [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22], name ['000291.jpg', '000301.jpg', '000312.jpg', '000322.jpg', '000332.jpg', '000343.jpg', '000353.jpg', '000363.jpg', '000374.jpg', '000384.jpg', '000394.jpg', '000405.jpg']
    tst idx [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23], name ['000296.jpg', '000307.jpg', '000317.jpg', '000327.jpg', '000338.jpg', '000348.jpg', '000358.jpg', '000369.jpg', '000379.jpg', '000389.jpg', '000400.jpg', '000410.jpg']

But you stated:

> Train view idx = (0 3 5 7 9 11 13 15 17 19 21 23)
> Test view idx = (1 2 4 6 8 10 12 14 16 18 20 22)

I'm confused. Is there anything wrong?
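For anyone who wants to reproduce the printed split without the dataset, here is a small standalone sketch. It reuses the `(idx+1) % llffhold` rule and the `np.linspace` call from the snippet above, with several assumptions: `llffhold=2` is inferred from the printed even/odd split rather than confirmed, the file names are placeholders, the helper names `split_train_test` and `sample_sparse_views` are hypothetical, and the linspace indices are assumed to index into the training list (the snippet does not show how they are used).

```python
import numpy as np

def split_train_test(all_img_list, llffhold):
    """Hold out every llffhold-th image (1-based) as a test view."""
    train = [(idx, name) for idx, name in enumerate(all_img_list)
             if (idx + 1) % llffhold != 0]
    test = [(idx, name) for idx, name in enumerate(all_img_list)
            if (idx + 1) % llffhold == 0]
    return train, test

def sample_sparse_views(train, n_views):
    """Pick n_views entries spread evenly over the training list."""
    indices = np.linspace(0, len(train) - 1, n_views, dtype=int)
    return [train[i] for i in indices]

# Placeholder file names standing in for the 24 frames of a scene.
all_img_list = [f"{i:06d}.jpg" for i in range(24)]

train, test = split_train_test(all_img_list, llffhold=2)  # llffhold=2 is an assumption
train_idx = [idx for idx, _ in train]
test_idx = [idx for idx, _ in test]

# Global frame indices of the sparse training views for each n_views setting.
sparse = {n: [idx for idx, _ in sample_sparse_views(train, n)] for n in (3, 6, 12)}

print("trn idx", train_idx)
print("tst idx", test_idx)
for n, idxs in sparse.items():
    print(f"n_views={n}: {idxs}")
```

Under these assumptions the test list stays the same 12 odd-indexed frames for every `n_views`, and only the subset drawn from the training list changes.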