donydchen / mvsplat

🌊 [ECCV'24 Oral] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
https://donydchen.github.io/mvsplat

question about extension to multi-view (>2) inputs and cost volume #4

Closed · Friedrich-M closed this issue 6 months ago

Friedrich-M commented 6 months ago

Thanks for your great contribution to this promising and interesting field.

I noticed that the paper's main experiments focus on two-view inputs, similar to PixelSplat. However, as you mention in the paper, the MVS-based design can naturally handle more than two views. Can the current pre-trained model be directly extended to multi-view (>2) inputs?

Besides, the cost volume in the paper requires (near, far) planes for discrete depth sampling. When extending to other datasets without ground-truth (near, far) values, how should we set them? Also, since each view has its own cost volume, as the views become denser and the resolution larger, how do you handle the increased memory and computation, and the need for cross-view information exchange?

donydchen commented 6 months ago

Hi @Friedrich-M, thanks for your interest in our work.

Yes, the pre-trained model can be applied to an arbitrary number of sparse input views. Below is a simple example for testing the 3-view case.

{"5aca87f95a9412c6": {"context": [58, 102, 133], "target": [84, 129]}, "322261824c4a3003": {"context": [33, 60, 78], "target": [38, 61]}}
Then run the evaluation with the new index and `dataset.view_sampler.num_context_views=3`:

```bash
python -m src.main +experiment=re10k \
    checkpointing.load=checkpoints/re10k.ckpt \
    mode=test \
    dataset/view_sampler=evaluation \
    test.compute_scores=true \
    dataset.view_sampler.index_path=assets/evaluation_index_re10k_3views.json \
    wandb.name=abl/re10k_3views \
    dataset.view_sampler.num_context_views=3
```
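Regarding the (near, far) planes on new datasets: one way to sanity-check a chosen depth range is to visualise the plane-sweep warps. The snippet below is a debugging excerpt from inside the cost volume construction, so `v` (number of views), `b` (batch size), `intrinsics`, `pose_curr_lists`, `disp_candi_curr`, `near`, `far`, `warp_with_pose_depth_candidates`, and `self.warp_padding_mode` are all provided by the enclosing method. It warps every other context view into each view's frame at all depth candidates and saves the results for inspection: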

```python
import os

from einops import rearrange
from PIL import Image

ori_images = rearrange(
    extra_info["images"], "(v b) c h w -> b v c h w", v=v, b=b
)
scene_names = extra_info["scene_names"]

# Un-normalise the intrinsics back to pixel units.
intr_curr_ori = intrinsics[:, :, :3, :3].clone().detach()  # [b, v, 3, 3]
intr_curr_ori[:, :, 0, :] *= float(ori_images.shape[-1])   # scale by width
intr_curr_ori[:, :, 1, :] *= float(ori_images.shape[-2])   # scale by height
intr_curr_ori = rearrange(
    intr_curr_ori, "b v ... -> (v b) ...", b=b, v=v
)  # [(v b), 3, 3]

init_view_order = list(range(v))
image01 = ori_images
for idx in range(1, v):
    # Cyclically shift the view order so every view is paired with every other.
    cur_view_order = init_view_order[idx:] + init_view_order[:idx]
    cur_images10 = ori_images[:, cur_view_order]  # [b, v, c, h, w]
    image10 = rearrange(cur_images10, "b v c h w -> (v b) c h w")
    pose_curr = pose_curr_lists[idx - 1]

    # Warp the shifted views into the reference frames at every depth candidate.
    image01_warped = warp_with_pose_depth_candidates(
        image10,
        intr_curr_ori,
        pose_curr,
        1.0 / disp_candi_curr.repeat([1, 1, *image10.shape[-2:]]),
        warp_padding_mode=self.warp_padding_mode,
    )  # [(v b), C, D, H, W]
    image01_warped = rearrange(
        image01_warped, "(v b) ... -> b v ...", v=v, b=b
    )

    # Dump the original and warped images; filenames record the (near, far)
    # range so different ranges can be compared side by side.
    for batch_idx in range(b):
        out_dir = os.path.join(
            "warp_images",
            f"near_{near[0, 0].item():.1f}_far_{int(far[0, 0].item())}",
            (
                scene_names[batch_idx]
                if scene_names is not None
                else str(batch_idx)
            ),
        )
        os.makedirs(out_dir, exist_ok=True)
        for v_idx in range(v):
            # Original image (assumed in [0, 1]); re-saved each iteration for simplicity.
            Image.fromarray(
                (image01[batch_idx, v_idx] * 255)
                .byte()
                .permute(1, 2, 0)
                .detach()
                .cpu()
                .numpy()
            ).save(f"{out_dir}/{v_idx}ori.png")
            # One warped image per depth candidate.
            for d_idx in range(image01_warped.shape[3]):
                Image.fromarray(
                    (image01_warped[batch_idx, v_idx, :, d_idx] * 255)
                    .byte()
                    .permute(1, 2, 0)
                    .detach()
                    .cpu()
                    .numpy()
                ).save(
                    f"{out_dir}/{v_idx}warped_from{cur_view_order[v_idx]}_{d_idx}.png"
                )
```
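As a rough starting point for datasets without ground-truth (near, far), you could scale a fixed depth range by the baseline of the context cameras and then refine it by inspecting the warped images dumped above. This is purely a sketch of a heuristic, not part of this repo; `estimate_near_far`, `near_scale`, and `far_scale` are made-up names:

```python
import torch

def estimate_near_far(extrinsics: torch.Tensor,
                      near_scale: float = 1.0,
                      far_scale: float = 100.0) -> tuple[float, float]:
    """Heuristic (near, far) from context camera-to-world matrices [v, 4, 4].

    Assumes the scene depth range is roughly proportional to the largest
    pairwise distance between the context camera centres.
    """
    centres = extrinsics[:, :3, 3]                    # [v, 3] camera centres
    baseline = torch.cdist(centres, centres).max()    # largest pairwise distance
    baseline = baseline.clamp(min=1e-3)               # guard near-identical poses
    return (near_scale * baseline).item(), (far_scale * baseline).item()
```

Since the output folders above are named `near_..._far_...`, you can sweep a few `(near_scale, far_scale)` pairs and keep the range where the warped images align with the reference view at plausible depth candidates.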

Friedrich-M commented 6 months ago

Thank you for the insightful reply! The examples are really helpful.