donydchen / mvsplat

🌊 [ECCV'24 Oral] MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
https://donydchen.github.io/mvsplat
MIT License

Custom dataset training #11

Closed VillardX closed 6 months ago

VillardX commented 6 months ago

Hi, thanks for the great work. I have some questions about custom data training.

In the paper, RE10K training takes only 2 context-view RGB images with their corresponding intrinsics and extrinsics as input, and outputs a novel-view RGB image.

  1. About znear and zfar: in `dataset_re10k.py`, they are set to 1 and 100. Should znear and zfar be modified when training on my custom dataset? What do 1 and 100 mean? Meters?
  2. About extrinsics and intrinsics: according to pixelSplat, "Our extrinsics are OpenCV-style camera-to-world matrices. This means that +Z is the camera look vector, +X is the camera right vector, and -Y is the camera up vector. Our intrinsics are normalized, meaning that the first row is divided by image width, and the second row is divided by image height." I don't know what the unit of the T vector of the extrinsic is; is it in meters? Also, according to your `dataset_re10k.py`, the extrinsics in the raw data are w2c, and you return `w2c.inverse()` as c2w in the function `convert_poses()`. Is my understanding correct?
  3. The number of context views in my custom dataset is 3, while the paper trains with 2 context views. Where can I modify this?
    By the way, the paper uses an MVS cost volume, but the model is mainly trained in the 2-input-view setting. Did you try training with a multiple-input-view setting?
donydchen commented 6 months ago

Hi @VillardX, thanks for your interest in our work.

We empirically set (near, far) to (1, 100), following our previous work MuRF (see the implementation HERE). If I remember correctly, these two values have no strict physical meaning; we just warped the images and found that these values fit. They do need to be set to other values if you work on other datasets. For example, you can set them according to the COLMAP data if you have it; more references can be found HERE. Or, if you do not have COLMAP data, you can follow our approach of warping the input images to decide, see https://github.com/donydchen/mvsplat/issues/4#issuecomment-2019200882.
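To make the COLMAP route concrete, here is a hypothetical sketch (not code from this repo) of deriving a (near, far) pair for a custom scene: project the COLMAP 3D points into each camera and take robust depth percentiles, padded by a safety margin. The function name and the margin/percentile values are illustrative assumptions.

```python
import numpy as np

def near_far_from_points(points_world, w2c_list, lo=0.1, hi=99.9, margin=1.2):
    """Hypothetical helper: estimate (near, far) from COLMAP points.

    points_world: (N, 3) array of COLMAP 3D points in world coordinates.
    w2c_list: list of 4x4 world-to-camera matrices.
    Returns a (near, far) pair padded by a safety margin.
    """
    pts_h = np.concatenate([points_world, np.ones((len(points_world), 1))], axis=1)
    depths = []
    for w2c in w2c_list:
        cam = pts_h @ w2c.T       # points in camera coordinates
        z = cam[:, 2]             # depth along the look (+Z) axis
        depths.append(z[z > 0])   # keep only points in front of the camera
    depths = np.concatenate(depths)
    near = np.percentile(depths, lo) / margin
    far = np.percentile(depths, hi) * margin
    return near, far
```

Using percentiles rather than min/max makes the estimate robust to the stray outlier points COLMAP reconstructions often contain.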


I am not sure whether T is in meters (I guess it is a relative value, since the poses are reconstructed rather than real ground truth). You may refer to the RE10K homepage for more details. Your understanding is correct: the raw data is 'w2c'.
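The two conventions confirmed above (normalized intrinsics, and inverting w2c to get c2w) can be sketched as follows. This is a minimal illustration of the conventions, not the repo's actual code; the function names are hypothetical.

```python
import numpy as np

def normalize_intrinsics(K, width, height):
    """Divide the first row of K by image width and the second by image
    height, matching pixelSplat's normalized-intrinsics convention."""
    K = K.astype(np.float64).copy()
    K[0] /= width
    K[1] /= height
    return K

def w2c_to_c2w(w2c):
    """RE10K raw poses are world-to-camera; invert to get the OpenCV-style
    camera-to-world matrix used by the model."""
    return np.linalg.inv(w2c)

# Example: a world-to-camera pose whose camera sits at (0, 0, 5) in world
# space, looking down the world -Z axis.
w2c = np.eye(4)
w2c[:3, 3] = [0.0, 0.0, -5.0]   # translation of the w2c transform
c2w = w2c_to_c2w(w2c)
print(c2w[:3, 3])                # camera center in world coordinates, (0, 0, 5)
```

The camera center in world coordinates is simply the translation column of the c2w matrix, which is often a quick sanity check that the inversion direction is right.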


For more information about how to train and test with more views, kindly refer to https://github.com/donydchen/mvsplat/issues/4.