Closed zl264 closed 1 month ago
Thanks for your interest in our work.
- Your observation is correct. During the per-scene optimization, we only use the point cloud (xyz and rgb) provided by MVSGaussian as initialization, not all Gaussian attributes (xyz, rgb, rotation, scaling, and opacity). Intuitively, loading all Gaussian attributes as initialization should yield better results, but we tried this and found that it does not necessarily do so. We found that 3DGS relies heavily on initialization, and the key is to initialize the point cloud locations accurately and to make the points dense (especially for complex scenes); other attributes, such as rotation, scaling, and opacity, are usually handled well enough by the default initialization designed by 3DGS.
- First of all, let us explain why we multiply the translation T by the `scale_factor`. Since our model was trained on the DTU dataset (depth range 425 to 905), when testing on other new datasets we use `scale_factor` to bring the depth of the new scene close to the depth range of the DTU dataset, which gives good results. For example, for the NeRF synthetic dataset with a depth range of 2.5 to 5.5, we set `scale_factor` to 200. The depth is multiplied by `scale_factor`, and accordingly the translation T should also be multiplied by `scale_factor`. Therefore, in the subsequent per-scene optimization phase, the depth of the initialized point cloud is actually scaled by `scale_factor`, so the corresponding translation T also needs to be scaled. If you use your own data, you need to determine whether the translation T needs to be scaled based on whether the depth of the initial point cloud is scaled.
- The `scale_factor` is not "the bigger the better"; it should be adjusted according to the depth range, as described in point 2.
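To illustrate how a `scale_factor` might be picked from the depth range, here is a minimal sketch that maps the new scene's mid depth near the middle of DTU's training range (425 to 905). The helper name `suggest_scale_factor` is hypothetical, not part of MVSGaussian, and this is only a rough heuristic, not the authors' exact procedure:

```python
def suggest_scale_factor(near, far, dtu_near=425.0, dtu_far=905.0):
    """Hypothetical helper: pick a scale that moves the scene's mid depth
    close to the middle of the DTU training depth range (425-905)."""
    scene_mid = 0.5 * (near + far)
    dtu_mid = 0.5 * (dtu_near + dtu_far)  # 665
    return dtu_mid / scene_mid

# NeRF synthetic: depth range 2.5-5.5 -> mid depth 4.0
s = suggest_scale_factor(2.5, 5.5)
print(round(s))  # 166 -- same order of magnitude as the scale_factor=200 used above
```

Any value of the same order should land the scaled depths inside the range the model saw during training, which matches the note above that bigger is not automatically better.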
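Regarding the first point above (initializing from only xyz and rgb), the remaining attributes can be left to 3DGS-style defaults. Below is a minimal NumPy sketch of such defaults as I understand them (isotropic log-scale from the nearest-neighbor distance, identity quaternion rotation, low initial opacity); `default_gaussian_init` is a hypothetical helper, and real 3DGS uses a KNN over the 3 nearest neighbors plus SH coefficients for color:

```python
import numpy as np

def default_gaussian_init(xyz):
    """Sketch: derive default rotation/scaling/opacity for points given
    only their positions (rgb would seed the color features separately)."""
    n = xyz.shape[0]
    # squared distance to the nearest other point (brute force for clarity)
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)
    nn2 = np.clip(d2.min(axis=1), 1e-7, None)
    scales = np.log(np.sqrt(nn2))[:, None].repeat(3, axis=1)  # log-scale parameterization
    rots = np.zeros((n, 4))
    rots[:, 0] = 1.0                                          # identity quaternion (w, x, y, z)
    opacity = np.full((n, 1), np.log(0.1 / 0.9))              # inverse sigmoid of 0.1
    return scales, rots, opacity

xyz = np.random.rand(100, 3)
scales, rots, opacity = default_gaussian_init(xyz)
```

The point of the defaults is that only the positions (and density) of the initial points need to be accurate; the optimizer can recover the rest.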
Thank you very much for your reply. Regarding points 2 and 3, I would like to add the following:
```python
def readColmapCameras(cam_extrinsics, cam_intrinsics, path, images_folder,
                      size=(960, 640), scale_factor=100, init_ply=None):
    cam_infos = []
    for idx, key in enumerate(cam_extrinsics):
        sys.stdout.write('\r')
        # the exact output you're looking for:
        sys.stdout.write("Reading camera {}/{}".format(idx + 1, len(cam_extrinsics)))
        sys.stdout.flush()

        extr = cam_extrinsics[key]
        intr = cam_intrinsics[extr.camera_id]
        h_o, w_o = intr.height, intr.width
        height = size[0]
        width = size[1]
        uid = intr.id
        R = np.transpose(qvec2rotmat(extr.qvec))
        T = np.array(extr.tvec)
        T = T * scale_factor

        if intr.model == "SIMPLE_PINHOLE":
            focal_length_x = intr.params[0]
            focal_length_x = focal_length_x * width / w_o
            FovY = focal2fov(focal_length_x, height)
            FovX = focal2fov(focal_length_x, width)
        elif intr.model == "PINHOLE":
            focal_length_x = intr.params[0]
            focal_length_y = intr.params[1]
            focal_length_x = focal_length_x * width / w_o
            focal_length_y = focal_length_y * height / h_o
            FovY = focal2fov(focal_length_y, height)
            FovX = focal2fov(focal_length_x, width)
        elif intr.model == "SIMPLE_RADIAL":
            focal_length_x = intr.params[0] * width / w_o
            focal_length_y = intr.params[0] * height / h_o
            FovY = focal2fov(focal_length_y, height)
            FovX = focal2fov(focal_length_x, width)
        else:
            assert False, "Colmap camera model not handled: only undistorted datasets (PINHOLE or SIMPLE_PINHOLE cameras) supported!"

        image_path = os.path.join(images_folder, os.path.basename(extr.name))
        image_name = os.path.basename(image_path).split(".")[0]
        image = Image.open(image_path)
        image = (np.array(image)).astype(np.float32)
        image = cv2.resize(image, size[::-1], interpolation=cv2.INTER_AREA)
        image = Image.fromarray(image.astype(np.uint8))

        cam_info = CameraInfo(uid=uid, R=R, T=T, FovY=FovY, FovX=FovX, image=image,
                              image_path=image_path, image_name=image_name,
                              width=width, height=height)
        cam_infos.append(cam_info)
    sys.stdout.write('\n')
    return cam_infos
```
When I set the scale_factor to 1 while using MVSGaussian for both inference and per-scene optimization, the optimization process works correctly. However, when I set the scale_factor to other values, such as 100, MVSGaussian achieves higher PSNR in the rendered images during inference but fails to optimize correctly during per-scene optimization.
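This mismatch is consistent with the geometry: if the 3D points (and hence their depths) and the translation T are multiplied by the same factor, the projected pixel coordinates do not change, whereas scaling only one of them moves the cameras relative to the point cloud and can stall optimization. A minimal NumPy check with toy, hypothetical camera values:

```python
import numpy as np

np.random.seed(0)
K = np.array([[500.0, 0, 480], [0, 500.0, 320], [0, 0, 1]])  # toy intrinsics
R = np.eye(3)
T = np.array([0.1, -0.2, 2.0])
X = np.random.rand(10, 3)  # toy points in world coordinates
s = 100.0                  # scale_factor

def project(K, R, T, X):
    cam = X @ R.T + T      # world -> camera
    pix = cam @ K.T
    return pix[:, :2] / pix[:, 2:3]

p1 = project(K, R, T, X)            # original scene
p2 = project(K, R, s * T, s * X)    # points AND translation scaled together
print(np.allclose(p1, p2))         # True: pixels are unchanged
```

Scaling `X` (the initial point cloud) without scaling `T`, or vice versa, breaks this invariance, which matches the failure you see when only one of the two is multiplied by `scale_factor`.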
Thank you very much for this excellent work! I have encountered some issues during the per-scene optimization process. I multiply the translation by the scale factor (`T = T * scale_factor`), but this approach fails during the optimization process, preventing the loss from decreasing. However, when I set the scale_factor to 1, the loss decreases and the scene can be optimized normally.