Evaluation with a scale factor on RealEstate10K

Hi,

I'm trying to run a baseline comparison with SynSin on RealEstate10K. Since SynSin is not scale-invariant, I computed a scale factor from a sparse point cloud (obtained by triangulating points with the camera poses) and the depth image produced by the model, and scaled both the source pose and the target pose accordingly, as follows:

ref_cams = batch["cameras"][0]
dst_cams = batch["cameras"][1]
ref_pose = K_offset_inv @ ref_cams["P"][:, :-1].cuda()
dst_pose = K_offset_inv @ dst_cams["P"][:, :-1].cuda()
ref_pose[:, 0:3, 3] /= scale_factor.view(-1, 1)
dst_pose[:, 0:3, 3] /= scale_factor.view(-1, 1)
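
To make the intent concrete, here is a minimal NumPy sketch of what I mean by the scale factor and the pose scaling. The helper names (`median_scale`, `scale_pose`), the median-ratio estimator, and the convention that the scale is triangulated depth divided by predicted depth are illustrative assumptions, not SynSin code. It only shows that dividing both translations by the same factor rescales the relative translation by that factor; it does not establish how the model consumes the poses.

```python
import numpy as np

def median_scale(sparse_depth, pred_depth, eps=1e-6):
    """Hypothetical helper: median of (triangulated depth / predicted depth).

    Entries where either depth is <= eps are treated as missing and ignored.
    """
    valid = (sparse_depth > eps) & (pred_depth > eps)
    return float(np.median(sparse_depth[valid] / pred_depth[valid]))

def scale_pose(pose, scale):
    """Hypothetical helper: copy a 4x4 [R|t] matrix and divide t by `scale`."""
    out = pose.copy()
    out[:3, 3] /= scale
    return out

# Toy data: triangulated depths are exactly twice the predicted depths.
sparse = np.array([0.0, 2.0, 4.0, 6.0])  # 0.0 marks a pixel with no 3D point
pred = np.array([1.0, 1.0, 2.0, 3.0])
s = median_scale(sparse, pred)

# Toy poses: identity rotation for ref, 90 degree rotation about z for dst.
ref = np.eye(4)
ref[:3, 3] = [1.0, 2.0, 3.0]
dst = np.eye(4)
dst[:3, :3] = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
dst[:3, 3] = [4.0, 0.0, 2.0]

# Relative pose before and after scaling both translations by the same factor.
# The inverse is recomputed from the scaled pose; an inverse cached before
# scaling would no longer match.
rel = dst @ np.linalg.inv(ref)
rel_scaled = scale_pose(dst, s) @ np.linalg.inv(scale_pose(ref, s))
print(s, rel[:3, 3], rel_scaled[:3, 3])
```

In this sketch the rotation part of the relative pose is unchanged and only its translation shrinks by the scale factor, which is the behaviour I expected from the snippet above.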

After doing this I ran the forward pass again, but the results were much worse than without the scale factor. What's more, scaling only the ref pose does not seem to make any difference in the results, which looks strange. Is this the right way to do scale-invariant evaluation? If not, what is the correct way to do it?

Thanks.
