Closed · LiXinghui-666 closed this issue 1 year ago
Hi, it's used to scale the distance to depth: because the ray_dirs are normalised, the z_vals are Euclidean distances along the ray, not depths.
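As a minimal sketch of this distance-to-depth conversion (variable names are illustrative, not the exact monosdf code):

```python
import numpy as np

# Camera-space ray direction; before normalisation its z-component is 1,
# so a t-value along the unnormalised ray would equal the z-depth directly.
ray_dir = np.array([0.2, -0.1, 1.0])
ray_dir_norm = ray_dir / np.linalg.norm(ray_dir)

t = 3.0                          # z_val: Euclidean distance along the ray
point = t * ray_dir_norm         # 3D point in camera space
depth = point[2]                 # true z-depth of that point

# Equivalently: scale the distance by the z-component of the unit direction.
assert np.isclose(depth, t * ray_dir_norm[2])
```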
I also have some problems with the depth loss. As you said, the monocular depth output is in the range 0.01 to 0.04; should I normalize the gt depth into this range? I used nice_slam_apartment_to_monosdf.py to get depth and rgb for the Apartment dataset. I want to use only depth and rgb (no normals), but I found the gt depth is not in the 0.01 to 0.04 range, so the depth loss is large and can't converge.
@Wjt-shift If you use gt-depth, you don't need to use the scale-invariant loss, as it is designed to handle the scale ambiguity of monocular depth. You could simply use an L1 or L2 loss for the gt-depth, but you need to scale the gt-depth accordingly if you normalize the camera poses.
Thanks for your reply! I also have questions about your code. In nice_slam_apartment_to_monosdf.py and scannet_to_monosdf.py, why should the gt_depth be divided by 1000? Is 1000 the scale? And how did you produce your Replica dataset in the monosdf format? I found that in your Replica dataset the depth is in the range 0.01 to 0.04.
@Wjt-shift gt_depth / 1000 because that is how the gt depths are saved. The Replica dataset is converted from NICE-SLAM, and we run Omnidata to get monocular depth and normals; the Omnidata depth output is in the range 0.01 to 0.04.
Sorry to bother you again. I use gt_depth with an L1 loss, but the depth L1 loss doesn't converge. Following your advice to scale the gt_depth, for the Replica dataset I took the scale from the camera.json file and divided gt_depth by it, which puts gt_depth in the range 0 to 6; however, the network output is not in this range, so the depth L1 loss can't converge. I compared training with only the rgb loss against rgb + depth L1 loss, and rgb-only gives a better result. What causes this problem, and by what factor should I scale my depth? This is the depth L1 loss during training. Comparing the two runs: the green curve uses only the rgb loss, the other uses rgb + depth L1 loss. Comparing PSNR, the green curve is rgb-only, the other is rgb + depth L1 loss.
@Wjt-shift The gt depth in Replica is in the range [0, 6], so after your normalization it should be roughly in [0, 2], because the scene is normalized to [-1, 1]. Maybe you should multiply the depth by the scale instead of dividing by it.
Thanks for your quick reply! When loading the depth image I now divide gt_depth by 6553.5 (the scale in camera.json) instead of by 1000, and use this gt_depth as input. Should I normalize both the input gt_depth and the output depth to [0, 1] with (depth - depth.min()) / (depth.max() - depth.min())? This is camera.json. I tried normalizing both the input and output depth to [0, 1] simultaneously, but can't reproduce the good result of your Replica dataset. The blue curve is the result on your Replica dataset.
Hi, after you divide the depth by 6553.5 you get the depth value in meters. But we further normalize the scene, and the normalization factor is computed from the gt mesh and saved as scale_mat in cameras.npz, so you need to divide by this scale as well. For example:
import cv2
import numpy as np

depth_gt = cv2.imread("depth.png", -1) / 6553.5  # -1 keeps the raw 16-bit values; 6553.5 converts to meters
depth_gt_for_training = depth_gt / np.load('cameras.npz')['scale_mat_0'][0, 0]  # undo the scene normalization
Thanks for your reply! It seems solved after your suggestions. I have another question: if I get depth without a known scale (e.g. from a monocular SLAM system), should I still scale the depth with scale_mat (i.e. divide by it)? (scale_mat seems to normalize the poses into the unit cube?) And in that situation, should I use your depth loss? Thanks again, your suggestions are helpful.
Hi, depth from a monocular system is also only defined up to scale, so in that case you should use the scale-invariant loss in MonoSDF.
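For reference, the core of such a scale-invariant loss can be sketched as a closed-form least-squares alignment of scale and shift before the residual is penalized. This is a simplified sketch with illustrative names, not the exact MonoSDF implementation (which follows the MiDaS scale-and-shift-invariant formulation):

```python
import numpy as np

def scale_invariant_depth_loss(depth_pred, depth_mono, mask):
    """Align the rendered depth to the (up-to-scale) monocular depth with a
    least-squares scale s and shift q, then take an L1 on the residual."""
    p, m = depth_pred[mask], depth_mono[mask]
    # Solve min_{s, q} || s * p + q - m ||^2 in closed form via least squares.
    A = np.stack([p, np.ones_like(p)], axis=-1)   # (N, 2) design matrix
    (s, q), *_ = np.linalg.lstsq(A, m, rcond=None)
    return np.mean(np.abs(s * p + q - m))
```

With this alignment, a prediction that differs from the monocular depth only by a global scale and shift incurs near-zero loss, which is exactly the ambiguity a monocular depth network cannot resolve.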
Hi, I thought the problem was solved, but as training went on the results did not meet expectations. I compared the results in three situations:
i) my Replica dataset with only the rgb and eikonal losses (red line);
ii) my Replica dataset with rgb, eikonal, and depth L1 losses, with gt_depth processed in the format you suggested and with no other normalization (orange line):
depth_gt = cv2.imread("depth.png", -1) / 6553.5
depth_gt_for_training = depth_gt / np.load('cameras.npz')['scale_mat_0'][0, 0]
iii) your Replica dataset with rgb, eikonal, depth, and normal losses (blue line).
This is the result for the three situations.
It seems unreasonable: rgb + eikonal alone converges faster than rgb + eikonal + depth L1, and the depth L1 variant cannot reach a better PSNR in the same number of iterations. When using depth_gt, the mean output depth over iterations looks like the following image.
Should I process gt_depth in some other way, e.g. rescale it? (As in your earlier suggestion: "The gt depth in replica is in range of [0, 6], so after your normalization, it should be in range around [0, 2] because the scene is normalized in [-1, 1]? Maybe you should multiply the depth with scale instead of dividing it.")
And if I use a different dataset, should the constants 50 and 0.5 in the code be changed to suit the output depth?
self.depth_loss(depth_pred.reshape(1, 32, 32), (depth_gt * 50 + 0.5).reshape(1, 32, 32), mask.reshape(1, 32, 32))
I would suggest you first make it work on our preprocessed dataset with GT depth supervision. The scaling constant 50 was chosen heuristically (see the explanation at the very beginning of this issue) and needs to be changed if you use other datasets or try to tune the weight of the depth loss.
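As a quick sanity check of that heuristic, using the 0.01 to 0.04 Omnidata output range mentioned earlier in this thread:

```python
# Omnidata monocular depth lies roughly in [0.01, 0.04] (see above), so
# depth_gt * 50 + 0.5 maps it to roughly [1.0, 2.5], a range comparable to
# depths in the normalised scene. For another dataset, pick the constants so
# the transformed monocular depth roughly matches your rendered depth range.
lo, hi = 0.01, 0.04
scaled_lo, scaled_hi = lo * 50 + 0.5, hi * 50 + 0.5
print(scaled_lo, scaled_hi)   # 1.0 2.5
```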
Hi, we added support for RGB-D data in SDFStudio. Maybe you could try it out if you're interested.
I have the same question about the scaling. Since a scale and a shift are already computed when the depth loss is calculated, what is the function of the following line in the code?
Originally posted by @LiXinghui-666 in https://github.com/autonomousvision/monosdf/issues/18#issuecomment-1275725230