Open Cresynia opened 1 month ago
Hi @Cresynia, sorry for the confusion. It appears that some hyperparameters, such as scaling factors, are missing from the current code for the KITTI dataset. We'll review the code and provide an update soon. Thanks for your patience!
Thanks so much!
Thank you for your great work! I am very interested in the depth error diagram in Figure 4. I would like to reproduce this diagram and inspect the results of some models, as you did. Could you please provide the relevant code?
Hi @757787182, thanks for your interest!
You can visualize the error maps using the following code in evaluate_depth.py. We use the improved ground truth for the visualization and interpolate the GT map with cv2 dilation.
# Snippet for evaluate_depth.py. It assumes cv2, numpy (as np) and tqdm are imported,
# that gt_depths, mono_pred_disps, multi_pred_disps, opt, MIN_DEPTH and MAX_DEPTH are
# already defined there, and that numpy_intensitymap_to_pcolor is available
# (its definition is not shown in this thread).
gt_masks = []
absrel_maps_mono = []
absrel_maps_multi = []
gt_disps = []
absrel_maps_mono_disp = []
absrel_maps_multi_disp = []

for i in tqdm.tqdm(range(len(gt_depths))):
    vis_gt_depth = gt_depths[i]
    vis_gt_height, vis_gt_width = vis_gt_depth.shape[:2]

    # Resize the predicted disparities to the GT resolution and convert to depth.
    _pred_disp_z = np.squeeze(multi_pred_disps[i])
    _pred_disp_mono = np.squeeze(mono_pred_disps[i])
    vis_pred_disp_mono = cv2.resize(_pred_disp_mono, (vis_gt_width, vis_gt_height))
    vis_pred_disp_z = cv2.resize(_pred_disp_z, (vis_gt_width, vis_gt_height))
    vis_pred_depth_z = 1 / vis_pred_disp_z
    vis_pred_depth_mono = 1 / vis_pred_disp_mono

    # Valid-pixel mask used for median scaling.
    if opt.eval_split == "eigen":
        mask = np.logical_and(vis_gt_depth > MIN_DEPTH, vis_gt_depth < MAX_DEPTH)
        crop = np.array([0.40810811 * vis_gt_height, 0.99189189 * vis_gt_height,
                         0.03594771 * vis_gt_width, 0.96405229 * vis_gt_width]).astype(np.int32)
        crop_mask = np.zeros(mask.shape)
        crop_mask[crop[0]:crop[1], crop[2]:crop[3]] = 1
        mask = np.logical_and(mask, crop_mask)
    else:
        mask = vis_gt_depth > 0

    pred_depth_z_mask = vis_pred_depth_z[mask]
    pred_depth_mono_mask = vis_pred_depth_mono[mask]
    gt_depth_mask = vis_gt_depth[mask]

    # Median scaling: the ratios come from the masked pixels and are applied to the full maps.
    if not opt.disable_median_scaling:
        ratio1 = np.median(gt_depth_mask) / np.median(pred_depth_mono_mask)
        pred_depth_mono_mask *= ratio1
        ratio2 = np.median(gt_depth_mask) / np.median(pred_depth_z_mask)
        pred_depth_z_mask *= ratio2
        vis_pred_depth_mono *= ratio1
        vis_pred_depth_z *= ratio2

    # Clamp the predictions to the evaluation depth range.
    vis_pred_depth_z[vis_pred_depth_z < MIN_DEPTH] = MIN_DEPTH
    vis_pred_depth_z[vis_pred_depth_z > MAX_DEPTH] = MAX_DEPTH
    vis_pred_depth_mono[vis_pred_depth_mono < MIN_DEPTH] = MIN_DEPTH
    vis_pred_depth_mono[vis_pred_depth_mono > MAX_DEPTH] = MAX_DEPTH

    # Densify the sparse GT with a 3x3 dilation, for visualization only.
    kernel = np.ones((3, 3), np.uint8)
    vis_gt_depth = cv2.dilate(vis_gt_depth, kernel, iterations=1)
    vis_mask = vis_gt_depth > 0

    # Per-pixel absolute relative error, in depth and in disparity (1 / depth).
    absrel_map_multi = np.zeros(vis_gt_depth.shape)
    absrel_map_multi[vis_mask] = np.abs(vis_gt_depth[vis_mask] - vis_pred_depth_z[vis_mask]) / vis_gt_depth[vis_mask]
    absrel_map_multi_disp = np.zeros(vis_gt_depth.shape)
    absrel_map_multi_disp[vis_mask] = np.abs(1 / vis_gt_depth[vis_mask] - 1 / vis_pred_depth_z[vis_mask]) / (1 / vis_gt_depth[vis_mask])
    absrel_map_mono = np.zeros(vis_gt_depth.shape)
    absrel_map_mono[vis_mask] = np.abs(vis_gt_depth[vis_mask] - vis_pred_depth_mono[vis_mask]) / vis_gt_depth[vis_mask]
    absrel_map_mono_disp = np.zeros(vis_gt_depth.shape)
    absrel_map_mono_disp[vis_mask] = np.abs(1 / vis_gt_depth[vis_mask] - 1 / vis_pred_depth_mono[vis_mask]) / (1 / vis_gt_depth[vis_mask])

    # Colorize the error maps (jet, 0 to 0.5) and crop them to a common 370 x 1224 size.
    error_pil_multi = np.array(numpy_intensitymap_to_pcolor(
        absrel_map_multi[..., np.newaxis], vmin=0, vmax=0.5,
        colormap='jet', valid_mask=vis_mask))[:370, :1224, :]
    error_pil_mono = np.array(numpy_intensitymap_to_pcolor(
        absrel_map_mono[..., np.newaxis], vmin=0, vmax=0.5,
        colormap='jet', valid_mask=vis_mask))[:370, :1224, :]
    error_pil_multi_disp = np.array(numpy_intensitymap_to_pcolor(
        absrel_map_multi_disp[..., np.newaxis], vmin=0, vmax=0.5,
        colormap='jet', valid_mask=vis_mask))[:370, :1224, :]
    error_pil_mono_disp = np.array(numpy_intensitymap_to_pcolor(
        absrel_map_mono_disp[..., np.newaxis], vmin=0, vmax=0.5,
        colormap='jet', valid_mask=vis_mask))[:370, :1224, :]

    # Add a leading batch axis so the per-image maps can be concatenated.
    error_pil_mono = error_pil_mono[np.newaxis, :]
    error_pil_multi = error_pil_multi[np.newaxis, :]
    error_pil_mono_disp = error_pil_mono_disp[np.newaxis, :]
    error_pil_multi_disp = error_pil_multi_disp[np.newaxis, :]

    absrel_maps_mono.append(error_pil_mono)
    absrel_maps_multi.append(error_pil_multi)
    absrel_maps_mono_disp.append(error_pil_mono_disp)
    absrel_maps_multi_disp.append(error_pil_multi_disp)

absrel_maps_mono = np.concatenate(absrel_maps_mono)
absrel_maps_multi = np.concatenate(absrel_maps_multi)
absrel_maps_mono_disp = np.concatenate(absrel_maps_mono_disp)
absrel_maps_multi_disp = np.concatenate(absrel_maps_multi_disp)
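The snippet above calls numpy_intensitymap_to_pcolor, whose definition is not shown in this thread. For readers who want to try the idea on toy data first, here is a minimal NumPy-only sketch of the two visualization steps: densifying a sparse GT map with a 3x3 dilation, then colorizing the absolute relative error. Both `dilate3x3` and `colorize_error` are hypothetical stand-ins written for this illustration (they are not the repository's functions), and the jet colors come from a common piecewise-linear approximation, so they will not match matplotlib's 'jet' exactly.

```python
import numpy as np


def dilate3x3(img):
    """Grey-scale 3x3 dilation: each pixel becomes the max of its 3x3 neighbourhood.

    Hypothetical NumPy stand-in for cv2.dilate(img, np.ones((3, 3), np.uint8)).
    Padding with -inf means pixels outside the image never win the max.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=-np.inf)
    shifted = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.max(shifted, axis=0)


def colorize_error(error, valid_mask, vmin=0.0, vmax=0.5):
    """Hypothetical stand-in for numpy_intensitymap_to_pcolor.

    Maps an (H, W) or (H, W, 1) error map to a jet-like uint8 RGB image and
    paints invalid pixels black.
    """
    error = np.asarray(error, dtype=np.float64)
    if error.ndim == 3:
        error = error[..., 0]
    x = np.clip((error - vmin) / (vmax - vmin), 0.0, 1.0)
    # Piecewise-linear approximation of the jet colormap (an assumption).
    r = np.clip(1.5 - np.abs(4.0 * x - 3.0), 0.0, 1.0)
    g = np.clip(1.5 - np.abs(4.0 * x - 2.0), 0.0, 1.0)
    b = np.clip(1.5 - np.abs(4.0 * x - 1.0), 0.0, 1.0)
    rgb = (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)
    rgb[~np.asarray(valid_mask, dtype=bool)] = 0
    return rgb


# Toy data: one LiDAR-like GT sample at 10 m and a constant 12 m prediction.
gt = np.zeros((5, 5))
gt[2, 2] = 10.0
pred = np.full((5, 5), 12.0)

gt_dense = dilate3x3(gt)          # the single sample spreads to its 3x3 neighbourhood
vis_mask = gt_dense > 0
absrel = np.zeros(gt.shape)
absrel[vis_mask] = np.abs(gt_dense[vis_mask] - pred[vis_mask]) / gt_dense[vis_mask]
error_rgb = colorize_error(absrel[..., np.newaxis], vis_mask)
```

On this toy input the dilation turns one valid GT pixel into nine, which is why the error maps look denser than the raw LiDAR ground truth. Note that dilation copies the largest neighbouring depth, so the densified values are for visualization only and should not be used to compute metrics.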
@Sungmin-Woo Thank you for your wonderful paper; I expect to learn a lot from it. I wonder whether the KITTI training update you mentioned above has been applied to the repository. Thank you!
Below are the results of the model I trained myself and the results of the pretrained weights you provided. With no other changes, the model I trained myself with a batch size of 12 on 2 GPUs does not reproduce the results in the paper.
Thank you for your wonderful reply. I couldn't find where the single-frame variance is trained in the code. Could you help me with that?
Hello, @jerry-ryu!
We are currently busy with another project, so it might take some time to finalize the code. We expect to release training code that reproduces the performance reported in the paper before the ECCV conference at the latest, so we kindly ask for your patience. We will leave a comment on this issue once the code is updated.
For now, we recommend conducting experiments on the Cityscapes dataset, where the performance gap is small.
@757787182
The training code for the single-frame network is not provided, as it is nearly identical to the code provided for training the multi-frame network. You can predict the per-pixel variance of single-frame depth by simply modifying the reprojection loss as shown in Equation 4 of our paper.
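Equation 4 is not reproduced in this thread, so the following is only a sketch under an assumption: that the modification resembles a common uncertainty-weighted loss, where the per-pixel reprojection error is divided by a predicted scale and a log term penalizes large scales. The function name `uncertainty_weighted_loss` and the log-sigma parameterization are hypothetical and may differ from the paper's formulation; in actual training the same expression would be written with the framework's tensors rather than NumPy.

```python
import numpy as np


def uncertainty_weighted_loss(photo_error, log_sigma):
    """Mean of error / sigma + log(sigma) over all pixels.

    photo_error: per-pixel reprojection (photometric) error, non-negative.
    log_sigma:   per-pixel log of the predicted uncertainty scale; predicting
                 the log keeps sigma positive. This form is an assumption,
                 not necessarily Equation 4 of the paper.
    """
    photo_error = np.asarray(photo_error, dtype=np.float64)
    log_sigma = np.asarray(log_sigma, dtype=np.float64)
    return float(np.mean(photo_error * np.exp(-log_sigma) + log_sigma))


# For a fixed error e, e / sigma + log(sigma) is minimized at sigma = e, so the
# predicted scale is pushed towards the size of the reprojection error.
photo_error = np.array([0.1, 0.4])
loss_matched = uncertainty_weighted_loss(photo_error, np.log(photo_error))
loss_too_large = uncertainty_weighted_loss(photo_error, np.log(photo_error) + 0.5)
loss_too_small = uncertainty_weighted_loss(photo_error, np.log(photo_error) - 0.5)
```

Under this assumed form, pixels that the network cannot explain well (occlusions, moving objects) receive a larger predicted scale, which is the kind of per-pixel uncertainty the reply refers to.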
Thank you for your great work! I encountered some issues when trying to reproduce the results. I trained on the KITTI dataset with 1 GPU and a batch size of 12, using the provided KITTI checkpoints and freezing the teacher and pose networks. However, there is a significant discrepancy in the results. Could you please let me know what mistakes I might be making and what I should pay attention to? Thank you very much!