Open · letdivedeep opened this issue 2 years ago
Note that KITTI GT depth is sparse, so `obstacle_depth = depth_map[y1:y2, x1:x2]` contains many 0 values. You should exclude them first, and then take the min() or mean() value.
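A minimal sketch of that filtering step (the helper `bbox_depth` and its arguments are illustrative, not from the repo):

```python
import numpy as np

def bbox_depth(depth_map, x1, y1, x2, y2):
    """Depth stats inside a bbox of a sparse KITTI GT depth map."""
    obstacle_depth = depth_map[y1:y2, x1:x2]
    # Pixels without a LiDAR return are 0 in the sparse GT: drop them.
    valid = obstacle_depth[obstacle_depth > 0]
    if valid.size == 0:
        return None, None  # no LiDAR points fall inside this box
    return valid.min(), valid.mean()  # closest point, mean depth
```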
@JiawangBian thanks for your inputs, it worked!!
@JiawangBian I used the KITTI pretrained model to infer on some images using the inference.sh script.
The depth maps were stored as .npy files in the output directory. We read a file in the following way:

```python
gt_depth_map = np.load("0044.npy")
```

Now, to get distances from this map, what should be done? Do we have to use the focal length and baseline formulation, or can we directly overlay the bbox on it and retrieve the distances by min or mean?
You do not need to use the focal length and baseline. However, you need to know the scaling ratio between the predicted depth and the ground truth. Monocular depth estimation is up to an unknown scale, so you need to recover it from an external source. Fortunately, the scale-consistent depth method ensures that our predicted depths on all images have the same scale. This means that you can recover the scale using one image (where you have ground truth, by computing median scaling as in the evaluation code), and then apply this scale to all other images.
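In code, that scale recovery might look like the following sketch (`recover_scale` is a hypothetical helper; zeros are assumed to mark missing GT pixels):

```python
import numpy as np

def recover_scale(gt_depth, pred_depth):
    """Median scaling from one image with sparse GT (zeros = missing)."""
    mask = gt_depth > 0
    return np.median(gt_depth[mask]) / np.median(pred_depth[mask])

# Because the predictions are scale-consistent across images, the scale
# recovered from one image converts any other prediction to metric depth:
# metric_depth = recover_scale(gt0, pred0) * pred_other
```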
@JiawangBian Thanks for your quick reply.
I checked test.py and found that a function compute_errors is called to calculate the errors:
```python
for i, (tgt_img, gt_depth) in enumerate(tqdm(test_loader)):
    pred_depth = model.inference_depth(tgt_img.cuda())
    errs = compute_errors(gt_depth.cuda(), pred_depth,
                          hparams.dataset_name)
    all_errs.append(np.array(errs))

all_errs = np.stack(all_errs)
mean_errs = np.mean(all_errs, axis=0)
```
which calls the compute_errors method:
```python
def compute_errors(gt, pred, dataset):
    # pred: b c h w
    # gt:   b h w
    abs_diff = abs_rel = sq_rel = log10 = rmse = rmse_log = a1 = a2 = a3 = 0.0
    batch_size, h, w = gt.size()

    if pred.nelement() != gt.nelement():
        pred = F.interpolate(pred, [h, w], mode='nearest')
    pred = pred.view(batch_size, h, w)

    if dataset == 'kitti':
        crop_mask = gt[0] != gt[0]
        y1, y2 = int(0.40810811 * gt.size(1)), int(0.99189189 * gt.size(1))
        x1, x2 = int(0.03594771 * gt.size(2)), int(0.96405229 * gt.size(2))
        crop_mask[y1:y2, x1:x2] = 1
        max_depth = 80

    if dataset == 'nyu':
        crop_mask = gt[0] != gt[0]
        crop = np.array([45, 471, 41, 601]).astype(np.int32)
        crop_mask[crop[0]:crop[1], crop[2]:crop[3]] = 1
        max_depth = 10

    if dataset == 'ddad':
        crop_mask = gt[0] != gt[0]
        crop_mask[:, :] = 1
        max_depth = 200

    min_depth = 1e-3

    for current_gt, current_pred in zip(gt, pred):
        valid = (current_gt > min_depth) & (current_gt < max_depth)
        valid = valid & crop_mask

        valid_gt = current_gt[valid]
        valid_pred = current_pred[valid]

        # align scale
        valid_pred = valid_pred * \
            torch.median(valid_gt) / torch.median(valid_pred)

        valid_pred = valid_pred.clamp(min_depth, max_depth)

        thresh = torch.max((valid_gt / valid_pred), (valid_pred / valid_gt))
        a1 += (thresh < 1.25).float().mean()
        a2 += (thresh < 1.25 ** 2).float().mean()
        a3 += (thresh < 1.25 ** 3).float().mean()

        diff_i = valid_gt - valid_pred
        abs_diff += torch.mean(torch.abs(diff_i))
        abs_rel += torch.mean(torch.abs(diff_i) / valid_gt)
        sq_rel += torch.mean(((diff_i) ** 2) / valid_gt)
        rmse += torch.sqrt(torch.mean(diff_i ** 2))
        rmse_log += torch.sqrt(torch.mean((torch.log(valid_gt) -
                                           torch.log(valid_pred)) ** 2))
        log10 += torch.mean(torch.abs((torch.log10(valid_gt) -
                                       torch.log10(valid_pred))))

    return [metric.item() / batch_size for metric in
            [abs_diff, abs_rel, sq_rel, log10, rmse, rmse_log, a1, a2, a3]]
```
When I run the test eval script, I get the following output:
I am a bit confused about two parts:

1) What do you mean by median scaling? Is it this part?

```python
# align scale
valid_pred = valid_pred * torch.median(valid_gt) / torch.median(valid_pred)
valid_pred = valid_pred.clamp(min_depth, max_depth)
```

2) How can this median scale be applied to all other images (do we need to perform the same operation on each predicted depth map and then get the closest bbox distances)?
@JiawangBian I tried the above approach, as illustrated in the following code:
```python
import torch

def find_distances(gt_depth_map, pt_depth_map, pred_bboxes, img, method="closest"):
    depth_list = []
    h, w, _ = img.shape
    for box in pred_bboxes:
        # bbox is (center_x, center_y, width, height), normalized to [0, 1]
        x1 = int(box[0] * w - box[2] * w * 0.5)  # center_x - width / 2
        y1 = int(box[1] * h - box[3] * h * 0.5)  # center_y - height / 2
        x2 = int(box[0] * w + box[2] * w * 0.5)  # center_x + width / 2
        y2 = int(box[1] * h + box[3] * h * 0.5)  # center_y + height / 2
        pt_obstacle_depth = pt_depth_map[y1:y2, x1:x2]
        gt_obstacle_depth = gt_depth_map[y1:y2, x1:x2]
        # Remove the 0's (sparse GT pixels with no LiDAR return)
        pt_obstacle_depth = pt_obstacle_depth[pt_obstacle_depth != 0]
        gt_obstacle_depth = gt_obstacle_depth[gt_obstacle_depth != 0]
        # Convert numpy arrays to tensors
        pt_depth_tensor = torch.from_numpy(pt_obstacle_depth)
        gt_depth_tensor = torch.from_numpy(gt_obstacle_depth)
        # Perform the median scaling on the predicted depth
        valid_pred = pt_depth_tensor * torch.median(gt_depth_tensor) / torch.median(pt_depth_tensor)
        if method == "closest":
            depth_list.append(valid_pred.min())  # take the closest point in the box
            print("closest point:", valid_pred.min())
        elif method == "average":
            depth_list.append(valid_pred.mean())  # take the average
        elif method == "median":
            depth_list.append(torch.median(valid_pred))  # take the median
        else:
            # take the center pixel of the 2-D predicted map
            # (the cropped array was flattened by the boolean indexing above)
            depth_list.append(pt_depth_map[int(box[1] * h), int(box[0] * w)])
    return depth_list
```
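A hypothetical call, assuming YOLOv4 boxes in normalized (cx, cy, w, h) format; the filenames and example detection are placeholders, not outputs of the repo scripts:

```python
import cv2
import numpy as np

img = cv2.imread("0044.png")
gt_depth_map = np.load("0044_gt.npy")  # sparse KITTI GT depth (assumed filename)
pt_depth_map = np.load("0044.npy")     # SC-Depth prediction
pred_bboxes = [(0.5, 0.6, 0.2, 0.3)]   # example detection (cx, cy, w, h)

distances = find_distances(gt_depth_map, pt_depth_map, pred_bboxes, img,
                           method="closest")
```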
When compared with the KITTI GT, there was a 5-6 m difference. (Images: KITTI ground truth vs. SC-Depth model prediction.)
@JiawangBian your inputs would be helpful.
@JiawangBian Thanks for the wonderful work!!
I wanted to get the absolute distances for objects from the provided KITTI GT depth maps. I have downloaded the KITTI raw dataset provided in the repo.
To load the KITTI GT depth map, I used the following code.
The resulting output is shown below.
Then, to get the bboxes, I used a YOLOv4 model.
I overlaid the bboxes on the depth map and took the depth value at the center point, as shown.
The resulting output is this, where we get 0 m distances.
Do we have to do any other pre-processing prior to using the KITTI GT depth maps?
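For reference, a sketch of the center-point lookup described above (the filename and the normalized (cx, cy, w, h) detection are assumed for illustration); on a sparse LiDAR map the center pixel is usually empty, which explains the 0 m readings:

```python
import numpy as np

gt_depth_map = np.load("0044.npy")    # sparse KITTI GT depth map
h, w = gt_depth_map.shape
pred_bboxes = [(0.5, 0.6, 0.2, 0.3)]  # example YOLOv4 detection

for box in pred_bboxes:
    cx, cy = int(box[0] * w), int(box[1] * h)
    # Most pixels in the sparse GT are 0 (no LiDAR return),
    # so a single-pixel read frequently returns 0 m.
    print("center depth:", gt_depth_map[cy, cx])
```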