Sbrunoberenguel / FreDSNet

Code to test FreDSNet: Frequential Depth estimation and Semantic segmentation Network
GNU General Public License v3.0

Request evaluation code sharing #6

Closed caodinhduc closed 6 months ago

caodinhduc commented 6 months ago

Hello, thank you for your great work. We are researching the same topic, so to make a fair comparison, could I ask you for the evaluation code of your model? We will cite your paper appropriately.

Sbrunoberenguel commented 6 months ago

Hello, I'm happy to hear that you like our work and want to compare yours with ours. To obtain the metrics presented in our paper, we used the metrics of OmniDepth. You can find the code at: https://github.com/meder411/OmniDepth-PyTorch/tree/master

The evaluation was straightforward: run the inference (code in our repo) on the test set of the Stanford dataset (Area 5) and then compare against the ground-truth information with the metrics of OmniDepth. For semantic segmentation, we used the PyTorch-Ignite library and computed a confusion matrix over the 14 classes. Then, to compute the metrics, we drop the unknown class and compute the mean over the rest of the classes.
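
For the segmentation part, this is roughly the computation (illustrative helper names, not the exact code we ran; in practice we used Ignite's ConfusionMatrix): accumulate a 14x14 confusion matrix over the test set, drop the unknown class, and average the per-class scores (e.g. IoU and class accuracy) over the remaining 13 classes. A plain NumPy version of the same arithmetic:

import numpy as np

NUM_CLASSES = 14  # unknown + 13 semantic classes

def update_confusion_matrix(conf_mat, gt, pred):
    # gt, pred: integer label maps of the same shape, values in [0, NUM_CLASSES)
    idx = NUM_CLASSES * gt.reshape(-1) + pred.reshape(-1)
    conf_mat += np.bincount(idx, minlength=NUM_CLASSES**2).reshape(NUM_CLASSES, NUM_CLASSES)
    return conf_mat

def segmentation_metrics(conf_mat):
    # Drop the unknown class (index 0) and average over the remaining classes
    cm = conf_mat[1:, 1:].astype(np.float64)
    tp = np.diag(cm)
    iou = tp / (cm.sum(axis=0) + cm.sum(axis=1) - tp + 1e-12)
    acc = tp / (cm.sum(axis=1) + 1e-12)
    return iou.mean(), acc.mean()

Here conf_mat starts as np.zeros((14, 14), dtype=np.int64) and is updated once per test image.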

caodinhduc commented 6 months ago

Thank you for the guidance, I will try to reproduce it.

caodinhduc commented 6 months ago

I followed your guide, with the metrics from the link you provided:

import os
import cv2
import time
import math
import torch
import argparse
import warnings
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt

import FreDSNet_model as model
warnings.filterwarnings('ignore')
from metric import *

color_code = [[0,0,0],        #UNK
              [100,0,0],      #beam
              [0,0,100],      #board
              [255,0,0],      #bookcase
              [123,123,255],  #ceiling
              [255,123,123],  #chair
              [200,200,200],  #clutter
              [0,100,0],      #column
              [100,220,100],  #door
              [123,255,123],  #floor
              [0,0,255],      #sofa
              [0,255,0],      #table
              [50,30,100],    #wall
              [200,200,220]]  #window

def color_segmentation(seg):
    H,W = seg.shape
    cseg = seg.reshape(-1,1)
    out = np.zeros((H*W,3))
    for i in range(H*W):
        out[i] = color_code[int(cseg[i])]
    return out.reshape(H,W,3)

def decode(img,d_max):
    img = img*255 if img.max() < 1.1 else img
    R,G,B = img[:,0],img[:,1],img[:,2]
    int1 = d_max/255.0
    int2 = (d_max/255.0)/255.0
    d1 = (R*d_max)/255.0
    d2 = (G/255.0)*int1
    d3 = (B/255.0)*int2
    return d1+d2+d3

class AverageMeter(object):
    """Computes and stores the average and current value"""

    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def to_dict(self):
        return {
            'val': self.val,
            'sum': self.sum,
            'count': self.count,
            'avg': self.avg
        }

    def from_dict(self, meter_dict):
        self.val = meter_dict['val']
        self.sum = meter_dict['sum']
        self.count = meter_dict['count']
        self.avg = meter_dict['avg']

class Evaluator(object):

    def __init__(self):
        # Accuracy metric trackers
        self.abs_rel_error_meter = AverageMeter()
        self.sq_rel_error_meter = AverageMeter()
        self.lin_rms_sq_error_meter = AverageMeter()
        self.log_rms_sq_error_meter = AverageMeter()
        self.d1_inlier_meter = AverageMeter()
        self.d2_inlier_meter = AverageMeter()
        self.d3_inlier_meter = AverageMeter()

    def reset_eval_metrics(self):
        '''
        Resets metrics used to evaluate the model
        '''
        self.abs_rel_error_meter.reset()
        self.sq_rel_error_meter.reset()
        self.lin_rms_sq_error_meter.reset()
        self.log_rms_sq_error_meter.reset()
        self.d1_inlier_meter.reset()
        self.d2_inlier_meter.reset()
        self.d3_inlier_meter.reset()

    def compute_eval_metrics(self, depth_pred, gt_depth, depth_mask):
        '''
        Computes metrics used to evaluate the model
        '''
        N = depth_mask.sum()

        # Align the prediction scales via median
        median_scaling_factor = gt_depth[depth_mask > 0].median() / depth_pred[
            depth_mask > 0].median()
        depth_pred *= median_scaling_factor

        abs_rel = abs_rel_error(depth_pred, gt_depth, depth_mask)
        sq_rel = sq_rel_error(depth_pred, gt_depth, depth_mask)
        rms_sq_lin = lin_rms_sq_error(depth_pred, gt_depth, depth_mask)
        rms_sq_log = log_rms_sq_error(depth_pred, gt_depth, depth_mask)
        d1 = delta_inlier_ratio(depth_pred, gt_depth, depth_mask, degree=1)
        d2 = delta_inlier_ratio(depth_pred, gt_depth, depth_mask, degree=2)
        d3 = delta_inlier_ratio(depth_pred, gt_depth, depth_mask, degree=3)

        self.abs_rel_error_meter.update(abs_rel, N)
        self.sq_rel_error_meter.update(sq_rel, N)
        self.lin_rms_sq_error_meter.update(rms_sq_lin, N)
        self.log_rms_sq_error_meter.update(rms_sq_log, N)
        self.d1_inlier_meter.update(d1, N)
        self.d2_inlier_meter.update(d2, N)
        self.d3_inlier_meter.update(d3, N)

    def print_validation_report(self):
        '''
        Prints a report of the validation results
        '''
        print('  Avg. Abs. Rel. Error: {:.4f}\n'
              '  Avg. Sq. Rel. Error: {:.4f}\n'
              '  Avg. Lin. RMS Error: {:.4f}\n'
              '  Avg. Log RMS Error: {:.4f}\n'
              '  Inlier D1: {:.4f}\n'
              '  Inlier D2: {:.4f}\n'
              '  Inlier D3: {:.4f}\n\n'.format(
                  self.abs_rel_error_meter.avg,
                  self.sq_rel_error_meter.avg,
                  math.sqrt(self.lin_rms_sq_error_meter.avg),
                  math.sqrt(self.log_rms_sq_error_meter.avg),
                  self.d1_inlier_meter.avg, self.d2_inlier_meter.avg,
                  self.d3_inlier_meter.avg))

if __name__ == '__main__':

    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('--pth', required=False, default='ckpt/FreDSNet_weights.pth',
                        help='path to load saved checkpoint.')
    parser.add_argument('--root_dir', required=False, default='Example')
    parser.add_argument('--out_dir',  required=False, default='Results')
    parser.add_argument('--no_depth',    required=False, action='store_true',default=False)
    parser.add_argument('--no_semantic', required=False, action='store_true',default=False)
    parser.add_argument('--no_cuda', action='store_true')
    args = parser.parse_args()
    #PARSER END#

    device = torch.device('cpu' if args.no_cuda else 'cuda')
    print('Inference made with: {}\n'.format(device))

    net,state_dict = model.load_weigths(args)
    # net.param_count_sections()
    net.to(device)

    num_classes = net.num_classes
    scale = 2

    print('Results for FreDSNet')
    net.eval()

    img_list = os.listdir(args.root_dir)

    # Inferencing
    accum_time = 0
    os.makedirs(args.out_dir,exist_ok=True)
    os.makedirs(os.path.join(args.out_dir,'semantic'),exist_ok=True)
    os.makedirs(os.path.join(args.out_dir,'depth'),exist_ok=True)
    os.makedirs(os.path.join(args.out_dir,'depthmap'),exist_ok=True)

    depth_root = 'Depth'
    metric = Evaluator()
    metric.reset_eval_metrics()

    for name in tqdm(img_list):
        img_path = os.path.join(args.root_dir,name)
        depth_path = os.path.join(depth_root, name.replace("_rgb.png", "_depth.png"))

        H, W = 512//scale,1024//scale
        img = cv2.resize(cv2.cvtColor(cv2.imread(img_path),cv2.COLOR_BGR2RGB),(W,H),cv2.INTER_CUBIC)
        img = np.array(img,np.float32)[...,:3] / 255.
        i_img_mask = np.logical_and(img[...,0]==0,img[...,1]==0,img[...,2]==0)*1
        img_mask = np.ones_like(i_img_mask) - i_img_mask
        x_img = torch.FloatTensor(img.transpose([2, 0, 1]).copy())
        x = x_img.unsqueeze(0)
        with torch.no_grad():
            t_start = time.time()
            output = net(x.to(device))
            t_end = time.time()
        inf_time = (t_end - t_start)
        depth = output['Depth']
        pred_depth = depth.cpu().numpy().astype(np.float32).squeeze(0).squeeze(0)
        semantic = output['Semantic'].cpu().squeeze(0)
        accum_time += inf_time

        # Output management
        pred_sem = torch.argmax(semantic,dim=0).numpy()
        pred_sem = color_segmentation(pred_sem) + 0.25*img*255.

        # Read gt depth
        gt_depth = cv2.imread(depth_path, -1)
        gt_depth = gt_depth.astype(np.float32)/512
        gt_depth = cv2.resize(gt_depth, dsize=(512, 256), interpolation=cv2.INTER_NEAREST)
        gt_depth = torch.from_numpy(gt_depth)

        # mask includes blank area and depth gt out of range
        mask = (gt_depth > 0) & (gt_depth <= 10.0) & ~torch.isnan(gt_depth)
        mask = mask * torch.from_numpy(img_mask)

        metric.compute_eval_metrics(torch.from_numpy(pred_depth), gt_depth, mask)
        # Save coded data
        # cv2.imwrite(os.path.join(args.out_dir,'semantic',name[:-4]+'_seg.png'),pred_sem*img_mask.reshape(H,W,1))
        # np.save(os.path.join(args.out_dir,'depth',name[:-4]+'.npy'),pred_depth*img_mask)
        # plt.figure(0)
        # plt.imshow(pred_depth*img_mask)
        # plt.savefig(os.path.join(args.out_dir,'depthmap',name[:-4]+'_dep.png'))

    print('Total inference time: %.2f' %accum_time)
    print('Frames per second at 256 x 512 : %.2f' %(len(img_list)/accum_time))
    metric.print_validation_report()

However, the quantitative results are quite far from those in the paper. What should I do? Can you give me any insight?

Total inference time: 12.79
Frames per second at 256 x 512 : 29.17
  Avg. Abs. Rel. Error: 0.1131
  Avg. Sq. Rel. Error: 0.0791
  Avg. Lin. RMS Error: 0.4587
  Avg. Log RMS Error: 0.1836
  Inlier D1: 0.8815
  Inlier D2: 0.9695
  Inlier D3: 0.9884

Sbrunoberenguel commented 6 months ago

I see that you have scaled the prediction with the median value of the ground truth in your code (that is why most of the metrics are better than in the original paper). In my implementation, since I assume that I do not know the ground-truth data at inference time, I do not scale the prediction of the network. Even if it gets better results for most metrics, I do not believe it is a good way to compare the performance of neural networks for tasks where we do not have ground-truth information as input to the network.

Try not scaling the prediction, and the metrics should be similar to those in the original ICRA'23 paper.
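
In the script you posted, that just means skipping the median alignment inside compute_eval_metrics, something like this (sketch only, the rest of the class stays as it is):

    def compute_eval_metrics(self, depth_pred, gt_depth, depth_mask):
        '''
        Computes metrics used to evaluate the model (no scaling of the prediction)
        '''
        N = depth_mask.sum()

        # No median alignment: the raw network prediction is compared
        # directly against the ground truth
        abs_rel = abs_rel_error(depth_pred, gt_depth, depth_mask)
        sq_rel = sq_rel_error(depth_pred, gt_depth, depth_mask)
        rms_sq_lin = lin_rms_sq_error(depth_pred, gt_depth, depth_mask)
        rms_sq_log = log_rms_sq_error(depth_pred, gt_depth, depth_mask)
        d1 = delta_inlier_ratio(depth_pred, gt_depth, depth_mask, degree=1)
        d2 = delta_inlier_ratio(depth_pred, gt_depth, depth_mask, degree=2)
        d3 = delta_inlier_ratio(depth_pred, gt_depth, depth_mask, degree=3)

        self.abs_rel_error_meter.update(abs_rel, N)
        self.sq_rel_error_meter.update(sq_rel, N)
        self.lin_rms_sq_error_meter.update(rms_sq_lin, N)
        self.log_rms_sq_error_meter.update(rms_sq_log, N)
        self.d1_inlier_meter.update(d1, N)
        self.d2_inlier_meter.update(d2, N)
        self.d3_inlier_meter.update(d3, N)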

caodinhduc commented 6 months ago

Without the scaling it is:

Total inference time: 12.54
Frames per second at 256 x 512 : 29.74
  Avg. Abs. Rel. Error: 0.1331
  Avg. Sq. Rel. Error: 0.0946
  Avg. Lin. RMS Error: 0.5182
  Avg. Log RMS Error: 0.2080
  Inlier D1: 0.8432
  Inlier D2: 0.9586
  Inlier D3: 0.9864

This matches the report in the ICRA paper, but I understand almost all the metrics except RMSE. In the paper, RMSE is 0.27, compared to 0.4 for HoHoNet, while the inlier D1/D2/D3 of HoHoNet are much higher. This confuses me about how to reproduce this RMSE.

Sbrunoberenguel commented 6 months ago

To compare with other SOTA methods, the reported RMSE does not apply the square root (I see that you do it). I did it that way at the time because it was how other methods used this metric. If you apply this to your results, you will get it (RMSE = 0.5182^2 ≈ 0.27).
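
In the script above, that corresponds to reporting the accumulated value directly instead of its square root in print_validation_report (sketch of the relevant line only):

    # Report the mean squared error directly, without math.sqrt,
    # to match how the compared methods report this metric
    print('  Avg. Lin. RMS Error: {:.4f}'.format(self.lin_rms_sq_error_meter.avg))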

caodinhduc commented 6 months ago

Understood, thank you for your support.