ethz-asl / TULIP

MIT License

Consult for the MAE metric values? #3

Open FangzhouTang opened 1 month ago

FangzhouTang commented 1 month ago

Have the values of the MAE metric in the experiments described in the paper been scaled? The values I obtain are very small, and the MAE errors reported in some other papers are also in the range of 0.0xx.

binyang97 commented 1 month ago

Hi,

yes, the depth values used for evaluating MAE in the code are normalized by the maximum range, so they are always in the range 0 to 1. In the paper, all values are in their actual range.
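Since MAE is a linear metric, the normalized value can be converted back to meters by multiplying with the maximum sensor range. A minimal sketch of that relationship (the 120 m maximum range is an assumed example value, not taken from the repo config):

```python
import torch

def mae(pred, gt):
    """Mean absolute error between two depth maps."""
    return torch.mean(torch.abs(pred - gt))

max_range = 120.0  # assumed maximum sensor range in meters, for illustration

pred_norm = torch.rand(32, 2048)  # normalized depth in [0, 1]
gt_norm = torch.rand(32, 2048)

# MAE computed on normalized depth, scaled back, equals MAE in meters:
mae_norm = mae(pred_norm, gt_norm)
mae_meters = mae(pred_norm * max_range, gt_norm * max_range)
assert torch.allclose(mae_meters, mae_norm * max_range, rtol=1e-4)
```

So a reported normalized MAE of 0.0xx corresponds to `0.0xx * max_range` meters.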

Best, Bin

FangzhouTang commented 1 month ago

Thank you so much for your reply! I noticed the following in the code:

```python
if args.dataset_select == "carla":
    pred_img = torch.where((pred_img >= 2/80) & (pred_img <= 1), pred_img, 0)
elif args.dataset_select == "durlar":
    pred_img = torch.where((pred_img >= 0.3/120) & (pred_img <= 1), pred_img, 0)
elif args.dataset_select == "kitti":
    pred_img = torch.where((pred_img >= 0) & (pred_img <= 1), pred_img, 0)
else:
    print("Not Preprocess the pred image")
```

I have two questions. First, why is it necessary to apply different preprocessing steps for different datasets? In my view, it should be sufficient to simply compare the predicted image with the ground truth. Second, is it enough to delete the preprocessing lines above in order to obtain the actual values?
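My reading of those thresholds (an assumption on my part, not confirmed by the authors) is that each lower bound is the sensor's minimum range divided by its maximum range, so the mask keeps only normalized depths inside the sensor's valid interval:

```python
import torch

# Assumed sensor limits for illustration: e.g. 2/80 in the snippet above
# would correspond to a 2 m minimum and 80 m maximum range.
MIN_RANGE = 2.0   # meters (assumed)
MAX_RANGE = 80.0  # meters (assumed)

pred_img = torch.rand(32, 2048)  # normalized prediction in [0, 1]

lower = MIN_RANGE / MAX_RANGE    # 2/80 = 0.025
mask = (pred_img >= lower) & (pred_img <= 1)
# Depths outside the valid interval are zeroed out, i.e. treated as invalid:
pred_img = torch.where(mask, pred_img, torch.zeros_like(pred_img))
```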

binyang97 commented 1 month ago


Hi Fangzhou,

sorry for the late reply, somehow I missed the notification.

For your first question: the range image is always normalized to the range (0, 1) and then transformed to a logarithmic space. This is simply because the datasets are recorded with sensors of different configurations, so their maximum and minimum ranges differ. Normalization unifies the value range regardless of that difference. It also makes it easier to test generalization in the cross-dataset case, like training on carla and testing on durlar.
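As a rough sketch of that pipeline: normalize metric depth by the sensor's maximum range, then map it into log space. The `log2(1 + x)` mapping below is only illustrative (the repo's actual transform may differ), and the 120 m maximum range is an assumed example value:

```python
import torch

MAX_RANGE = 120.0  # assumed sensor maximum range in meters

def to_log_space(depth_m):
    """Normalize metric depth to [0, 1], then map to log space.
    log2(1 + x) keeps the output in [0, 1] for x in [0, 1]."""
    d = torch.clamp(depth_m / MAX_RANGE, 0.0, 1.0)
    return torch.log2(d + 1.0)

def from_log_space(d_log):
    """Invert the mapping back to metric depth (the back transformation
    the authors mention being already implemented)."""
    d = torch.exp2(d_log) - 1.0
    return d * MAX_RANGE
```

The key property is that the same (0, 1) interval is used for every sensor, so a model trained on one dataset sees inputs on the same scale when tested on another.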

For your second question: since the network is trained on the normalized version, simply commenting out those lines would not give you the correct output. Instead, you can scale the predictions back to the original range in a post-processing step before evaluating them. The back transformation from logarithmic space is already implemented, so you don't need to add any code for that.

Best, Bin