Well, after a month of study, I finally found the mistake in the experiment.
It starts with the mismatched bicubic-upsampling baseline result: when I downsample with bicubic and then upsample with bicubic, the baseline RMSE is 8.07, much better than the reported 14.22 (8x upsampling).
In the dataloader file, the authors use bicubic downsampling on the HR depth map to obtain the LR input:
target = np.array(Image.fromarray(depth).resize((w//s,h//s),Image.BICUBIC).resize((w, h), Image.BICUBIC))
However, other papers use nearest-neighbor downsampling on NYU v2; see the CVPR paper "Pixel-Adaptive Convolutional Neural Networks" for evidence. This makes the comparison unfair. Therefore, all the experiments should be re-run with:
target = np.array(Image.fromarray(depth).resize((w//s,h//s),Image.NEAREST).resize((w, h), Image.BICUBIC))
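To make the difference concrete, here is a minimal sketch of the two degradation pipelines side by side. It uses a small synthetic depth map as a stand-in for an NYU v2 sample (the array shape, scale factor, and values are hypothetical, not from the actual dataset), and measures the RMSE of each down-then-up round trip against the original:

```python
import numpy as np
from PIL import Image

# Hypothetical synthetic HR depth map standing in for an NYU v2 sample.
h, w, s = 64, 64, 8  # height, width, scale factor (8x upsampling)
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:h, 0:w]
depth = (2.0 + 0.05 * yy + 0.03 * xx).astype(np.float32)  # smooth depth ramp

def degrade(depth, s, down_filter):
    """Downsample with the given filter, then bicubic-upsample back to HR size."""
    img = Image.fromarray(depth)
    lr = img.resize((depth.shape[1] // s, depth.shape[0] // s), down_filter)
    return np.array(lr.resize((depth.shape[1], depth.shape[0]), Image.BICUBIC))

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

# The two protocols in question: bicubic-down (the repo's dataloader)
# vs. nearest-down (the protocol used by other papers on NYU v2).
bicubic_down = degrade(depth, s, Image.BICUBIC)
nearest_down = degrade(depth, s, Image.NEAREST)

print("bicubic-down round-trip RMSE:", rmse(bicubic_down, depth))
print("nearest-down round-trip RMSE:", rmse(nearest_down, depth))
```

The point is not the exact numbers on this toy ramp, but that the two downsampling filters produce different LR inputs and hence different baselines, so RMSE scores trained under one protocol cannot be compared against papers that use the other.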