DepthAnything / Depth-Anything-V2

Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

Why are training and inference results inconsistent? #164

Open tangjunjun966 opened 1 week ago

tangjunjun966 commented 1 week ago

Hello to the authors and community members, I have a question I'd like to ask:

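Taking KITTI data as an example, with code from metric_train.py: I trained with this file, where the depth output is computed as depth = self.depth_head(features, patch_h, patch_w) * self.max_depth, and then computed SiLogLoss directly against the KITTI depth. The KITTI depth maps are simply divided by 256. The code is as follows:

import cv2
import torch

def __getitem__(self, item):
    img_path = self.filelist[item].split(' ')[0]
    depth_path = self.filelist[item].split(' ')[1]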

    image = cv2.imread(img_path)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB) / 255.0

    depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype('float32')

    sample = self.transform({'image': image, 'depth': depth})

    sample['image'] = torch.from_numpy(sample['image'])
    sample['depth'] = torch.from_numpy(sample['depth'])
    sample['depth'] = sample['depth'] / 256.0  # convert to meters

    sample['valid_mask'] = sample['depth'] > 0

    sample['image_path'] = self.filelist[item].split(' ')[0]

    return sample
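The training-side pipeline described above (KITTI depth PNG divided by 256, pixels with value 0 masked out, prediction scaled by max_depth, SiLogLoss on valid pixels) can be sketched in NumPy as follows. This is a minimal illustration, not the repository's code: decode_kitti_depth and silog_loss are hypothetical helper names, the loss uses the common scale-invariant log formulation with an assumed weight lambd = 0.5, and the max_depth and head output values are placeholders.

```python
import numpy as np
from typing import Tuple


def decode_kitti_depth(raw: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Convert a KITTI uint16 depth map to meters and a validity mask.

    Assumes the convention used in the post: meters = value / 256,
    and a stored value of 0 means "no ground truth at this pixel".
    """
    depth_m = raw.astype(np.float32) / 256.0
    valid = depth_m > 0
    return depth_m, valid


def silog_loss(pred: np.ndarray, target: np.ndarray, valid: np.ndarray,
               lambd: float = 0.5) -> float:
    """Scale-invariant log loss computed only over valid pixels.

    lambd = 0.5 is an assumed default; pred and target must be > 0 where valid.
    """
    d = np.log(target[valid]) - np.log(pred[valid])
    var_term = np.mean(d ** 2) - lambd * np.mean(d) ** 2
    # Clamp tiny negative values caused by floating-point rounding before sqrt.
    return float(np.sqrt(max(var_term, 0.0)))


if __name__ == "__main__":
    raw = np.array([[0, 2560], [5120, 25600]], dtype=np.uint16)
    gt, valid = decode_kitti_depth(raw)  # 0 m (invalid), 10 m, 20 m, 100 m

    max_depth = 80.0  # hypothetical value; must match the training config
    head_out = np.array([[0.5, 0.125], [0.25, 1.0]])  # hypothetical head output
    pred = head_out * max_depth  # same scaling as the training line in the post

    print(silog_loss(pred, gt, valid))
```

Because the loss only looks at valid pixels, the value 0 stored for missing LiDAR points never reaches the logarithm.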

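With this setup, the model output should in theory be a depth value. However, the prediction is large for nearby regions and small for distant ones. I ran the official prediction code directly; the result is shown below: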

[Image: prediction result]
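One way to put a number on the "near is large, far is small" observation is to correlate the prediction with the KITTI ground truth over valid pixels. The helper below is a hypothetical diagnostic, not part of Depth-Anything-V2; it assumes pred, gt and valid are same-shaped NumPy arrays, with gt in meters and valid marking nonzero ground-truth pixels as in the dataset code above.

```python
import numpy as np


def depth_ordering_corr(pred: np.ndarray, gt: np.ndarray, valid: np.ndarray) -> float:
    """Pearson correlation between prediction and GT depth on valid pixels.

    > 0: prediction increases with distance (depth-like ordering).
    < 0: prediction decreases with distance (near large, far small).
    """
    p = pred[valid].astype(np.float64)
    g = gt[valid].astype(np.float64)
    return float(np.corrcoef(p, g)[0, 1])


if __name__ == "__main__":
    gt = np.array([[0.0, 10.0], [20.0, 100.0]])  # toy GT in meters, 0 = invalid
    valid = gt > 0
    print(depth_ordering_corr(2.0 * gt, gt, valid))  # depth-like map
    inv = 1.0 / np.where(valid, gt, 1.0)  # map that shrinks with distance
    print(depth_ordering_corr(inv, gt, valid))
```

A negative value corresponds to the ordering reported here. The coefficient says nothing about absolute scale in meters, so it complements rather than replaces the loss.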

However, inference uses the following code:

import torch.nn.functional as F

def forward(self, x):
    patch_h, patch_w = x.shape[-2] // 14, x.shape[-1] // 14

    features = self.pretrained.get_intermediate_layers(x, self.intermediate_layer_idx[self.encoder], return_class_token=True)

    depth = self.depth_head(features, patch_h, patch_w)
    depth = F.relu(depth)

    return depth.squeeze(1)
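To make the difference between the two code paths in this post concrete: the training line scales the head output by max_depth, while the inference forward() above applies F.relu and returns the head output unscaled. The NumPy sketch below mimics only those two lines on the same hypothetical head output; the function names and MAX_DEPTH = 80.0 are illustrative assumptions. It shows the scale mismatch only and does not by itself account for the near/far ordering.

```python
import numpy as np

MAX_DEPTH = 80.0  # hypothetical; use the max_depth the model was trained with


def train_style_output(head_out: np.ndarray, max_depth: float = MAX_DEPTH) -> np.ndarray:
    # Mirrors the training line: depth_head(features, patch_h, patch_w) * max_depth
    return head_out * max_depth


def infer_style_output(head_out: np.ndarray) -> np.ndarray:
    # Mirrors the inference forward(): F.relu(depth_head(...)) with no scaling
    return np.maximum(head_out, 0.0)


if __name__ == "__main__":
    head_out = np.array([0.1, 0.5, 1.0])  # hypothetical raw head output
    print(train_style_output(head_out))  # scaled by max_depth, as during training
    print(infer_style_output(head_out))  # same head output, max_depth times smaller
```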

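In theory this should be a depth value consistent with the KITTI depth maps. Why do I get this result instead? Any help would be appreciated!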