DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

bad metric depth results when using huggingface transformers pipeline #141

MoAbbasid commented 2 months ago

Hi, I'm trying to run the model "depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf" from Hugging Face, but the depth map it produces is way off and much worse than what I get from "depth-anything/Depth-Anything-V2-Small-hf".

Result from the metric outdoor model:

pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf") 

[image]

Result from the basic model:

pipe = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf") 

[image]

What could be the reason for the difference?

I'm trying to generate depth values and I don't have a dataset to fine-tune the model on, but I do have the camera parameters. How do I feed those into the model to help it produce better metric depth values?

LeDat98 commented 2 months ago

Try this code:

from transformers import pipeline
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import cv2
import torch

# Pick GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Use: {device}")

# Relative-depth model via the Hugging Face depth-estimation pipeline
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf", device=device)

input_image_path = '/home/leducdat/Development/yolov8/Depth-Anything-V2/assets/examples/london_image.png'
image = Image.open(input_image_path)

# Run inference; "depth" is a PIL image of the predicted depth map
depth_result = pipe(image)
depth_image = depth_result["depth"]
depth_array = np.array(depth_image)

# Normalize to 0-255 and apply the Spectral_r colormap
depth_normalized = ((depth_array - depth_array.min()) / (depth_array.max() - depth_array.min()) * 255).astype(np.uint8)
cmap = plt.get_cmap('Spectral_r')
depth_colored = (cmap(depth_normalized)[:, :, :3] * 255).astype(np.uint8)
depth_colored_bgr = cv2.cvtColor(depth_colored, cv2.COLOR_RGB2BGR)

# Save the colored depth map
output_image_path = 'depth_image_spectral_r4.png'
cv2.imwrite(output_image_path, depth_colored_bgr)
print(f"depth image saved : {output_image_path}")

# Save the input and the depth map side by side, separated by a white strip
raw_image = cv2.imread(input_image_path)
split_region = np.ones((raw_image.shape[0], 50, 3), dtype=np.uint8) * 255
combined_result = cv2.hconcat([raw_image, split_region, depth_colored_bgr])

combined_output_path = 'combined_depth_image4.png'
cv2.imwrite(combined_output_path, combined_result)
print(f"combined image saved : {combined_output_path}")

MoAbbasid commented 2 months ago

Do we want depth, or predicted_depth?

OK, I figured this part out: predicted_depth is the raw depth map before the pipeline unsqueezes, resizes, and normalizes it.
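
For anyone else who lands here, a minimal sketch of reading the raw values out of the pipeline output (the image path is a placeholder, the resize just mirrors what the pipeline does internally before it normalizes the map for display, and I'm assuming the metric checkpoints return values in metres):

from transformers import pipeline
from PIL import Image
import torch
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Metric checkpoint; the same output keys come back for the relative model
pipe = pipeline(task="depth-estimation", model="depth-anything/Depth-Anything-V2-Metric-Outdoor-Small-hf", device=device)

image = Image.open('example.png')  # placeholder path
result = pipe(image)

raw = result["predicted_depth"]  # raw tensor from the model head
if raw.ndim == 2:                # add a batch dim if the pipeline already squeezed it
    raw = raw[None]

# Resize to the input resolution, as the pipeline does before it builds result["depth"]
depth = F.interpolate(raw[None].float(), size=image.size[::-1], mode="bicubic", align_corners=False)[0, 0]
depth = depth.cpu().numpy()

print(depth.shape, depth.min(), depth.max())  # assumed to be metres for the metric models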