DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

Why does the hf example use image.size[::-1] for interpolate #158

Open ZihaoZheng98 opened 1 month ago

ZihaoZheng98 commented 1 month ago

Hi, in the hf version, the code is:

```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
```

Suppose the size of the original image is (500, 600); the output prediction then has size (600, 500). Is it a mistake to use image.size[::-1]?
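For reference, a minimal shape check, with a dummy tensor standing in for the model output (the 518x518 internal resolution is just an illustrative placeholder, and the blank 500x600 image stands in for the real one):

```python
import torch
from PIL import Image

image = Image.new("RGB", (500, 600))       # placeholder image, width=500, height=600
predicted_depth = torch.rand(1, 518, 518)  # placeholder for outputs.predicted_depth, (batch, h, w)

prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],  # (600, 500)
    mode="bicubic",
    align_corners=False,
)
print(prediction.shape)  # torch.Size([1, 1, 600, 500])
```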

LiheYoung commented 1 month ago

Hi Zihao, the variable `image` is a PIL.Image, so its `size` is a tuple of (w, h) rather than the wanted (h, w). Reversing it with `[::-1]` yields the (h, w) order that `interpolate`'s `size=` argument expects.
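A minimal sketch of the two conventions (the blank image here is only a placeholder):

```python
from PIL import Image

image = Image.new("RGB", (500, 600))  # created with (width, height)
print(image.size)        # (500, 600) -> (w, h), PIL's convention
print(image.size[::-1])  # (600, 500) -> (h, w), the order interpolate expects
```

So for a 500-wide, 600-tall image, a prediction of size (600, 500) is exactly right: 600 rows (height) by 500 columns (width).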