DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

Why does the hf example use image.size[::-1] for interpolate #158

Open ZihaoZheng98 opened 1 month ago

ZihaoZheng98 commented 1 month ago

Hi, in the hf version, the code is:

```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
```

Suppose the size of the original image is (500, 600); the output prediction then has size (600, 500). Is it a mistake to use image.size[::-1]?
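For reference, a minimal shape check, with a dummy tensor standing in for the model output (the 518x518 internal resolution is just an illustrative placeholder, and the blank 500x600 image stands in for the real one):

```python
import torch
from PIL import Image

image = Image.new("RGB", (500, 600))       # placeholder image, width=500, height=600
predicted_depth = torch.rand(1, 518, 518)  # placeholder for outputs.predicted_depth, (batch, h, w)

prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],  # (600, 500)
    mode="bicubic",
    align_corners=False,
)
print(prediction.shape)  # torch.Size([1, 1, 600, 500])
```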

LiheYoung commented 1 month ago

Hi Zihao, the variable `image` is a PIL.Image, so its `size` is a tuple of (w, h) rather than the wanted (h, w). Reversing it with `[::-1]` yields the (h, w) order that `interpolate`'s `size=` argument expects.
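A minimal sketch of the two conventions (the blank image here is only a placeholder):

```python
from PIL import Image

image = Image.new("RGB", (500, 600))  # created with (width, height)
print(image.size)        # (500, 600) -> (w, h), PIL's convention
print(image.size[::-1])  # (600, 500) -> (h, w), the order interpolate expects
```

So for a 500-wide, 600-tall image, a prediction of size (600, 500) is exactly right: 600 rows (height) by 500 columns (width).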