Open · ZihaoZheng98 opened 1 month ago
Hi, in the hf version, the code is:

```python
from transformers import AutoImageProcessor, AutoModelForDepthEstimation
import torch
import numpy as np
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("LiheYoung/depth-anything-large-hf")
model = AutoModelForDepthEstimation.from_pretrained("LiheYoung/depth-anything-large-hf")

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# interpolate to original size
prediction = torch.nn.functional.interpolate(
    predicted_depth.unsqueeze(1),
    size=image.size[::-1],
    mode="bicubic",
    align_corners=False,
)
```
Suppose the original image size is (500, 600); then the output prediction size is (600, 500). Is it a mistake to use `image.size[::-1]`?
Hi Zihao, the variable `image` is a `PIL.Image`, so its `size` is a tuple of `(w, h)` rather than the wanted `(h, w)`. Reversing it with `image.size[::-1]` therefore yields `(h, w)`, which is exactly what `interpolate` expects, so for a (500, 600) (width, height) image the output shape (600, 500) in (height, width) is correct, not a mistake.
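Here is a minimal sketch of the convention mismatch (illustration only, not part of the HF snippet; the dummy depth tensor and its 100x100 spatial size are assumptions):

```python
# PIL reports size as (width, height), while
# torch.nn.functional.interpolate expects size=(height, width).
import torch
import requests
from PIL import Image

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
print(image.size)              # (w, h): width first, height second

# Dummy depth map of shape (batch, channel, h, w); the spatial size is arbitrary.
fake_depth = torch.rand(1, 1, 100, 100)

resized = torch.nn.functional.interpolate(
    fake_depth,
    size=image.size[::-1],     # reverse (w, h) -> (h, w) for interpolate
    mode="bicubic",
    align_corners=False,
)
print(resized.shape)           # torch.Size([1, 1, height, width])
```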