LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

Passing single picture to metric depth evaluate.py #134


damavand1 commented 8 months ago

Hi,

Based on this document I followed every step, but I don't understand this part of the documentation:
"Please follow ZoeDepth to prepare the training and test datasets." I couldn't find any datasets in the ZoeDepth repo, so I don't know where to download the test datasets.

So when I run this command:

```
python evaluate.py -m zoedepth --pretrained_resource="local::./checkpoints/depth_anything_metric_depth_outdoor.pt" -d kitti
```

I get this error:

```
(depth_anything_metric) user1@pc:~/Desktop/Angel Vision/mes/Depth-Anything/metric_depth$ python evaluate.py -m zoedepth --pretrained_resource="local::./checkpoints/depth_anything_metric_depth_outdoor.pt" -d kitti
{'attractor_alpha': 1000, 'attractor_gamma': 2, 'attractor_kind': 'mean', 'attractor_type': 'inv', 'aug': True,
 'bin_centers_type': 'softplus', 'bin_embedding_dim': 128, 'clip_grad': 0.1,
 'data_path': './data/Kitti/raw_data', 'data_path_eval': './data/Kitti/raw_data', 'dataset': 'kitti',
 'degree': 1.0, 'distributed': True, 'do_kb_crop': True, 'do_random_rotate': True, 'eigen_crop': False,
 'filenames_file': './train_test_inputs/kitti_eigen_train_files_with_gt.txt',
 'filenames_file_eval': './train_test_inputs/kitti_eigen_test_files_with_gt.txt',
 'garg_crop': True, 'gpu': None,
 'gt_path': './data/Kitti/data_depth_annotated_zoedepth', 'gt_path_eval': './data/Kitti/data_depth_annotated_zoedepth',
 'img_size': [392, 518], 'input_height': 352, 'input_width': 1216, 'inverse_midas': False,
 'log_images_every': 0.1, 'max_depth': 80, 'max_depth_eval': 80, 'max_temp': 50.0, 'max_translation': 100,
 'memory_efficient': True, 'midas_model_type': 'DPT_BEiT_L_384',
 'min_depth': 0.001, 'min_depth_eval': 0.001, 'min_temp': 0.0212, 'model': 'zoedepth',
 'n_attractors': [16, 8, 4, 1], 'n_bins': 64, 'name': 'ZoeDepth', 'notes': '',
 'output_distribution': 'logbinomial', 'prefetch': False,
 'pretrained_resource': 'local::./checkpoints/depth_anything_metric_depth_outdoor.pt',
 'print_losses': False, 'project': 'ZoeDepth', 'random_crop': False, 'random_translate': False,
 'root': '.', 'save_dir': './depth_anything_finetune', 'shared_dict': None, 'tags': '',
 'train_midas': False, 'translate_prob': 0.2, 'uid': None, 'use_amp': False,
 'use_pretrained_midas': False, 'use_right': False, 'use_shared_dict': False,
 'validate_every': 0.25, 'version_name': 'v1', 'workers': 16}
Evaluating zoedepth on kitti...
xFormers not available
xFormers not available
Params passed to Resize transform:
    width: 518
    height: 392
    resize_target: True
    keep_aspect_ratio: False
    ensure_multiple_of: 14
    resize_method: minimal
Using pretrained resource local::./checkpoints/depth_anything_metric_depth_outdoor.pt
Loaded successfully
  0%| | 0/697 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/user1/Desktop/Angel Vision/mes/Depth-Anything/metric_depth/evaluate.py", line 159, in <module>
    eval_model(args.model, pretrained_resource=args.pretrained_resource,
  File "/home/user1/Desktop/Angel Vision/mes/Depth-Anything/metric_depth/evaluate.py", line 131, in eval_model
    metrics = main(config)
  File "/home/user1/Desktop/Angel Vision/mes/Depth-Anything/metric_depth/evaluate.py", line 115, in main
    metrics = evaluate(model, test_loader, config)
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/user1/Desktop/Angel Vision/mes/Depth-Anything/metric_depth/evaluate.py", line 71, in evaluate
    for i, sample in tqdm(enumerate(test_loader), total=len(test_loader)):
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 628, in __next__
    data = self._next_data()
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1333, in _next_data
    return self._process_data(data)
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1359, in _process_data
    data.reraise()
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/_utils.py", line 543, in reraise
    raise exception
FileNotFoundError: Caught FileNotFoundError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 58, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/user1/Desktop/Angel Vision/mes/Depth-Anything/metric_depth/zoedepth/data/data_mono.py", line 380, in __getitem__
    image = np.asarray(self.reader.open(image_path),
  File "/home/user1/Desktop/Angel Vision/mes/Depth-Anything/metric_depth/zoedepth/data/data_mono.py", line 267, in open
    return Image.open(fpath)
  File "/home/user1/miniconda3/envs/depth_anything_metric/lib/python3.9/site-packages/PIL/Image.py", line 3227, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: './data/Kitti/raw_data/2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000069.png'
```
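From the config values and the failing path, my reading (not official documentation) is that evaluate.py expects a directory layout roughly like this under metric_depth/, filled with the KITTI raw drives and annotated depth maps:

```
metric_depth/
├── train_test_inputs/
│   └── kitti_eigen_test_files_with_gt.txt
└── data/Kitti/
    ├── raw_data/
    │   └── 2011_09_26/2011_09_26_drive_0002_sync/image_02/data/0000000069.png  (etc.)
    └── data_depth_annotated_zoedepth/
```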

Another question: I don't want to pass a dataset name (kitti | vkitti2 | diode_outdoor) as a parameter to evaluate.py; I want to pass a single image to evaluate.py.

How can I solve these two problems?

Thank you

Eli-Mosi commented 8 months ago

Hi, I have the same question: how can I pass a single image for metric depth evaluation?

LiheYoung commented 7 months ago

If you want to test your own images with the metric depth model, please refer to this file: https://github.com/LiheYoung/Depth-Anything/blob/main/metric_depth/depth_to_pointcloud.py.

From this file, you can see the model loading part, the image pre-processing part, and the model inference part. Please simply ignore the other parts of the file, which are used for point cloud visualization.
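Condensed from that file, a minimal single-image inference sketch might look like the following (the checkpoint path, image path, and the 'kitti' dataset config are placeholders; adjust them to your own setup):

```python
import torch
from PIL import Image
from torchvision import transforms

from zoedepth.models.builder import build_model
from zoedepth.utils.config import get_config

DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'

# Build the metric-depth model and load the Depth Anything checkpoint
config = get_config('zoedepth', 'eval', 'kitti')
config.pretrained_resource = 'local::./checkpoints/depth_anything_metric_depth_outdoor.pt'
model = build_model(config).to(DEVICE)
model.eval()

# Pre-process a single image the same way depth_to_pointcloud.py does
image = Image.open('your_image.png').convert('RGB')  # hypothetical path
image_tensor = transforms.ToTensor()(image).unsqueeze(0).to(DEVICE)

# Inference: depending on the model, the output may be a dict, list/tuple, or tensor
with torch.no_grad():
    pred = model(image_tensor)
if isinstance(pred, dict):
    pred = pred.get('metric_depth', pred.get('out'))
elif isinstance(pred, (list, tuple)):
    pred = pred[-1]

depth = pred.squeeze().detach().cpu().numpy()  # per-pixel metric depth (meters)
print(depth.shape, depth.min(), depth.max())
```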

AbbosAbdullayev commented 7 months ago

Hi @LiheYoung, thank you for the contribution you made. The prediction size and the input image shape are different; how can we map the prediction results back to the input image? Does it work to resize the input image to match the output size? Thank you for your response.
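For example, is something like this the right way to map the prediction back? (Just a sketch: `depth` is the model's 2-D output as a numpy array and `orig_h`/`orig_w` are the input image's size, both my own names.)

```python
import torch
import torch.nn.functional as F

# Resize the predicted depth map back to the original image resolution
depth_resized = F.interpolate(
    torch.from_numpy(depth)[None, None],  # add batch and channel dims
    size=(orig_h, orig_w),
    mode='bilinear',
    align_corners=False,
)[0, 0].numpy()
```

Resizing the prediction rather than the input keeps the depth values in their original metric range; interpolation only smooths between neighboring pixels.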

Dileepvk98 commented 3 months ago

> If you want to test your own images with the metric depth model, please refer to this file: https://github.com/LiheYoung/Depth-Anything/blob/main/metric_depth/depth_to_pointcloud.py.
>
> From this file, you can see the model loading part, the image pre-processing part, and the model inference part. Please simply ignore the other parts of the file, which are used for point cloud visualization.


edit: Hi, I'm trying to infer on a single image/video. This is the code I have now.

My objective is to measure the distance/depth at which a detected object is present, and then calculate its height/width from its bounding box with the help of the estimated depth. (I'm using MediaPipe here to detect a person's hand and later calculate the person's height, for example.) The code I have right now is this code.
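Roughly what I'm attempting, sketched below (the helper, fx/fy, and all inputs are my own hypothetical names, assuming a metric depth map already aligned to the image and known focal lengths in pixels):

```python
import numpy as np

def bbox_real_size(bbox, depth_map, fx, fy):
    """Approximate real-world size (meters) of a bounding box via the pinhole model.

    bbox: (x1, y1, x2, y2) in pixels; depth_map: metric depth aligned to the image;
    fx, fy: camera focal lengths in pixels.
    """
    x1, y1, x2, y2 = bbox
    # Median depth inside the box is more robust than any single pixel
    z = np.median(depth_map[y1:y2, x1:x2])
    width_m = (x2 - x1) * z / fx   # pixel extent -> meters at depth z
    height_m = (y2 - y1) * z / fy
    return width_m, height_m, z
```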

Though I have no idea what this part of the code from evaluate.py does. Do I need this part?

```python
# Pick focal lengths: dataset-specific FX/FY, or the single NYU focal length FL
focal_length_x, focal_length_y = (FX, FY) if not NYU_DATA else (FL, FL)
# Build a pixel-coordinate grid at the output resolution
x, y = np.meshgrid(np.arange(FINAL_WIDTH), np.arange(FINAL_HEIGHT))
# Shift the origin to the image center and normalize by the focal lengths
x = (x - FINAL_WIDTH / 2) / focal_length_x
y = (y - FINAL_HEIGHT / 2) / focal_length_y
# Per-pixel metric depth
z = np.array(resized_pred)
# Back-project every pixel to a 3-D point (X, Y, Z) via the pinhole camera model
points = np.stack((np.multiply(x, z), np.multiply(y, z), z), axis=-1)  # .reshape(-1, 3) to flatten
print(points.shape)
```
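For context, this block back-projects the depth map into a 3-D point per pixel using the pinhole camera model; in depth_to_pointcloud.py the flattened points then feed the Open3D export, roughly like this (a sketch, only needed if you want the point-cloud output rather than per-pixel depth):

```python
import open3d as o3d

# Export the back-projected points as a point cloud, as depth_to_pointcloud.py does
pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(points.reshape(-1, 3))
o3d.io.write_point_cloud('output.ply', pcd)
```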