LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0

true metric depth values #36

Open abhishekmonogram opened 5 months ago

abhishekmonogram commented 5 months ago

Hi @LiheYoung ,

This is super impressive work. I used the Hugging Face deployment to test out the network. I gave it a sample image from a camera with known intrinsics, and it output a depth map (treat it as disparity, as it says on Hugging Face). I can see per-pixel values of the depth/disparity map, but I do not know how to extract per-pixel true metric depth from them. Are the depth maps relative, or are they true metric? If they are true metric, how can I extract per-pixel metric depth?

1ssb commented 4 months ago

OK, found the issue. Go through my code once again and check whether you need additional resizing, and adapt the focal-length scale. Also, the depth map needs to be compared in uint16 format, in an array, AFTER back projection.

My suggestion is to save an .npy file of the prediction after concatenating the RGB with the depth prediction. Then compare the GT RGBD and the predicted RGBD, both without the back projection. Currently you are comparing the transformed with the untransformed.

karantai commented 3 months ago

Congratulations for the great effort, all of you!

Something worth mentioning here. @1ssb, in the depth_to_pointcloud.py script, at lines 55 and 56 where the image-plane coordinates are transformed to camera coordinates, would it not be better if instead of this

            x = (x - FINAL_WIDTH / 2) / focal_length_x
            y = (y - FINAL_HEIGHT / 2) / focal_length_y

we have this:

            x = (x - CX) / focal_length_x
            y = (y - CY) / focal_length_y

where CX and CY are the pixel coordinates of the principal point? The principal point is not exactly at the middle of the image plane, so this change might improve accuracy overall. Correct me if I am wrong.

1ssb commented 3 months ago

Yes, you are right. Feel free to change that, as it does not affect anything else. For my implementation I always ended up using square images, hence the simplification.

Updated the code on this page; not sending a new merge request for this, though. @LiheYoung, can you make this minor update? I believe it would indeed be helpful to people.
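
For reference, a minimal sketch of the back-projection with an explicit principal point (illustrative variable names; it assumes a metric depth map in metres and known pinhole intrinsics, and is not the exact depth_to_pointcloud.py code):

    import numpy as np

    def backproject(depth, fx, fy, cx, cy):
        # depth: (H, W) metric depth map in metres; fx, fy, cx, cy: pinhole intrinsics in pixels
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        x = (u - cx) / fx * depth   # X = (u - cx) * Z / fx
        y = (v - cy) / fy * depth   # Y = (v - cy) * Z / fy
        return np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # (H*W, 3) point cloud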


karantai commented 3 months ago

So, depth_to_pointcloud.py generates a point cloud that takes both scale and focal length into account, so that the metric depth is "truly" correct, like the work that has been done here?

1ssb commented 3 months ago

No. In that work the focal parameters are also learned and predicted; here, you already know the intrinsics.

csyhping commented 3 months ago


@1ssb Thanks for your work. I have several questions.

  1. May I ask why P_x and P_y are set to 128? I guess they should be the actual values of the principal point?
  2. Since the script can handle any output size, but original_width and original_height do not seem to be used, should they replace FINAL_WIDTH and FINAL_HEIGHT?
  3. I checked the edit history, and it seems you also modified FX and FY; may I ask why?
  4. I tried changing FINAL_WIDTH and FINAL_HEIGHT to different values, or replacing them with original_width and original_height, but the result is weird and the point cloud is heavily squeezed. May I ask why, and whether there are any limitations on modifying the final resolution (i.e., should I modify several parameters simultaneously)?

Thanks!

1ssb commented 3 months ago

I did it for my own implementation. All you need to do is update it as per your requirements.


csyhping commented 3 months ago


Got it thanks!

NoSuchObjectException commented 2 months ago

Hi @LiheYoung, I am indeed using the metric depth, and the point cloud I have uploaded is indeed from ZoeDepth. Can you kindly confirm that if these depth values are, for example, 4.35 metres, they are indeed in metres without any need for further analysis/transformation?

On Sat, 27 Jan 2024, Lihe Yang wrote: Hi @1ssb, our Depth Anything models primarily focus on relative depth estimation. Thus, the output value from the HuggingFace published models does not represent any metric meanings. However, if you want to obtain metric depth information (in meters), you can use our models introduced here: https://github.com/LiheYoung/Depth-Anything/tree/main/metric_depth, just like @loevlie mentioned.

Screenshot 2024-01-26 at 10 21 10 PM

Pretty good on visualisation. @LiheYoung, is it possible to confirm that the depths, say 4.54, are in metres and there are no additional scales at play?

@1ssb How were you able to get the depth value, i.e. 4.54 metres? I am able to generate the point cloud map; however, I am unsure how to get the depth value.

1ssb commented 2 months ago

By saving the tensor array output from the depth anything model itself.
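
As a minimal sketch (the depth tensor below is only a stand-in for the model's prediction; with the metric_depth checkpoints the values are already in metres):

    import numpy as np
    import torch

    # Stand-in for the model's metric-depth prediction; replace with the real output tensor.
    depth = torch.rand(480, 640) * 10.0

    depth_np = depth.squeeze().detach().cpu().numpy()  # (H, W) array of per-pixel depths
    np.save("depth.npy", depth_np)                     # persist the raw values

    print(depth_np[240, 320])                          # metric depth at an example pixel, e.g. 4.54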


AbbosAbdullayev commented 2 months ago

Hi @1ssb, could you provide more detail on how you obtained the value of 4.54 metres by saving the depth? From my understanding, the output of Depth Anything is a 2D matrix. Can we infer that the Depth Anything output tensor already provides the actual depth (in metres), and that ZoeDepth is used for point cloud generation (2D > 3D)? Thanks for your time!

1ssb commented 2 months ago

My code simply transforms the 2D output into a 3D point cloud. It's just geometrically placing it along the rays.


puyiwen commented 2 months ago

Hi, really sorry, I do not know Mandarin, can you post a translation? On Mon, 6 May 2024, puyiwen wrote: [quoted comment in Chinese]

I am so sorry, I replied to the wrong person. I will delete it and reply again.

puyiwen commented 2 months ago

Hi @hgolestaniii, do you align the data from different cameras using the method in DJI's Metric3D paper, i.e. a normalized camera to unify them?

sinskyy commented 3 weeks ago

Hi @1ssb @LiheYoung, I'm sorry to bother you. May I ask how to run prediction with this metric depth model? My goal is to get the closest value in metres for my IoT project, but the code in the repo only does evaluation. When I tried to write prediction code for this model, I got lost among all the different configs, and I only want a prediction for one input image, not batches. I am a beginner and still have a lot to learn. I'd be very happy if anyone could help me with this.
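
Not an official answer, but once you have the prediction as a metric depth map (a NumPy array in metres, for example saved as in the earlier sketch), the closest value is just a masked minimum:

    import numpy as np

    depth = np.load("depth.npy")                 # (H, W) metric depth map in metres
    valid = np.isfinite(depth) & (depth > 0)     # ignore invalid / zero pixels
    closest_m = depth[valid].min()
    print(f"Closest point: {closest_m:.2f} m")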

weihongwei-zg commented 2 weeks ago

@1ssb @LiheYoung Hello, I currently do not have absolute depth data for my own scene, but I hope to use the relative depth map predicted by Depth Anything and find a way to obtain an absolute depth map. Is there any direct method? For example, using least squares: given the relative depth difference between two points in the relative depth map and the known absolute depth difference between those same two points, could I determine how many metres a unit of relative depth represents?

1ssb commented 2 weeks ago

Use metric depths instead, and use plug-and-play methods to get local approximations for control/interest points.

Best Subhransu



weihongwei-zg commented 2 weeks ago


Thank you for your reply. Are there any specific methods or links?

1ssb commented 2 weeks ago

No general methods; it depends heavily on what exactly you are doing.

Best Subhransu



weihongwei-zg commented 2 weeks ago


Thank you. I want to input an RGB image of my own scene and obtain its absolute depth map, but I don't have ground-truth data for the scene. So I am considering using a general relative-depth-estimation foundation model to predict a relative depth map, and then finding a way to convert it into an absolute depth map (in this process, I can assume the absolute depth of certain objects/points in the RGB image is known).

weihongwei-zg commented 2 weeks ago

I have seen some discussions on this under MiDaS, and I will first try their methods: https://github.com/isl-org/MiDaS/issues/171
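
For context, the kind of alignment discussed in threads like the one linked above is usually a least-squares affine fit from the relative prediction to metric depth at a few points with known depth. A rough sketch with made-up numbers (note that models predicting inverse depth, i.e. disparity, are usually better aligned in 1/d space):

    import numpy as np

    def fit_scale_shift(rel_at_points, metric_at_points):
        # Least-squares fit of metric ≈ a * relative + b at the control points.
        A = np.stack([rel_at_points, np.ones_like(rel_at_points)], axis=1)
        (a, b), *_ = np.linalg.lstsq(A, metric_at_points, rcond=None)
        return a, b

    rel = np.array([0.15, 0.40, 0.65, 0.90])   # relative depth at control pixels (illustrative)
    gt = np.array([2.1, 4.8, 7.5, 10.2])       # known metric depth at the same pixels, metres
    a, b = fit_scale_shift(rel, gt)
    # metric_map = a * rel_map + b  then applies the fit to the whole relative depth map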

Nd-sole commented 3 days ago

Does the output from the metric-depth model give us absolute depth (or disparity, as in the case of relative depth estimation)? @1ssb

Also, can you tell me how you convert points from 2D to 3D? Is there some derivation for it that you can share?

1ssb commented 2 days ago

Hello @Nd-sole, I do not consider this space effective for clarifications on something that has already been adopted. Either raise this in a Discussion or kindly email me at Subhransu.Bhattacharjee@anu.edu.au if you are confused about the ideas.

Edric-star commented 1 day ago

@1ssb Hello, thanks for your code; I generated 3D point clouds of my images, but I have some questions, for example about the following code:

            resized_color_image = color_image.resize((FINAL_WIDTH, FINAL_HEIGHT), Image.LANCZOS)
            resized_pred = Image.fromarray(pred).resize((FINAL_WIDTH, FINAL_HEIGHT), Image.NEAREST)

If I input 1920*1536 images, what is the benefit of producing a different output size? I want the metric depth estimation of my input images so I can compare it with the original point cloud of the scene, so I would expect the output size to match the input; however, the deviations are obvious (the original point cloud and the estimated one are in the same coordinate frame). Is there anything I might have missed while using your code? I changed FINAL_WIDTH and FINAL_HEIGHT to my parameters and added CX and CY. By the way, I am not sure whether you have checked the latest version of your code as modified by the author in Depth Anything V2, since that is the version I used. I was wondering whether there is anything I need to pay special attention to in order to get an accurate estimate. If anything above seems incorrect, please kindly let me know. Thanks!
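
One hedged guess about the squeezing: if the RGB image and the prediction are resized to a different resolution before back-projection, the intrinsics have to be rescaled by the same factors. A small illustration with made-up calibration values:

    # If FINAL_WIDTH/FINAL_HEIGHT differ from the capture resolution, rescale the intrinsics too.
    orig_w, orig_h = 1920, 1536
    final_w, final_h = 960, 768          # whatever FINAL_WIDTH / FINAL_HEIGHT you choose

    fx, fy = 1400.0, 1400.0              # calibrated focal lengths in pixels (example values)
    cx, cy = 960.0, 768.0                # calibrated principal point (example values)

    sx, sy = final_w / orig_w, final_h / orig_h
    fx_r, fy_r = fx * sx, fy * sy        # focal lengths at the resized resolution
    cx_r, cy_r = cx * sx, cy * sy        # principal point at the resized resolution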

1ssb commented 1 day ago

I have repeated this a number of times: there are no special connotations to those values. In case you have not noticed, I did not author this work; I simply contributed the script.

It was simply the setup I use for my own work. It is up to you to replace it with your own setup.

Best Subhransu



Edric-star commented 18 hours ago


Thanks for your reply, really appreciate it