LiheYoung / Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
https://depth-anything.github.io
Apache License 2.0
6.73k stars 516 forks

Does anybody achieve the metric depth estimation on a custom dataset successfully? #68

Open MichaelWangGo opened 7 months ago

MichaelWangGo commented 7 months ago

Hi,

This post is to discuss how to achieve metric depth estimation on a custom dataset; I am using the SCARED dataset. If anyone has successfully fine-tuned the model and achieved metric depth estimation, could you tell me which code you modified?

1ssb commented 7 months ago

I did it on RealEstate10k. Unfortunately, that dataset has no ground-truth depth, so quantitative evaluation is not possible, but anecdotally the results look pretty good.

Denny-kef commented 7 months ago

@1ssb Have you seen results like this with the metric depth outdoor checkpoints? In all of my experiments the sky predictions are poor, even though the relative depth model predicts the sky and other "background" extremely well! (image: https://github.com/LiheYoung/Depth-Anything/assets/121886500/599554a8-ebb2-462c-843d-69c627a9d04a)

1ssb commented 7 months ago

Sorry @Denny-kef, but I cannot help you with that. Make sure you are using the outdoor model and not the indoor one.

Skies are always difficult to capture correctly on an absolute scale, so I do not think your expectations can be too high there. Throughout the history of monocular depth estimation, relative scaling of distant backgrounds has also consistently been better than absolute scaling.

Denny-kef commented 7 months ago

Hi @1ssb, thanks for getting back to me! I am using the outdoor checkpoints and am just wondering whether you (or anyone else) have seen similar results with the metric depth predictions.

Denny-kef commented 7 months ago

That's interesting about metric vs. relative depth. I could mask out the background using the relative depth network or a lightweight segmentation network, but it seems like there should be a better way.

1ssb commented 7 months ago

Oh, this is interesting, @Denny-kef. I took a look at your image, and it seems to have been captured with a fisheye lens, or the image itself is somewhat distorted: to the eye, the clouds look much closer than they actually are, which is a very cool effect in itself. I am not sure I have an exact answer, but it may well be an out-of-distribution (OOD) input.

pestrstr commented 7 months ago

@Denny-kef I don't know if this helps with your task, but this is my personal interpretation: the authors report in the paper that they set the disparity value (inverse of depth) to 0 for all pixels labeled as "sky" by a semantic segmentation model (see Section 3.1 of the paper on arXiv). I haven't looked at the implementation details of their code, but this can affect their training in different ways:

That said, I think the features from their frozen encoder are really powerful for metric depth estimation, but I guess it would be very hard to use them to produce correct values on an absolute scale for the sky. In relative depth estimation, the overall qualitative quality of the sky predictions could come from the semantic feature alignment done during training (see Section 3.3).

loevlie commented 7 months ago

The best solution that I found for the "background" issue with metric depth estimation predictions is this:

  1. Retrieve the relative depth map as a secondary output from the metric depth estimation model.
  2. Use that relative depth map, which is much better at predicting the background, to generate a binary mask that removes things like the sky from the metric depth results.

Importantly, these operations don't add any significant time to the inference.
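The two steps above can be sketched roughly as follows. This is only an illustration, not loevlie's actual code: it assumes the metric and relative maps are already available as NumPy arrays of the same shape, that the relative map is disparity-like (larger means closer, values near 0 mean very far), and the function name `mask_background` and the `threshold` value are made up for the example.

```python
import numpy as np

def mask_background(metric_depth, relative_depth, threshold=0.05):
    """Invalidate metric depth where the relative map says "very far".

    Assumption: relative_depth is non-negative and disparity-like,
    so small values correspond to sky / distant background.
    """
    rel = np.asarray(relative_depth, dtype=np.float64)
    # Pixels whose relative value is within `threshold` of zero
    # (as a fraction of the map's maximum) are treated as background.
    background = rel <= threshold * rel.max()
    cleaned = np.asarray(metric_depth, dtype=np.float64).copy()
    cleaned[background] = np.nan  # mark as invalid instead of guessing a depth
    return cleaned, background

# Toy 2x2 example: the top row is "sky" in the relative map.
relative = np.array([[0.0, 0.01], [0.5, 1.0]])
metric = np.array([[80.0, 75.0], [10.0, 2.0]])
cleaned, background = mask_background(metric, relative)
```

Since this is a single comparison and an indexed assignment per frame, it is consistent with the observation that the masking adds very little to the inference time. The threshold is a hyperparameter you would need to tune on your own images.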

1ssb commented 7 months ago

Bottom line: don't try to predict skies or reflections.

loevlie commented 7 months ago

I was not trying to predict skies; I was trying to remove them from the output depth map so they don't show up in the point cloud. But yes, do not try to predict the depth of the sky or reflections!

LiheYoung commented 7 months ago

Hi @loevlie, if you are trying to detect the sky and remove it, you can try our relative depth models. The output value 0 from these models can be considered as the sky (or extremely far). Alternatively, you can use a pre-trained semantic segmentation model to detect the sky.
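For the point cloud use case discussed above, a minimal sketch of dropping sky pixels while back-projecting might look like this. It assumes a simple pinhole camera with known intrinsics (`fx`, `fy`, `cx`, `cy` are placeholders you would take from your own calibration), a metric depth map and a relative depth map of the same shape, and the rule of thumb that a relative output of (approximately) 0 means sky. The function name `depth_to_points` and the `eps` tolerance are illustrative, not part of the Depth Anything code.

```python
import numpy as np

def depth_to_points(metric_depth, relative_depth, fx, fy, cx, cy, eps=1e-6):
    """Back-project metric depth to 3D points, skipping sky pixels.

    Assumptions: pinhole camera model; relative_depth close to 0 marks
    sky or extremely far regions, which are left out of the cloud.
    """
    h, w = metric_depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v: row index, u: column index
    keep = (relative_depth > eps) & np.isfinite(metric_depth) & (metric_depth > 0)
    z = metric_depth[keep]
    x = (u[keep] - cx) * z / fx
    y = (v[keep] - cy) * z / fy
    return np.stack([x, y, z], axis=1)  # shape (N, 3)

# Toy example: the bottom-right pixel has relative output 0 and is dropped.
metric = np.array([[2.0, 4.0], [6.0, 100.0]])
relative = np.array([[0.5, 0.25], [0.1, 0.0]])
points = depth_to_points(metric, relative, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

In practice the relative output is rarely exactly 0 after resizing or interpolation, so `eps` may need to be larger, or the semantic segmentation alternative may be more reliable.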

loevlie commented 6 months ago

Hi @LiheYoung, yes that works very well! Thank you!

xiaobh1519 commented 3 months ago

@1ssb Have you seen results like this with the metric depth outdoor checkpoints? With all of my experimentation it seems like the sky predictions are not good. Although the relative depth predicts the sky and other "background" extremely well! image

Could you kindly share the parameters you adjusted during fine-tuning? I've been getting poor performance in my experiments with another dataset and have been struggling to resolve the issue. The details of the problem are here: https://github.com/LiheYoung/Depth-Anything/issues/172#issue-2292062398

andrewhbradley9 commented 3 months ago

@Denny-kef Hi Denny, would you be able to explain how you got the metric depth working and your output depth images? I've been trying to run the metric outdoor model on my custom dataset but have been running into a lot of issues. Any help would be greatly appreciated!

shilpaullas97 commented 2 months ago

I did it on the RealEstate10k unfortunately Depth does not exist on the dataset as such so evaluation is not possible but in general if you take an anecdotal look, it's pretty good.

Hi @1ssb,

Were you able to train metric depth estimation on a dataset without depth maps (labels)? Could you please share more details about your training trial?

1ssb commented 2 months ago

Hi, I never trained the model; I only fine-tuned at test time using a plug-and-play method on RealEstate10k. Keep in mind that RealEstate10k is not an RGB-D dataset, but you can triangulate points and use them as control points to rescale the predictions directly. It's not very neat, but it's the best you can do without retraining from scratch.
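The exact procedure is not spelled out in this thread, so the following is only one plausible reading of "use the control points to rescale the predictions": fit a single global scale between the predicted depth and a handful of triangulated depths (for example from structure-from-motion), using the median ratio for robustness to outliers. The function name `fit_scale`, the (row, col) pixel format, and the choice of a median are all assumptions made for this sketch.

```python
import numpy as np

def fit_scale(pred_depth, pixels, ref_depths):
    """Estimate one global scale from sparse control points.

    pred_depth : (H, W) predicted depth map
    pixels     : (N, 2) integer (row, col) locations of control points
    ref_depths : (N,) triangulated depths at those locations
    Assumption: prediction and reference differ only by a scale factor.
    """
    pixels = np.asarray(pixels)
    ref = np.asarray(ref_depths, dtype=np.float64)
    pred = np.asarray(pred_depth, dtype=np.float64)[pixels[:, 0], pixels[:, 1]]
    valid = (pred > 0) & (ref > 0)
    if not valid.any():
        raise ValueError("no valid control points")
    # The median of per-point ratios is less sensitive to bad triangulations
    # than a least-squares fit.
    return float(np.median(ref[valid] / pred[valid]))

# Toy example: three control points, one slightly noisy.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
scale = fit_scale(pred, [(0, 0), (0, 1), (1, 1)], [2.0, 4.0, 8.4])
rescaled = scale * pred
```

If the predictions are also offset (not just scaled), a scale-and-shift fit over the same control points would be the natural extension, but that goes beyond what is described here.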

Best Subhransu


callmeray commented 1 month ago

Hi @1ssb,

Could you please share more details about how you fine-tune at test time? Many thanks!