DepthAnything / Depth-Anything-V2

Depth Anything V2: A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

Fine-tuning problems #99

Open Edric-star opened 1 month ago

Edric-star commented 1 month ago

Dear author @LiheYoung, hello. Metric-depth fine-tuning has really baffled me these days. I used my own dataset (sparse labels) and tried lowering the lr of the pretrained vitb/vitl model from 0.000005 to 0.0000005 or 0.00000001, but the evaluation results still did not look good, for example: [image] The loss also stayed around 0.2-0.3, which did not look correct. I thought I had handled the voids: following kitti2.py, I set valid_mask as depth > 0 to ignore the void labels. What could I do to improve my fine-tuning? Any strategies you might be able to offer? Thank you very much, this is really important to me.
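For reference, the masking step described here can be sketched like this (NumPy for illustration; the repo's dataloaders work on torch tensors, and the function name and 80 m cap below are mine, not from the repo):

```python
import numpy as np

def build_valid_mask(depth_gt, min_depth=0.0, max_depth=None):
    """Mark pixels with usable ground truth.

    Sparse depth maps store voids as 0, so `depth > 0` drops them,
    mirroring the kitti2.py-style check mentioned above. `max_depth`
    optionally excludes out-of-range outliers as well.
    """
    mask = depth_gt > min_depth
    if max_depth is not None:
        mask &= depth_gt <= max_depth
    return mask

sparse_gt = np.array([[0.0, 12.5], [0.0, 95.0]])
print(build_valid_mask(sparse_gt, max_depth=80.0))
# the voids (0.0) and the out-of-range 95.0 are excluded
```

The loss should then be computed only over `mask` pixels; averaging over all pixels instead silently rewards predicting anything at the voids.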

Edric-star commented 1 month ago

Sorry, it works now. I found my problem.

Bearick commented 1 month ago

Can I ask what the possible problems were? I've been facing the same issue these days... the loss was stuck around 0.2-0.3.

Edric-star commented 1 month ago

> Can I ask what the possible problems were? I've been facing the same issue these days... the loss was stuck around 0.2-0.3.

It was a problem in my own dataloader, and I don't think it's general. May I ask how you set the parameters, e.g. your lr? Did you try lowering it? Btw, how many images with gt do you have in your dataset? @Bearick I'm willing to share more details about fine-tuning problems with you and everyone else involved in this issue; if you have any questions here, please let me know.

wwwyilei commented 1 month ago

Could you please share what changes you made to fine-tune with sparse depth maps from your own dataset? I'm currently getting poor results from training.

Edric-star commented 1 month ago

> Could you please share what changes you made to fine-tune with sparse depth maps from your own dataset? I'm currently getting poor results from training.

Hi, what were your results like? I simply set valid_mask where gt > 0 and gt <= 80, since my dataset is outdoors.

Bearick commented 1 month ago

> Can I ask what the possible problems were? I've been facing the same issue these days... the loss was stuck around 0.2-0.3.

> It was a problem in my own dataloader, and I don't think it's general. May I ask how you set the parameters, e.g. your lr? Did you try lowering it? Btw, how many images with gt do you have in your dataset? @Bearick I'm willing to share more details about fine-tuning problems with you and everyone else involved in this issue; if you have any questions here, please let me know.

@Edric-star Thanks for your reply. I'm using NYUv2 (~47K images) for metric fine-tuning, and I tried 5e-7/5e-8 for the lr. It's an indoor dataset, so I set max_depth to 10. The final loss would not go below ~0.2, and the evaluation results are not comparable at all: [image] The predicted depth maps are also very blurry; it seems the details are lost, e.g.: [image]

Edric-star commented 1 month ago

> Can I ask what the possible problems were? I've been facing the same issue these days... the loss was stuck around 0.2-0.3.

> It was a problem in my own dataloader, and I don't think it's general. May I ask how you set the parameters, e.g. your lr? Did you try lowering it? Btw, how many images with gt do you have in your dataset? @Bearick I'm willing to share more details about fine-tuning problems with you and everyone else involved in this issue; if you have any questions here, please let me know.
>
> @Edric-star Thanks for your reply. I'm using NYUv2 (~47K images) for metric fine-tuning, and I tried 5e-7/5e-8 for the lr. It's an indoor dataset, so I set max_depth to 10. The final loss would not go below ~0.2, and the evaluation results are not comparable at all: [image] The predicted depth maps are also very blurry; it seems the details are lost, e.g.: [image]

Your evaluation results are way better than mine. Btw, what's the split between your train and val sets? And how many epochs did you train to get the above results? I'd suggest focusing on three things:

  1. In your dataloader, did you set the valid mask as depth <= 10, or did you only set this parameter in dist_train.sh?
  2. Try setting max_depth larger according to NYUv2's depth range: if there are many values in its depth maps where gt > 10, I think you'd better raise max_depth.
  3. Which ViT model did you use: s/b/l?

Bearick commented 1 month ago

> Can I ask what the possible problems were? I've been facing the same issue these days... the loss was stuck around 0.2-0.3.

> It was a problem in my own dataloader, and I don't think it's general. May I ask how you set the parameters, e.g. your lr? Did you try lowering it? Btw, how many images with gt do you have in your dataset? @Bearick I'm willing to share more details about fine-tuning problems with you and everyone else involved in this issue; if you have any questions here, please let me know.
>
> @Edric-star Thanks for your reply. I'm using NYUv2 (~47K images) for metric fine-tuning, and I tried 5e-7/5e-8 for the lr. It's an indoor dataset, so I set max_depth to 10. The final loss would not go below ~0.2, and the evaluation results are not comparable at all: [image] The predicted depth maps are also very blurry; it seems the details are lost, e.g.: [image]

> Your evaluation results are way better than mine. Btw, what's the split between your train and val sets? And how many epochs did you train to get the above results? I'd suggest focusing on three things:
>
>   1. In your dataloader, did you set the valid mask as depth <= 10, or did you only set this parameter in dist_train.sh?
>   2. Try setting max_depth larger according to NYUv2's depth range: if there are many values in its depth maps where gt > 10, I think you'd better raise max_depth.
>   3. Which ViT model did you use: s/b/l?

Thanks again.

  1. The val:train split is 460:47K, and I trained 120 epochs.
  2. I only set the max_depth parameter in dist_train.sh. Will that affect training? I thought both max and min depth were used when computing the loss in train.py; I'll check whether the dataloader affects training too. I also tried 20 as max_depth, and the evals were worse. Maybe it has something to do with the sigmoid layer? I'm not sure.
  3. I use vitl as the encoder. Btw, I'm also trying V1. Hope it works out.
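On the sigmoid point: as far as I can tell, the V2 metric head ends in a sigmoid scaled by max_depth, which would explain why the choice of max_depth affects every prediction, not just the clipped ones. A sketch of that scaling (my own illustration, not the repo's code):

```python
import math

def sigmoid_depth(logit, max_depth):
    """Depth = sigmoid(logit) * max_depth, the scaled-sigmoid scheme the
    metric head appears to use. Changing max_depth rescales the entire
    output range, so it has to match the dataset's true depth range."""
    return max_depth / (1.0 + math.exp(-logit))

print(sigmoid_depth(0.0, 10.0))  # 5.0: a zero logit lands mid-range
print(sigmoid_depth(0.0, 20.0))  # 10.0: doubling max_depth doubles the prediction
```

If that scheme holds, setting max_depth to 20 on NYUv2 squeezes most real depths (under 10 m) into the lower half of the sigmoid, which could plausibly hurt the evals.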

Doctor-James commented 1 month ago

@Edric-star Have you ever visualized your depth map? I also fine-tuned with sparse depth labels, and the final numerical results were quite good. However, after visualizing the prediction as a depth map, I found it also became sparse. Is this normal? [screenshot]

YacineDeghaies commented 1 month ago

@Bearick Where can I get the training code for their V1 model?

Edric-star commented 1 month ago

> @Edric-star Have you ever visualized your depth map? I also fine-tuned with sparse depth labels, and the final numerical results were quite good. However, after visualizing the prediction as a depth map, I found it also became sparse. Is this normal? [screenshot]

Hi, I think it's pretty common. I could get relatively good eval results, but my depth maps also looked a bit odd. I guess it has something to do with the sparse labels: it's hard for the model to learn a feature for every pixel. It's probably wiser to use a different loss function when your labels are sparse; I'm considering a classification head and loss function, which might work better for sparse depth maps. Another option is turning the sparse depth map into a dense one with some interpolation or learning method, though I'm not sure how effective that is :).
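A crude version of the "densify the sparse map" idea is a nearest-valid-pixel fill. A minimal sketch (my own baseline, not the authors' method; a real pipeline would use something smarter, e.g. a depth-completion network):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def densify_nearest(sparse_depth):
    """Fill void (<= 0) pixels with the value of the nearest valid pixel."""
    invalid = sparse_depth <= 0
    # distance_transform_edt with return_indices=True gives, for every
    # pixel, the coordinates of the nearest valid (non-void) pixel.
    _, idx = distance_transform_edt(invalid, return_indices=True)
    return sparse_depth[tuple(idx)]

sparse = np.array([[0.0, 2.0],
                   [0.0, 4.0]])
print(densify_nearest(sparse))  # each void takes its nearest neighbour's depth
```

Note that supervising on such pseudo-dense labels injects the fill's artifacts into training, so it is worth keeping the original valid mask around for evaluation.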

Bearick commented 1 month ago

> @Bearick Where can I get the training code for their V1 model?

@YacineDeghaies try https://github.com/LiheYoung/Depth-Anything

Edric-star commented 1 month ago

@Bearick Hi, sorry about the delayed response. I was actually handling outdoor scenes, and my results were somewhat worse than for indoor scenes. May I ask what your eval results looked like in the first 20 epochs, for d1, d2, d3? Are they close to the results at the final 120th epoch?

Bearick commented 1 month ago

> @Bearick Hi, sorry about the delayed response. I was actually handling outdoor scenes, and my results were somewhat worse than for indoor scenes. May I ask what your eval results looked like in the first 20 epochs, for d1, d2, d3? Are they close to the results at the final 120th epoch?

Yep, d1, d2 and d3 are close to the final values. Other metrics like rmse and abs_rel keep changing until around epoch 50.

HaosenZ commented 1 month ago

Hello, thank you for your work. I'm currently running into an issue: when I fine-tune the model on my own outdoor dataset, the training loss converges, but when I then test on images from the dataset, the output depth values are all 0 and the depth map is completely black. Have you encountered this problem before? If you have any suggestions, please feel free to reply; I would greatly appreciate it.

Edric-star commented 1 month ago

> Hello, thank you for your work. I'm currently running into an issue: when I fine-tune the model on my own outdoor dataset, the training loss converges, but when I then test on images from the dataset, the output depth values are all 0 and the depth map is completely black. Have you encountered this problem before? If you have any suggestions, please feel free to reply; I would greatly appreciate it.

I'd check your dataset, especially the ground-truth part, and your dataloader. E.g. you might have made a mistake when generating the depth maps, or you might not be loading the ground truth properly.
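A quick sanity check along these lines is to print the loaded GT's statistics before training. A sketch (the helper is hypothetical; the 256.0 scale factor mentioned in the comment is the KITTI-style 16-bit PNG convention, which may not match your dataset):

```python
import numpy as np

def gt_stats(depth_gt):
    """Summarize a loaded ground-truth depth map.

    An all-zero map (valid_ratio == 0) or an absurdly large max often
    points to a loading/scaling bug, e.g. a 16-bit depth PNG read
    without dividing by its scale factor (256.0 for KITTI-style files).
    """
    valid = depth_gt > 0
    return {
        "valid_ratio": float(valid.mean()),
        "min_valid": float(depth_gt[valid].min()) if valid.any() else 0.0,
        "max_valid": float(depth_gt[valid].max()) if valid.any() else 0.0,
    }

print(gt_stats(np.array([[0.0, 5.0], [0.0, 60.0]])))
# {'valid_ratio': 0.5, 'min_valid': 5.0, 'max_valid': 60.0}
```

Running the same check on a few model predictions after training would also show immediately whether the all-black output is truly zeros or just a visualization scaling issue.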

stars79689 commented 2 weeks ago

> classification head and loss function

Have you tried this? How did it work out?

Alga53 commented 3 days ago

> Hello, thank you for your work. I'm currently running into an issue: when I fine-tune the model on my own outdoor dataset, the training loss converges, but when I then test on images from the dataset, the output depth values are all 0 and the depth map is completely black. Have you encountered this problem before? If you have any suggestions, please feel free to reply; I would greatly appreciate it.

I have faced the same problem. May I know whether you have fixed it?