DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
https://depth-anything-v2.github.io
Apache License 2.0

Finetuning #8

Open rick-dn opened 5 months ago

rick-dn commented 5 months ago

Is there any instruction on how to fine tune it for my dataset?

hinsonan commented 4 months ago

I think if you set up your dataset similar to Hypersim, based on these docs https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth and this code https://github.com/DepthAnything/Depth-Anything-V2/blob/main/metric_depth/train.py, you may be able to finetune. I don't know the exact details of the Hypersim format, though.
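In case it helps, here is a minimal sketch of what such a custom dataset might look like. It assumes a split file with one `image_path depth_path` pair per line and a sample dict with `image`/`depth`/`valid_mask` keys; the exact keys and transforms that hypersim.py uses may differ, and the file reading is stubbed out with zeros to keep the sketch dependency-free:

```python
import numpy as np

class MyDepthDataset:
    """Hypothetical custom dataset; mirror this against dataset/hypersim.py."""

    def __init__(self, split_path, size=(518, 518)):
        # Each line of the split file: "<image_path> <depth_path>"
        with open(split_path) as f:
            self.pairs = [line.split() for line in f if line.strip()]
        self.size = size

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, idx):
        img_path, depth_path = self.pairs[idx]
        # A real loader would read and resize the files here (e.g. with cv2);
        # zeros stand in for the actual pixels in this sketch.
        image = np.zeros((3, *self.size), dtype=np.float32)
        depth = np.zeros(self.size, dtype=np.float32)
        valid_mask = depth > 0  # mask out pixels with no ground truth
        return {'image': image, 'depth': depth, 'valid_mask': valid_mask}
```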

jack-Singapore commented 4 months ago

Hi, has anyone successfully trained from the metric checkpoint and loaded the fine-tuned model?

YacineDeghaies commented 4 months ago

@rick-dn did it work ?

rick-dn commented 4 months ago

Hi! Yes and no. It did work in the sense that I trained it on my own dataset and got some results, but the results are not as good as expected. That may just be specific to my research problem, though; it may work better for other datasets.

I just needed to modify dataset/hypersim.py for my dataset and change train.py accordingly. However, the model does not load for inference after training. It says:

```
Missing key(s) in state_dict: "pretrained.cls_token"......
Unexpected key(s) in state_dict: "model", "optimizer", "epoch", "previous_best".
```

I had to write the following:

```python
my_state_dict = {}
for key in state_dict['model'].keys():
    my_state_dict[key.replace('module.', '')] = state_dict['model'][key]
```

Let me know if this is the correct approach or whether I'm missing something.
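The workaround above can be wrapped into a small helper (a sketch; the function names are mine, and it only assumes the checkpoint layout implied by the error message, i.e. weights under `'model'` with a DataParallel `'module.'` prefix):

```python
def strip_module_prefix(state_dict):
    # DataParallel/DistributedDataParallel prepend 'module.' to every
    # parameter name; strip it so the plain model can load the weights.
    return {k.replace('module.', '', 1): v for k, v in state_dict.items()}

def unwrap_checkpoint(ckpt):
    # train.py-style checkpoints keep the weights under 'model', next to
    # 'optimizer', 'epoch' and 'previous_best'; extract just the weights.
    state = ckpt['model'] if 'model' in ckpt else ckpt
    return strip_module_prefix(state)
```

Used as `model.load_state_dict(unwrap_checkpoint(torch.load(path, map_location='cpu')))`.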

YacineDeghaies commented 4 months ago

Same problem here. I managed to train it on my own dataset by replacing the file paths of vkitti2 in the /splits folder.

The inference did not work, and I'm also having the missing-keys problem.

Where exactly did you put these lines?

```python
my_state_dict = {}
for key in state_dict['model'].keys():
    my_state_dict[key.replace('module.', '')] = state_dict['model'][key]
```

And what changes did you make to train.py?

jack-Singapore commented 4 months ago

Hi @rick-dn, I think you are correct; I did the same by remapping the keys in the state dict.

rick-dn commented 4 months ago

The train.py changes were just to load my own dataloader instead of the Hypersim one. For inference, if you follow the instructions on the main page (the "Use our models" section), you load the model like:

```python
model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))
```

so just before this line I had to add the missing-key workaround.

Cheers Rick

YacineDeghaies commented 4 months ago

@rick-dn thank you so much !

The results are not good enough though; I'll update you once I find a way to improve the depth map.


Edric-star commented 4 months ago

Hi, I ran into the same problem while running inference with my own trained checkpoints; it showed the missing-keys error just like for you guys. Sorry to bother you again, but do you know how to fix this, or what I might have missed while fine-tuning? @LiheYoung

Edric-star commented 4 months ago

My fine-tuned results are horrible as well. I'm not sure about three aspects:

  1. Where should I pay special attention, besides ignoring the depth GT where there is no value (I used sparse GT)?
  2. The images were resized to 518*518 for training; since the resolution of my own dataset is different, should I modify the resize parameters?
  3. A lot of people in this issue seem to have run into the missing-keys inference error. I got the code running by adding the lines @rick-dn offered (great thanks!), but I'm still not sure about other possible mistakes...

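
On point 2: the DINOv2 ViT backbone works on 14x14 patches, so whatever training resolution you pick, height and width should be multiples of 14 (518 = 37 * 14). A small helper for snapping a dimension to the nearest valid size (the function name is mine):

```python
def nearest_multiple_of_14(x):
    # DepthAnythingV2's ViT backbone uses 14x14 patches, so input height
    # and width must be multiples of 14 (e.g. 518 = 37 * 14).
    return max(14, round(x / 14) * 14)
```
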
nuanxinqing commented 2 months ago

Hi, I also meet the same error.

And my inference code, following run.py:

```python
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}
}

encoder = 'vitl'  # or 'vits', 'vitb'
dataset = 'vkitti'  # 'hypersim' for indoor model, 'vkitti' for outdoor model
max_depth = 80  # 20 for indoor model, 80 for outdoor model
finetune_custom_model_path = osp.join(f'metric_depth/exp/custom-{encoder}', 'latest.pth')
model = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})
model.load_state_dict(torch.load(finetune_custom_model_path, map_location='cpu'))
```

How do I correct my code? Thank you.

SuteraBlu commented 1 month ago

May I know how you guys prepared your own custom datasets? What format do I need to follow? Is there a guideline?
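Judging from the files under metric_depth/dataset/splits, each split appears to be a plain text file with one `<rgb_path> <depth_path>` pair per line (treat that as an assumption and check against the repo). A sketch for generating one from your own layout (the `rgb/`/`depth/` directory names and matching file names are assumptions):

```python
import os

def write_split(rgb_dir, depth_dir, out_path):
    # One "<rgb_path> <depth_path>" pair per line, matched by file name.
    names = sorted(os.listdir(rgb_dir))
    with open(out_path, 'w') as f:
        for name in names:
            f.write(f'{os.path.join(rgb_dir, name)} {os.path.join(depth_dir, name)}\n')
```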

rick-dn commented 1 month ago

> hi i also meet the same error [...] how to correct my code? thank u

Please see my comments earlier.

jack-Singapore commented 1 month ago

```python
depth_anything = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})
old_dict = torch.load(model_path, map_location='cpu')
if "model" in old_dict:
    old_dict = old_dict["model"]
new_dict = {key.replace('module.', ''): value for key, value in old_dict.items()}
depth_anything.load_state_dict(new_dict)
depth_anything = depth_anything.to(device).eval()
```

You could use this code too; just adjust it to your own variables.