rick-dn opened this issue 5 months ago
I think if you can set up your dataset similar to Hypersim, based on these docs https://github.com/DepthAnything/Depth-Anything-V2/tree/main/metric_depth and this code https://github.com/DepthAnything/Depth-Anything-V2/blob/main/metric_depth/train.py, you may be able to fine-tune. I don't know the exact details of the Hypersim format.
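For reference, here is a minimal sketch of what such a dataset class could look like. The split-file layout and the 16-bit-PNG depth encoding are assumptions on my part; match the transforms and depth reading to what `dataset/hypersim.py` in the repo actually does:

```python
import cv2
import torch
from torch.utils.data import Dataset

class MyDataset(Dataset):
    """Hypothetical dataset in the spirit of metric_depth/dataset/hypersim.py.
    Assumes a split file where each line is: <rgb_path> <depth_path>."""

    def __init__(self, filelist_path, mode='train'):
        with open(filelist_path, 'r') as f:
            self.filelist = f.read().splitlines()
        self.mode = mode

    def __len__(self):
        return len(self.filelist)

    def __getitem__(self, idx):
        img_path, depth_path = self.filelist[idx].split(' ')
        image = cv2.cvtColor(cv2.imread(img_path), cv2.COLOR_BGR2RGB) / 255.0
        # assumption: depth stored as a 16-bit PNG in millimetres -> convert to metres
        depth = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED).astype('float32') / 1000.0
        return {
            'image': torch.from_numpy(image.transpose(2, 0, 1)).float(),
            'depth': torch.from_numpy(depth).float(),
            'valid_mask': torch.from_numpy(depth > 0),
        }
        # real code should also apply the repo's resize/normalize/crop transforms
```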
Hi, has anyone successfully trained from the metric checkpoint and loaded the fine-tuned model?
@rick-dn did it work?
Hi! Yes and no. It did work in the sense that I trained it on my own dataset and got some results, but the results are not as good as expected. That may just be down to my research problem, though; it might work better for other datasets.
I just needed to modify `dataset/hypersim.py` for my dataset and change `train.py` accordingly. Also, the model does not load for inference after training. It says:
```
Missing key(s) in state_dict: "pretrained.cls_token"... Unexpected key(s) in state_dict: "model", "optimizer", "epoch", "previous_best".
```
I had to write the following:
```python
my_state_dict = {}
for key in state_dict['model'].keys():
    my_state_dict[key.replace('module.', '')] = state_dict['model'][key]
```
Let me know if this is the correct approach or if I'm missing something.
Same problem here. I managed to train it on my own dataset by replacing the vkitti2 file paths in the `/splits` folder.
Inference did not work, though, and I'm also hitting the missing-keys problem.
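For reference, the lines in my replacement split files look roughly like this (the paths are illustrative; one space-separated rgb/depth pair per line, mirroring the original vkitti2 split files):

```
data/my_dataset/rgb/0001.png data/my_dataset/depth/0001.png
data/my_dataset/rgb/0002.png data/my_dataset/depth/0002.png
```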
Where exactly did you write these lines?
```python
my_state_dict = {}
for key in state_dict['model'].keys():
    my_state_dict[key.replace('module.', '')] = state_dict['model'][key]
```
And what changes did you make to `train.py`?
Hi @rick-dn, I think you are correct; I did the same by replacing the keys in the state dict.
The `train.py` changes were just to load my own dataloader instead of the Hypersim one. For inference, if you follow the instructions on the main page (the "Use our models" bit), you need to load the model like:
```python
model.load_state_dict(torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu'))
```
so just before this line I had to add the missing-key workaround.
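Putting it together, the loading code ends up looking something like this (a sketch; swap in your own fine-tuned checkpoint path):

```python
checkpoint = torch.load(f'checkpoints/depth_anything_v2_{encoder}.pth', map_location='cpu')
state_dict = checkpoint['model'] if 'model' in checkpoint else checkpoint  # training ckpt wraps weights under 'model'
my_state_dict = {key.replace('module.', ''): value for key, value in state_dict.items()}  # drop the DDP 'module.' prefix
model.load_state_dict(my_state_dict)
```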
Cheers, Rick
@rick-dn thank you so much !
The results are not good enough though, I'll update you once I find a way to improve the depth map.
Hi, I ran into the same problem when running inference with my own trained checkpoint; it shows the same missing-keys error you describe. Sorry to bother you again, but do you know how to fix this, or what I might have missed while fine-tuning? @LiheYoung
My fine-tuned results are horrible as well; I'm not sure about 3 aspects:
Hi, I also hit the same error.
My inference code, based on `run.py`, is:

```python
model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}
}

encoder = 'vitl'  # or 'vits', 'vitb'
dataset = 'vkitti'  # 'hypersim' for indoor model, 'vkitti' for outdoor model
max_depth = 80  # 20 for indoor model, 80 for outdoor model
finetune_custom_model_path = osp.join(f'metric_depth/exp/custom-{encoder}', 'latest.pth')
model = DepthAnythingV2({model_configs[encoder], 'max_depth': max_depth})
model.load_state_dict(torch.load(finetune_custom_model_path, map_location='cpu'))
```
How do I correct my code? Thank you.
May I know how you guys prepared your own custom datasets? What format do I need to follow? Is there a guideline?
Please see my comments earlier.
```python
# build the model with the same config used for training
depth_anything = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})

# the training checkpoint wraps the weights under a 'model' key
old_dict = torch.load(model_path, map_location='cpu')
if "model" in old_dict:
    old_dict = old_dict["model"]

# strip the DistributedDataParallel 'module.' prefix from every key
new_dict = {key.replace('module.', ''): value for key, value in old_dict.items()}

depth_anything.load_state_dict(new_dict)
depth_anything = depth_anything.to(device).eval()
You could use this code too; just adjust it to your variable names.
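For a quick sanity check after loading, inference can follow the `infer_image` usage from `run.py` (a sketch; `your_image.jpg` is a placeholder):

```python
import cv2

raw_img = cv2.imread('your_image.jpg')       # BGR image, as run.py reads it
depth = depth_anything.infer_image(raw_img)  # HxW numpy array of metric depth
print(depth.min(), depth.max())              # should fall within [0, max_depth]
```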
Are there any instructions on how to fine-tune it on my own dataset?