YvanYin / Metric3D

The repo for "Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image" and "Metric3Dv2: A Versatile Monocular Geometric Foundation Model..."
https://jugghm.github.io/Metric3Dv2/
Creative Commons Zero v1.0 Universal

Some problems during training #110

Open Xbinzhao opened 3 weeks ago

Xbinzhao commented 3 weeks ago

Thank you for your outstanding work. I ran into the following two problems during training and would like to ask about them:

1. The following error occurred when I tried to fine-tune from the downloaded pre-trained model ('metric_depth_vit_large_800k.pth'). Is there a problem with the downloaded checkpoint, or is something else wrong?

Missing key(s) in state_dict: "mask_token", "blocks.0.0.norm1.weight", "blocks.0.0.norm1.bias", "blocks.0.0.attn.qkv.weight", "blocks.0.0.attn.qkv.bias", "blocks.0.0.attn.proj.weight", "blocks.0.0.attn.proj.bias", "blocks.0.0.ls1.gamma", "blocks.0.0.norm2.weight", "blocks.0.0.norm2.bias", "blocks.0.0.mlp.fc1.weight", "blocks.0.0.mlp.fc1.bias", "blocks.0.0.mlp.fc2.weight", "blocks.0.0.mlp.fc2.bias", "blocks.0.0.ls2.gamma", "blocks.0.1.norm1.weight", "blocks.0.1.norm1.bias", "blocks.0.1.attn.qkv.weight", "blocks.0.1.attn.qkv.bias", "blocks.0.1.attn.proj.weight",

Unexpected key(s) in state_dict: "token2feature.read_3.readoper.project_patch.weight", "token2feature.read_3.readoper.project_patch.bias", "token2feature.read_3.readoper.project_learn.weight", "token2feature.read_2.readoper.project_patch.weight", "token2feature.read_2.readoper.project_patch.bias", "token2feature.read_2.readoper.project_learn.weight", "token2feature.read_1.readoper.project_patch.weight",

I also preprocessed the loaded state_dict:

        # Strip the wrapper prefixes so checkpoint keys match the bare module names.
        # Iterate over a copy of the keys, since the dict is mutated in the loop.
        for key in list(state_dict.keys()):
            if 'depth_model.encoder.' in key:
                state_dict[key.replace('depth_model.encoder.', '')] = state_dict[key]
                del state_dict[key]
            elif 'depth_model.decoder.' in key:
                state_dict[key.replace('depth_model.decoder.', '')] = state_dict[key]
                del state_dict[key]
            elif 'blocks.0model.encoder.' in key:
                state_dict[key.replace('blocks.0model.encoder.', '')] = state_dict[key]
                del state_dict[key]
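A way to diagnose this kind of mismatch is to load the checkpoint non-strictly and inspect which keys are missing or unexpected. The sketch below uses a toy model (not the actual Metric3D architecture) just to demonstrate the prefix-stripping plus 'strict=False' pattern:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real model, used only to illustrate the pattern.
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 4)

model = Tiny()

# Pretend checkpoint: keys carry a 'depth_model.encoder.' prefix, plus one
# extra key that has no counterpart in the model (like 'token2feature.*').
ckpt = {
    'depth_model.encoder.fc.weight': torch.zeros(4, 4),
    'depth_model.encoder.fc.bias': torch.zeros(4),
    'token2feature.read_1.weight': torch.zeros(4),
}

# Strip the prefix, as in the preprocessing loop above.
state_dict = {k.replace('depth_model.encoder.', ''): v for k, v in ckpt.items()}

# strict=False reports mismatches instead of raising, and still loads
# every weight whose key and shape do match.
result = model.load_state_dict(state_dict, strict=False)
print('missing:', result.missing_keys)
print('unexpected:', result.unexpected_keys)
```

Printing the two lists (rather than letting a strict load raise) makes it easy to see whether the leftover keys belong to the decoder, the backbone, or a different model variant entirely.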

2. Can I fine-tune the model using only the DDAD dataset? Later on I would also like to fine-tune using LiDAR point clouds as ground-truth depth; is that feasible? Can the json file follow the format below?

    dict(
        'files': list(
            dict(
                'rgb': 'data/kitti_demo/rgb/xxx.png',
                'depth': 'data/kitti_demo/depth/xxx.png',
                'depth_scale': 1000.0,  # the depth scale of the gt depth img
                'cam_in': [fx, fy, cx, cy],
            ),
        ),
    )
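The structure above can be written out as an actual json file with a short script. This is a minimal sketch: the file paths are the repo's demo placeholders, and the intrinsics below are made-up numbers, not real DDAD calibration values.

```python
import json

fx, fy, cx, cy = 1000.0, 1000.0, 960.0, 600.0  # placeholder intrinsics

anno = {
    'files': [
        {
            'rgb': 'data/kitti_demo/rgb/xxx.png',
            'depth': 'data/kitti_demo/depth/xxx.png',
            'depth_scale': 1000.0,  # stored depth value = metres * depth_scale
            'cam_in': [fx, fy, cx, cy],
        },
    ],
}

with open('test_annotations.json', 'w') as f:
    json.dump(anno, f, indent=2)
```

Each entry in 'files' is one rgb/depth pair, so a DDAD-only annotation file is just this list populated from that single dataset.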

Xbinzhao commented 3 weeks ago

Training keeps failing with 'CUDA out of memory', even though it has used 45 GB so far. Is there a problem with this?

JUGGHM commented 3 weeks ago

Training keeps failing with 'CUDA out of memory', even though it has used 45 GB so far. Is there a problem with this?

Decreasing the batch size in the config file (e.g. down to 1 per GPU) should work.
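If a smaller per-GPU batch hurts convergence, gradient accumulation is a common complement: it keeps the effective batch size while cutting per-step memory. This is a generic toy training loop to illustrate the idea, not the repo's actual trainer or config:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

accum_steps = 4  # effective batch = micro_batch_size * accum_steps
data = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(8)]

opt.zero_grad()
for step, (x, y) in enumerate(data, 1):
    # Scale the loss so accumulated gradients average over micro-batches.
    loss = nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across micro-batches
    if step % accum_steps == 0:
        opt.step()       # one optimizer step per effective batch
        opt.zero_grad()
```

Only one micro-batch of activations lives on the GPU at a time, which is what keeps peak memory low.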

JUGGHM commented 3 weeks ago


(1) You need to check whether the keys of your model and the loaded checkpoint match. I remember matching them when initializing the model.

(2) Of course you can, but you need to configure the correct depth_scale, fx, and fy.
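On point (2), getting depth_scale right means the stored depth png values convert back to metres correctly. A minimal sketch of that conversion, with a made-up stored value rather than real DDAD data:

```python
import numpy as np

depth_scale = 1000.0  # stored value = metres * depth_scale

# Depth gt is typically a uint16 png; 0 commonly marks invalid pixels.
stored = np.array([[5000, 0]], dtype=np.uint16)

depth_m = stored.astype(np.float32) / depth_scale
valid = stored > 0
print(depth_m[valid])  # a stored 5000 decodes to 5.0 metres
```

If LiDAR depths are projected and saved the same way (metres times depth_scale, zeros where no return exists), the same annotation format should apply.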