Detection models do not match the format required by the Detectron network

I tried to train this model on MSRVTT, but I don't have a detection.json file, so I want to generate it with Detectron_pytorch @ 75793bf using the provided detection models. However, that network cannot read the model files correctly.

I printed the keys of both files. In the output below, the first list is part of the contents of the file provided by Detectron_pytorch, and the second list is part of the contents of the file provided by this repository.

'res4_20_branch2b_bn_s', 'res3_3_branch2b_w_momentum', 'res4_0_branch2a_w', 'res2_1_branch2b_bn_s', 'res4_20_branch2b_bn_b', 'res4_13_branch2a_w_momentum', 'res4_12_branch2c_bn_s_momentum', 'res4_12_branch2c_bn_b', 'res5_2_branch2c_w', 'res3_2_branch2a_bn_b_momentum', 'res5_1_branch2b_bn_b_momentum', 'res4_7_branch2a_bn_s', 'res4_12_branch2c_bn_s', 'res4_4_branch2c_bn_b_momentum', 'res3_1_branch2a_w', 'res4_6_branch2b_bn_s_momentum', 'res4_2_branch2c_bn_s_momentum', 'res4_7_branch2b_bn_s_momentum', 'res2_1_branch2b_w', 'res3_1_branch2c_w', 'res4_20_branch2a_w_momentum', 'res4_17_branch2b_w', 'pred_b', 'res3_3_branch2b_w', 'res4_8_branch2a_bn_s', 'pred_w', 'res4_19_branch2a_bn_b', 'res4_3_branch2a_w', 'res4_3_branch2b_bn_b', 'res4_1_branch2b_w', 'res4_12_branch2a_bn_b_momentum', 'res5_1_branch2b_w_momentum', 'res4_3_branch2c_w', 'res4_5_branch2a_bn_s', 'res2_1_branch2c_bn_s_momentum', 'res2_0_branch2c_bn_s_momentum', 'res4_5_branch2a_bn_b', 'res2_0_branch1_bn_s_momentum'])

odict_keys(['Conv_Body.conv_top.weight', 'Conv_Body.conv_top.bias', 'Conv_Body.topdown_lateral_modules.0.conv_lateral.weight', 'Conv_Body.topdown_lateral_modules.0.conv_lateral.bias', 'Conv_Body.topdown_lateral_modules.1.conv_lateral.weight', 'Conv_Body.topdown_lateral_modules.1.conv_lateral.bias', 'Conv_Body.topdown_lateral_modules.2.conv_lateral.weight', 'Conv_Body.topdown_lateral_modules.2.conv_lateral.bias', 'Conv_Body.posthoc_modules.0.weight', 'Conv_Body.posthoc_modules.0.bias', 'Conv_Body.posthoc_modules.1.weight', 'Conv_Body.posthoc_modules.1.bias', 'Conv_Body.posthoc_modules.2.weight', 'Conv_Body.posthoc_modules.2.bias', 'Conv_Body.posthoc_modules.3.weight', 'Conv_Body.posthoc_modules.3.bias', 'Conv_Body.conv_body.res1.conv1.weight', 'Conv_Body.conv_body.res1.bn1.weight'
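The difference between the two key styles can also be checked programmatically. The following is only a minimal sketch, assuming that flat names such as 'res4_20_branch2b_bn_s' are pretrained-weight blob names and that dotted names such as 'Conv_Body.conv_top.weight' are PyTorch state_dict entries. The function name guess_key_style and its return labels are hypothetical and are not part of either repository.

```python
def guess_key_style(keys):
    """Classify checkpoint key names by their naming pattern.

    Hypothetical helper for illustration only. It looks at the names,
    not at the tensors, so it is a heuristic rather than a real loader.
    """
    keys = list(keys)
    if not keys:
        return "unknown"
    # Count keys that contain a dotted module path, e.g. 'a.b.weight'.
    dotted = sum(1 for k in keys if "." in k)
    if dotted == len(keys):
        return "dotted_state_dict"   # assumed: PyTorch-style state_dict
    if dotted == 0:
        return "flat_blob_names"     # assumed: flat blob-style names
    return "mixed"


# A few keys copied from the two printed lists above.
flat_keys = ["res4_20_branch2b_bn_s", "res3_3_branch2b_w_momentum", "pred_w", "pred_b"]
dotted_keys = [
    "Conv_Body.conv_top.weight",
    "Conv_Body.conv_top.bias",
    "Conv_Body.conv_body.res1.bn1.weight",
]

print(guess_key_style(flat_keys))
print(guess_key_style(dotted_keys))
```

If the two files report different styles, as here, they were probably saved by different code paths and are not interchangeable without a conversion step.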

The two files have different structures. How should I deal with this? Or is there a better detection model I could use?

NVIDIA / ContrastiveLosses4VRD, issue #29