facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Using Pretrained Model From My Own Dataset #861

Closed FishWoWater closed 5 years ago

FishWoWater commented 5 years ago

I trained a model on my own dataset with PyTorch 0.4.0 and want to use its weights as the pretrained model. I changed the WEIGHTS line in the YAML config file and ran the training script.

Actual results

The following errors occur:

WARNING cnn.py:  25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information.
WARNING memonger.py:  55: NOTE: Executing memonger to optimize gradient memory
[I memonger.cc:236] Remapping 83 using 19 shared blobs.
INFO memonger.py:  97: Memonger memory optimization took 0.018625974655151367 secs
WARNING memonger.py:  55: NOTE: Executing memonger to optimize gradient memory
[I memonger.cc:236] Remapping 83 using 19 shared blobs.
INFO memonger.py:  97: Memonger memory optimization took 0.017100811004638672 secs
INFO train.py: 194: Loading dataset: ('coco_2014_train',)
loading annotations into memory...
Done (t=0.01s)
creating index...
index created!
INFO roidb.py:  49: Appending horizontally-flipped training examples...
INFO roidb.py:  51: Loaded dataset: coco_2014_train
INFO roidb.py: 135: Filtered 2 roidb entries: 946 -> 944
INFO roidb.py:  67: Computing bounding-box regression targets...
INFO roidb.py:  69: done
INFO train.py: 198: 944 roidb entries
INFO net.py:  62: Loading weights from: /home/slashgns/detect/detectron/models/R-50.pkl
terminate called after throwing an instance of 'at::Error'
  what():  UNKNOWN_BACKENDUNKNOWN_SCALARType is not enabled. (getType at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/ATen/Context.h:36)
frame #0: at::UndefinedTensor::UndefinedTensor() + 0xb1 (0x7f65c459b0e1 in /home/slashgns/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #1: <unknown function> + 0xf8bcb6 (0x7f65c437ecb6 in /home/slashgns/anaconda3/lib/python3.6/site-packages/torch/lib/libATen.so)
frame #2: <unknown function> + 0x101da (0x7f66a9e121da in /lib64/ld-linux-x86-64.so.2)
frame #3: <unknown function> + 0x102c3 (0x7f66a9e122c3 in /lib64/ld-linux-x86-64.so.2)
frame #4: <unknown function> + 0x14d00 (0x7f66a9e16d00 in /lib64/ld-linux-x86-64.so.2)
frame #5: <unknown function> + 0x10094 (0x7f66a9e12094 in /lib64/ld-linux-x86-64.so.2)
frame #6: <unknown function> + 0x1444b (0x7f66a9e1644b in /lib64/ld-linux-x86-64.so.2)
frame #7: <unknown function> + 0x102b (0x7f66a961802b in /lib/x86_64-linux-gnu/libdl.so.2)
frame #8: <unknown function> + 0x10094 (0x7f66a9e12094 in /lib64/ld-linux-x86-64.so.2)
frame #9: <unknown function> + 0x162d (0x7f66a961862d in /lib/x86_64-linux-gnu/libdl.so.2)
frame #10: dlopen + 0x31 (0x7f66a96180c1 in /lib/x86_64-linux-gnu/libdl.so.2)
frame #11: _PyImport_FindSharedFuncptr + 0x8a (0x7f66aa21352a in python)
frame #12: _PyImport_LoadDynamicModuleWithSpec + 0x140 (0x7f66aa23e2f0 in python)
frame #13: <unknown function> + 0x217540 (0x7f66aa23e540 in python)
frame #14: PyCFunction_Call + 0x131 (0x7f66aa13b711 in python)
frame #15: _PyEval_EvalFrameDefault + 0x542d (0x7f66aa1e94ad in python)
frame #16: <unknown function> + 0x1918e4 (0x7f66aa1b88e4 in python)
frame #17: <unknown function> + 0x192771 (0x7f66aa1b9771 in python)
frame #18: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #19: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #20: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #21: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #22: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #23: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #24: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #25: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #26: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #27: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #28: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #29: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #30: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #31: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #32: _PyFunction_FastCallDict + 0x11b (0x7f66aa1b9bab in python)
frame #33: _PyObject_FastCallDict + 0x26f (0x7f66aa138b0f in python)
frame #34: _PyObject_CallMethodIdObjArgs + 0x100 (0x7f66aa17a810 in python)
frame #35: PyImport_ImportModuleLevelObject + 0x280 (0x7f66aa12fb10 in python)
frame #36: _PyEval_EvalFrameDefault + 0x2a0b (0x7f66aa1e6a8b in python)
frame #37: PyEval_EvalCodeEx + 0x329 (0x7f66aa1ba289 in python)
frame #38: PyEval_EvalCode + 0x1c (0x7f66aa1bb01c in python)
frame #39: <unknown function> + 0x1bac8b (0x7f66aa1e1c8b in python)
frame #40: PyCFunction_Call + 0x131 (0x7f66aa13b711 in python)
frame #41: _PyEval_EvalFrameDefault + 0x542d (0x7f66aa1e94ad in python)
frame #42: <unknown function> + 0x1918e4 (0x7f66aa1b88e4 in python)
frame #43: <unknown function> + 0x192771 (0x7f66aa1b9771 in python)
frame #44: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #45: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #46: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #47: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #48: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #49: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #50: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #51: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #52: <unknown function> + 0x19253b (0x7f66aa1b953b in python)
frame #53: <unknown function> + 0x198505 (0x7f66aa1bf505 in python)
frame #54: _PyEval_EvalFrameDefault + 0x30a (0x7f66aa1e438a in python)
frame #55: _PyFunction_FastCallDict + 0x11b (0x7f66aa1b9bab in python)
frame #56: _PyObject_FastCallDict + 0x26f (0x7f66aa138b0f in python)
frame #57: _PyObject_CallMethodIdObjArgs + 0x100 (0x7f66aa17a810 in python)
frame #58: PyImport_ImportModuleLevelObject + 0x280 (0x7f66aa12fb10 in python)
frame #59: <unknown function> + 0x1a2fca (0x7f66aa1c9fca in python)
frame #60: PyCFunction_Call + 0xc6 (0x7f66aa13b6a6 in python)
frame #61: _PyEval_EvalFrameDefault + 0x542d (0x7f66aa1e94ad in python)
frame #62: <unknown function> + 0x1918e4 (0x7f66aa1b88e4 in python)
frame #63: <unknown function> + 0x192771 (0x7f66aa1b9771 in python)

Aborted (core dumped)

Nothing goes wrong when I use the official pretrained model (R-50 from ImageNet), so I wonder whether the difference between the two pkl files is the cause. I loaded both files with pickle and compared them. My pretrained pickle file:

('layer4.2.bn3.weight', tensor([ 0.6420,  0.8300,  0.7898,  ...,  0.5610,  0.7822,  0.5628])), ('layer4.2.bn3.bias', tensor([ 5.2570e-03,  4.9805e-02,  3.2956e-02,  ...,  1.9816e-02,
         1.7960e-02,  6.1621e-03])), ('layer4.2.bn3.running_mean', tensor(1.00000e-02 *
       [-0.2360, -0.8330,  0.6992,  ...,  0.8156,  1.3434, -0.0608])), ('layer4.2.bn3.running_var', tensor(1.00000e-03 *
       [ 1.3470,  2.8371,  2.1799,  ...,  1.2767,  1.8996,  1.0057])), ('fc.weight', tensor([[-2.2010e-02, -4.2233e-02, -3.3884e-02,  ..., -2.6593e-02,
         -1.5431e-02, -6.3258e-04],
        [-1.9740e-02,  2.0792e-02, -5.1603e-02,  ..., -1.8850e-02,
          1.8224e-02,  2.8486e-02],
        [-5.1574e-03,  8.1238e-03, -1.8067e-03,  ...,  8.8984e-03,
         -1.4794e-02, -2.0212e-02],
        ...,
        [-1.6298e-02,  6.7545e-03, -1.7292e-02,  ...,  1.2670e-02,
         -4.1610e-02,  1.1890e-02],
        [-3.2885e-03,  2.3598e-02, -2.9267e-02,  ..., -4.4006e-03,
          1.1596e-02,  1.8264e-02],
        [ 1.2722e-02, -7.1067e-03, -6.2539e-03,  ..., -3.1324e-02,
         -5.1294e-02,  7.1760e-03]])), ('fc.bias', tensor(1.00000e-02 *
       [ 1.6011,  1.4751, -0.9340, -0.7654, -2.0846,  1.4920, -1.4911,
        -1.5989,  0.7874,  3.4246,  1.3219,  0.6003,  3.4578,  0.5202,
        -0.0308,  1.3570, -1.8770,  1.7360,  1.3657,  0.1801]))])

The keys are

r3.4.bn3.weight', 'layer3.4.bn3.bias', 'layer3.4.bn3.running_mean', 'layer3.4.bn3.running_var', 'layer3.5.conv1.weight', 'layer3.5.bn1.weight', 'layer3.5.bn1.bias', 'layer3.5.bn1.running_mean', 'layer3.5.bn1.running_var', 'layer3.5.conv2.weight', 'layer3.5.bn2.weight', 'layer3.5.bn2.bias', 'layer3.5.bn2.running_mean', 'layer3.5.bn2.running_var', 'layer3.5.conv3.weight', 'layer3.5.bn3.weight', 'layer3.5.bn3.bias', 'layer3.5.bn3.running_mean', 'layer3.5.bn3.running_var', 'layer4.0.conv1.weight', 'layer4.0.bn1.weight', 'layer4.0.bn1.bias', 'layer4.0.bn1.running_mean', 'layer4.0.bn1.running_var', 'layer4.0.conv2.weight', 'layer4.0.bn2.weight', 'layer4.0.bn2.bias', 'layer4.0.bn2.running_mean', 'layer4.0.bn2.running_var', 'layer4.0.conv3.weight', 'layer4.0.bn3.weight', 'layer4.0.bn3.bias', 'layer4.0.bn3.running_mean', 'layer4.0.bn3.running_var', 'layer4.0.downsample.0.weight', 'layer4.0.downsample.1.weight', 'layer4.0.downsample.1.bias', 'layer4.0.downsample.1.running_mean', 'layer4.0.downsample.1.running_var', 'layer4.1.conv1.weight', 'layer4.1.bn1.weight', 'layer4.1.bn1.bias', 'layer4.1.bn1.running_mean', 'layer4.1.bn1.running_var', 'layer4.1.conv2.weight', 'layer4.1.bn2.weight', 'layer4.1.bn2.bias', 'layer4.1.bn2.running_mean', 'layer4.1.bn2.running_var', 'layer4.1.conv3.weight', 'layer4.1.bn3.weight', 'layer4.1.bn3.bias', 'layer4.1.bn3.running_mean', 'layer4.1.bn3.running_var', 'layer4.2.conv1.weight', 'layer4.2.bn1.weight', 'layer4.2.bn1.bias', 'layer4.2.bn1.running_mean', 'layer4.2.bn1.running_var', 'layer4.2.conv2.weight', 'layer4.2.bn2.weight', 'layer4.2.bn2.bias', 'layer4.2.bn2.running_mean', 'layer4.2.bn2.running_var', 'layer4.2.conv3.weight', 'layer4.2.bn3.weight', 'layer4.2.bn3.bias', 'layer4.2.bn3.running_mean', 'layer4.2.bn3.running_var', 'fc.weight', 'fc.bias'])

Official pickle file:

'res4_3_branch2c_b': array([0., 0., 0., ..., 0., 0., 0.], dtype=float32), 'res4_3_branch2b_bn_b': array([-6.11167967e-01,  2.08688632e-01, -2.49034733e-01,  2.35631034e-01,
        3.60592231e-02, -4.87970084e-01,  8.32683325e-01, -1.42867589e+00,
       -5.14522314e-01, -1.24296173e-01, -3.39232355e-01,  1.27374873e-01,
        7.29894824e-03, -2.47259624e-02, -6.74235225e-01, -5.69954403e-02,
        1.44842695e-02,  5.87425470e-01,  1.63226053e-01, -6.68697298e-01,
        6.59428716e-01, -1.00618672e+00,  1.22008853e-01, -2.47276932e-01,
       -4.95256782e-02, -1.87744901e-01, -3.83224249e-01,  1.52491868e-01,
       -1.40902913e+00, -4.12810534e-01, -2.73970310e-02,  5.01564503e-01,
        6.48054540e-01,  4.11805093e-01,  3.26660842e-01,  1.16416216e+00,
       -3.35463703e-01, -2.60218829e-01,  8.41800630e-01,  4.90527034e-01,
        3.81806940e-01,  2.98783630e-01, -2.80734509e-01,  3.38571846e-01,
       -8.87536764e-01,  2.28659719e-01,  1.05780053e+00,  2.67195702e-01,
        2.11209804e-01,  5.44941247e-01, -9.24728066e-02, -3.48747939e-01,
       -3.45722169e-01, -7.35925883e-02, -4.70263302e-01,  1.23708405e-01,
        5.85162751e-02,  3.17705065e-01,  3.56009565e-02,  3.71212602e-01,
        8.67603421e-02,  2.13111952e-01,  4.15592998e-01, -2.28238299e-01,
        3.28867957e-02,  2.53924578e-01,  8.81530166e-01, -1.57259333e+00,
        3.97879779e-01, -5.08555353e-01, -8.59989077e-02,  2.97407746e-01,
       -4.74340260e-01,  4.25474375e-01,  1.14354260e-01,  8.57465118e-02,
        8.24091494e-01,  3.46522152e-01, -4.17586893e-01, -9.87670869e-02,
       -1.72045028e+00,  3.64887267e-01,  4.44474965e-01,  3.45810711e-01,
        1.02355230e+00,  4.05820906e-02,  6.43333673e-01, -7.89172053e-01,
        2.52867490e-02,  3.89335901e-01, -9.59093153e-01, -2.51468092e-01,
        1.97788149e-01,  1.29889295e-01, -4.79187399e-01,  3.92252982e-01,

The keys are

', 'res2_0_branch2b_w', 'res4_0_branch1_bn_b', 'res4_1_branch2a_b', 'res4_0_branch2c_b', 'res4_0_branch2a_w', 'res4_1_branch2c_w', 'res2_1_branch2b_bn_s', 'res2_1_branch2b_w', 'res4_2_branch2b_w', 'res4_5_branch2c_w', 'res3_3_branch2c_bn_b', 'res5_2_branch2b_b', 'res4_5_branch2c_b', 'res4_2_branch2b_b', 'res3_3_branch2c_bn_s', 'res5_1_branch2a_w', 'res3_1_branch2a_w', 'res2_1_branch2b_b', 'res2_1_branch2b_bn_b', 'res3_1_branch2c_b', 'res4_5_branch2b_bn_b', 'res3_0_branch2b_b', 'res2_2_branch2b_b', 'res3_1_branch2a_b', 'res3_1_branch2c_w', 'res3_3_branch2a_bn_s', 'res5_1_branch2a_b', 'res3_0_branch2b_w', 'res4_5_branch2b_bn_s', 'res2_0_branch2b_bn_s', 'res5_1_branch2c_b', 'res3_0_branch2c_bn_b', 'res4_4_branch2a_b', 'res5_2_branch2c_b', 'res4_4_branch2c_bn_b', 'res2_0_branch2a_b', 'res3_3_branch2b_w', 'res5_2_branch2c_w', 'res3_0_branch2c_bn_s', 'res5_1_branch2c_w', 'res4_4_branch2a_w', 'res3_3_branch2b_b', 'res2_0_branch2a_w', 'res5_0_branch2c_w', 'res5_1_branch2b_bn_b', 'res4_1_branch2b_b', 'res4_4_branch2c_b', 'res5_2_branch2a_w', 'res4_1_branch2b_w', 'res5_0_branch2c_b', 'res5_1_branch2b_bn_s', 'res4_4_branch2c_w', 'res5_2_branch2a_b', 'res5_0_branch2a_b', 'res3_0_branch2c_w', 'res5_0_branch1_w', 'res4_5_branch2c_bn_b', 'res4_3_branch2c_w', 'res3_0_branch2a_b', 'res4_5_branch2a_bn_s', 'res4_3_branch2b_bn_s', 'res4_2_branch2c_bn_s', 'res5_0_branch1_b', 'res3_0_branch2c_b', 'res5_0_branch2a_w', 'res4_5_branch2c_bn_s', 'res4_5_branch2a_bn_b', 'res4_0_branch2b_bn_s', 'res4_3_branch2c_b', 'res4_3_branch2b_bn_b', 'res3_0_branch2a_w'])
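The comparison of the two files described above could be scripted roughly as follows. This is only a sketch: the helper names `load_pickle`, `describe`, and `compare` are made up for illustration, and the `encoding="latin1"` argument assumes the official file was written by Python 2 and is being read from Python 3. Unpickling my own file also requires PyTorch to be importable, since it stores tensor objects.

```python
import pickle


def load_pickle(path):
    """Load a pickle file (hypothetical helper).

    encoding="latin1" is an assumption: it lets Python 3 read
    pickles that were written by Python 2.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def describe(state):
    """Map every key of a dict-like checkpoint to the type name of its value."""
    return {key: type(value).__name__ for key, value in state.items()}


def compare(mine, official):
    """Summarize how two checkpoints differ in key names and value types."""
    return {
        "shared_keys": sorted(set(mine) & set(official)),
        "my_value_types": sorted(set(describe(mine).values())),
        "official_value_types": sorted(set(describe(official).values())),
    }
```

Calling `compare(load_pickle(my_path), load_pickle(official_path))` with the two file paths would show at a glance whether any key names overlap and which value types each file holds.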

My pickle file seems to contain PyTorch tensors, whereas the official one contains NumPy arrays (which may be the problem?). The keys are also quite different. I am not familiar with Caffe2 yet. Should I convert my pkl file to match the official structure, or is there another solution?
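If conversion is the way to go, the renaming part might look something like the sketch below. It is based only on the key names visible in the two dumps above, plus the assumption that `layerN` in my file corresponds to `res(N+1)` in the official file and that `conv1`/`conv2`/`conv3` correspond to `branch2a`/`branch2b`/`branch2c`. The function `rename_key` is hypothetical, not part of Detectron, and it does not cover the stem layers or the `fc` layer.

```python
import re

# Keys such as "layer4.2.bn3.weight" or "layer3.5.conv2.weight".
_BLOCK_RE = re.compile(r"^layer(\d)\.(\d+)\.(conv|bn)(\d)\.(\w+)$")
# Keys such as "layer4.0.downsample.0.weight" (0 = conv, 1 = batch norm).
_DOWN_RE = re.compile(r"^layer(\d)\.(\d+)\.downsample\.(\d)\.(\w+)$")
# Assumed correspondence between conv index and branch suffix.
_BRANCH = {"1": "branch2a", "2": "branch2b", "3": "branch2c"}


def rename_key(key):
    """Return the assumed official-style name for a key, or None if unmapped."""
    match = _BLOCK_RE.match(key)
    if match:
        stage, block, kind, index, param = match.groups()
        prefix = "res{}_{}_{}".format(int(stage) + 1, block, _BRANCH[index])
    else:
        match = _DOWN_RE.match(key)
        if not match:
            return None  # stem, fc, or anything else is not handled here
        stage, block, index, param = match.groups()
        prefix = "res{}_{}_branch1".format(int(stage) + 1, block)
        kind = "conv" if index == "0" else "bn"
    if kind == "conv" and param == "weight":
        return prefix + "_w"
    if kind == "bn" and param == "weight":
        return prefix + "_bn_s"
    if kind == "bn" and param == "bias":
        return prefix + "_bn_b"
    return None  # running_mean / running_var have no same-named counterpart
```

Renaming alone would probably not be enough. Each tensor would also have to be turned into a NumPy array (for example with `tensor.cpu().numpy()`), and since the official file has no `running_mean` or `running_var` entries, the batch-norm statistics would presumably need to be folded into the `_bn_s` and `_bn_b` values. Whether the resulting network then behaves the same is something I have not verified.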

Detailed steps to reproduce

Modify the weights line in yaml file as below:

WEIGHTS: /home/slashgns/detect/detectron/models/R-50.pkl

Then run train_net.py:

python tools/train_net.py     --multi-gpu-testing     --cfg configs/getting_started/tutorial_2gpu_e2e_faster_rcnn_R-50-FPN.yaml     OUTPUT_DIR output

System information