balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow
4.11k stars 1.89k forks

Bug when converting a caffemodel to TensorFlow #175

Open jinxuan777 opened 6 years ago

jinxuan777 commented 6 years ago

I trained a caffemodel with Caffe (https://github.com/weiliu89/caffe/tree/ssd) and got mAP = 77.8% on Pascal VOC. I then converted the caffemodel to a ckpt with caffe_to_tensorflow.py, and the converted model only reaches mAP = 67%. Any tips? @balancap

jinxuan777 commented 6 years ago

('Loading Caffe file:', '/SSD-Tensorflow/caffemodel/VGG_VOC0712_SSD_300x300_iter_120000.caffemodel')
('Convert BGR to RGB in convolution layer:', u'conv1_1')
('Load weights from convolution layer:', u'conv1_1', (3, 3, 3, 64))
('Load biases from convolution layer:', u'conv1_1', (64,))
('Load weights from convolution layer:', u'conv1_2', (3, 3, 64, 64))
('Load biases from convolution layer:', u'conv1_2', (64,))
('Load weights from convolution layer:', u'conv2_1', (3, 3, 64, 128))
('Load biases from convolution layer:', u'conv2_1', (128,))
('Load weights from convolution layer:', u'conv2_2', (3, 3, 128, 128))
('Load biases from convolution layer:', u'conv2_2', (128,))
('Load weights from convolution layer:', u'conv3_1', (3, 3, 128, 256))
('Load biases from convolution layer:', u'conv3_1', (256,))
('Load weights from convolution layer:', u'conv3_2', (3, 3, 256, 256))
('Load biases from convolution layer:', u'conv3_2', (256,))
('Load weights from convolution layer:', u'conv3_3', (3, 3, 256, 256))
('Load biases from convolution layer:', u'conv3_3', (256,))
('Load weights from convolution layer:', u'conv4_1', (3, 3, 256, 512))
('Load biases from convolution layer:', u'conv4_1', (512,))
('Load weights from convolution layer:', u'conv4_2', (3, 3, 512, 512))
('Load biases from convolution layer:', u'conv4_2', (512,))
('Load weights from convolution layer:', u'conv4_3', (3, 3, 512, 512))
('Load biases from convolution layer:', u'conv4_3', (512,))
('Load weights from convolution layer:', u'conv5_1', (3, 3, 512, 512))
('Load biases from convolution layer:', u'conv5_1', (512,))
('Load weights from convolution layer:', u'conv5_2', (3, 3, 512, 512))
('Load biases from convolution layer:', u'conv5_2', (512,))
('Load weights from convolution layer:', u'conv5_3', (3, 3, 512, 512))
('Load biases from convolution layer:', u'conv5_3', (512,))
('Load weights from convolution layer:', u'fc6', (3, 3, 512, 1024))
('Load biases from convolution layer:', u'fc6', (1024,))
('Load weights from convolution layer:', u'fc7', (1, 1, 1024, 1024))
('Load biases from convolution layer:', u'fc7', (1024,))
('Load weights from convolution layer:', u'conv6_1', (1, 1, 1024, 256))
('Load biases from convolution layer:', u'conv6_1', (256,))
('Load weights from convolution layer:', u'conv6_2', (3, 3, 256, 512))
('Load biases from convolution layer:', u'conv6_2', (512,))
('Load weights from convolution layer:', u'conv7_1', (1, 1, 512, 128))
('Load biases from convolution layer:', u'conv7_1', (128,))
('Load weights from convolution layer:', u'conv7_2', (3, 3, 128, 256))
('Load biases from convolution layer:', u'conv7_2', (256,))
('Load weights from convolution layer:', u'conv8_1', (1, 1, 256, 128))
('Load biases from convolution layer:', u'conv8_1', (128,))
('Load weights from convolution layer:', u'conv8_2', (3, 3, 128, 256))
('Load biases from convolution layer:', u'conv8_2', (256,))
('Load weights from convolution layer:', u'conv9_1', (1, 1, 256, 128))
('Load biases from convolution layer:', u'conv9_1', (128,))
('Load weights from convolution layer:', u'conv9_2', (3, 3, 128, 256))
('Load biases from convolution layer:', u'conv9_2', (256,))
('Load scaling from L2 normalization layer:', u'conv4_3_norm', (512,))
('Load weights from convolution layer:', u'conv4_3_norm_mbox_loc', (3, 3, 512, 16))
('Load biases from convolution layer:', u'conv4_3_norm_mbox_loc', (16,))
('Load weights from convolution layer:', u'conv4_3_norm_mbox_conf', (3, 3, 512, 84))
('Load biases from convolution layer:', u'conv4_3_norm_mbox_conf', (84,))
('Load weights from convolution layer:', u'fc7_mbox_loc', (3, 3, 1024, 24))
('Load biases from convolution layer:', u'fc7_mbox_loc', (24,))
('Load weights from convolution layer:', u'fc7_mbox_conf', (3, 3, 1024, 126))
('Load biases from convolution layer:', u'fc7_mbox_conf', (126,))
('Load weights from convolution layer:', u'conv6_2_mbox_loc', (3, 3, 512, 24))
('Load biases from convolution layer:', u'conv6_2_mbox_loc', (24,))
('Load weights from convolution layer:', u'conv6_2_mbox_conf', (3, 3, 512, 126))
('Load biases from convolution layer:', u'conv6_2_mbox_conf', (126,))
('Load weights from convolution layer:', u'conv7_2_mbox_loc', (3, 3, 256, 24))
('Load biases from convolution layer:', u'conv7_2_mbox_loc', (24,))
('Load weights from convolution layer:', u'conv7_2_mbox_conf', (3, 3, 256, 126))
('Load biases from convolution layer:', u'conv7_2_mbox_conf', (126,))
('Load weights from convolution layer:', u'conv8_2_mbox_loc', (3, 3, 256, 16))
('Load biases from convolution layer:', u'conv8_2_mbox_loc', (16,))
('Load weights from convolution layer:', u'conv8_2_mbox_conf', (3, 3, 256, 84))
('Load biases from convolution layer:', u'conv8_2_mbox_conf', (84,))
('Load weights from convolution layer:', u'conv9_2_mbox_loc', (3, 3, 256, 16))
('Load biases from convolution layer:', u'conv9_2_mbox_loc', (16,))
('Load weights from convolution layer:', u'conv9_2_mbox_conf', (3, 3, 256, 84))
('Load biases from convolution layer:', u'conv9_2_mbox_conf', (84,))

The conversion log looks normal. @balancap

athindran commented 6 years ago

I am running into the same problem. The Caffe model gave an mAP of 77.x, but the converted TF model gave an mAP of 59.x.

pierluigiferrari commented 6 years ago

One possible cause of this problem might be differences in how Caffe and TensorFlow do padding and in how they round in the pooling layers for uneven feature map dimensions. That is, the problem might not necessarily be with the weights, but rather with the TensorFlow model definition.
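To make the rounding point concrete, here is a minimal sketch that compares the output length of a pooling layer under Caffe's rule (which rounds up) with TensorFlow's `VALID` and `SAME` rules. The helper names `caffe_pool_out` and `tf_pool_out` are made up for this illustration, and the 75-pixel input is just an example of an odd feature map size; this is not code from either repository.

```python
import math


def caffe_pool_out(size, kernel, stride, pad=0):
    """Output length of a Caffe pooling layer (hypothetical helper).

    Caffe rounds the division up, so a partial window at the border
    still produces an output element.
    """
    out = int(math.ceil((size + 2 * pad - kernel) / float(stride))) + 1
    # With padding, Caffe drops the last window if it would start
    # entirely inside the padded region.
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out


def tf_pool_out(size, kernel, stride, padding):
    """Output length of a TensorFlow pooling op (hypothetical helper)."""
    if padding == "VALID":
        # No padding: partial windows at the border are discarded.
        return (size - kernel) // stride + 1
    if padding == "SAME":
        # Padding is added as needed: ceil(size / stride).
        return -(-size // stride)
    raise ValueError("padding must be 'VALID' or 'SAME'")


# A 2x2, stride-2 pooling on an odd-sized (75-pixel) feature map.
print(caffe_pool_out(75, kernel=2, stride=2))      # Caffe
print(tf_pool_out(75, 2, 2, padding="VALID"))      # TensorFlow VALID
print(tf_pool_out(75, 2, 2, padding="SAME"))       # TensorFlow SAME
```

For even input sizes the three rules agree, so a mismatch like this only shows up at layers whose input dimension is odd.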

I ran into similar problems at first when I created this Keras port of SSD. I found that if the data flow in the forward pass is off by even just one pixel anywhere along the way (e.g. because of slight differences in the padding or pooling behavior), it affects the mAP dramatically.
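A one-pixel shift can occur even when the output sizes agree. The sketch below, under the assumption of a 3x3, stride-2 convolution on a 10-pixel input (sizes chosen for illustration), compares where each window starts with a symmetric `pad=1`, as a Caffe layer would specify it, against the padding TensorFlow computes for `SAME`. The helper names are hypothetical.

```python
def tf_same_padding(size, kernel, stride):
    """(pad_before, pad_after) that TensorFlow uses for SAME padding."""
    out = -(-size // stride)  # ceil(size / stride)
    total = max((out - 1) * stride + kernel - size, 0)
    before = total // 2       # any odd extra pixel goes to the end
    return before, total - before


def window_starts(n_out, stride, pad_before):
    """Input coordinate of the first pixel of each window."""
    return [i * stride - pad_before for i in range(n_out)]


size, kernel, stride = 10, 3, 2

# Symmetric padding of 1 on each side: floor((10 + 2 - 3) / 2) + 1 outputs.
caffe_out = (size + 2 * 1 - kernel) // stride + 1
caffe_starts = window_starts(caffe_out, stride, pad_before=1)

# SAME padding: same number of outputs, but padding is asymmetric here.
pad_before, pad_after = tf_same_padding(size, kernel, stride)
tf_out = -(-size // stride)
tf_starts = window_starts(tf_out, stride, pad_before)

print(caffe_out, caffe_starts)
print(tf_out, (pad_before, pad_after), tf_starts)
```

Both variants produce five outputs, but every window is offset by one pixel between them. One common way to avoid this is to pad explicitly and then use `VALID` padding in the convolution, so the padding matches the Caffe layer definition.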