Lasagne / Recipes

Lasagne recipes: examples, IPython notebooks, ...
MIT License

Model accuracy with pretrained weights is always lower than reported. For vgg, it is about 69% (while reported 72.7%), and for inception_v3, it is 78% (while reported 81.3%) #80

Closed Sunnydreamrain closed 7 years ago

Sunnydreamrain commented 8 years ago

Hi all,

As the subject says, I cannot reproduce the accuracy reported in the papers with the pretrained weights, for both VGG and Inception-v3 (ImageNet classification). It is always about 3% lower: for VGG it is about 69% (72.7% reported), and for Inception-v3 about 78% (81.3% reported).

It is driving me crazy. Any advice would be highly appreciated.

My setup:

1. Data is preprocessed with Caffe to produce LMDB; images are resized to 256 for VGG and 384 for Inception.
2. At test time, the data is oversampled with 10 crops (4 corners plus center crop, each also flipped horizontally). This differs slightly from the paper, but it should not cause a 3% difference; oversampling only improves about 1% over the center crop alone. The oversampling code is attached at the end.
3. The pretrained weights are downloaded from the model zoo.
4. The model is tested on GPU with the Theano configuration `THEANO_FLAGS='floatX=float32,device=gpu0,mode=FAST_RUN,nvcc.fastmath=True'`.

Oversampling code:

        # decode one LMDB record (a Caffe Datum) into a CHW uint8 array
        datum.ParseFromString(value)
        label = datum.label
        img = np.array(bytearray(datum.data)).reshape(datum.channels, datum.height, datum.width)
        # 10-crop oversampling: 4 corners + center, each with a horizontal flip
        for oversamplei in range(5):
            dx = self.cropindex[oversamplei][0]
            dy = self.cropindex[oversamplei][1]
            tempimg = img[:, dy:dy + self.crop_height, dx:dx + self.crop_width]
            for flipi in range(2):
                if flipi == 1:
                    tempimg = tempimg[:, :, ::-1]  # horizontal flip
                # subtract the per-channel BGR mean and store the crop
                self.data_batches[i*10 + oversamplei*2 + flipi, :, :, :] = tempimg - BGR_mean
                self.labels_batches[i*10 + oversamplei*2 + flipi] = np.int32(label)
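The `self.cropindex` table referenced above is not shown in the thread; for the usual 4-corners-plus-center scheme it would look something like this (an assumed helper, not the original code):

```python
def make_crop_offsets(img_h, img_w, crop_h, crop_w):
    """(dx, dy) offsets for the four corner crops plus the center crop,
    matching the 5-crop loop above (assumed layout, not from the thread)."""
    return [
        (0, 0),                                          # top-left
        (img_w - crop_w, 0),                             # top-right
        (0, img_h - crop_h),                             # bottom-left
        (img_w - crop_w, img_h - crop_h),                # bottom-right
        ((img_w - crop_w) // 2, (img_h - crop_h) // 2),  # center
    ]

print(make_crop_offsets(256, 256, 224, 224))
# → [(0, 0), (32, 0), (0, 32), (32, 32), (16, 16)]
```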

Test code:

# average the softmax outputs over the 10 crops of each image
test_vggprediction = lasagne.layers.get_output(vggmodel['prob'], X_sym, deterministic=True)
n_classes = test_vggprediction.shape[1]
temptest_vggprediction = test_vggprediction.reshape((-1, 10, n_classes))
label_oversample = y_sym[::10]  # one label per image (labels repeat per crop)
test_vggprediction_oversample = T.mean(temptest_vggprediction, axis=1, dtype=theano.config.floatX)
# top-1 and top-5 accuracy on the crop-averaged predictions
test_vggacc = T.mean(lasagne.objectives.categorical_accuracy(test_vggprediction_oversample, label_oversample, top_k=1), dtype=theano.config.floatX)
test_vggacc_top5 = T.mean(lasagne.objectives.categorical_accuracy(test_vggprediction_oversample, label_oversample, top_k=5), dtype=theano.config.floatX)
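The crop-averaging logic can be sanity-checked in plain NumPy (a toy sketch of the same reshape-and-mean step, not the Theano code above; the helper name is mine):

```python
import numpy as np

def oversample_accuracy(probs, labels, n_crops=10):
    """Top-1 accuracy after averaging softmax outputs over the crops of
    each image. probs: (n_images * n_crops, n_classes); labels holds one
    entry per crop, so labels[::n_crops] picks one label per image."""
    n_classes = probs.shape[1]
    grouped = probs.reshape(-1, n_crops, n_classes)  # (n_images, n_crops, n_classes)
    mean_probs = grouped.mean(axis=1)                # average over the crops
    per_image_labels = labels[::n_crops]
    return float((mean_probs.argmax(axis=1) == per_image_labels).mean())

# toy example: two images, three classes, two crops each
probs = np.array([[0.6, 0.3, 0.1],
                  [0.4, 0.5, 0.1],   # one crop disagrees; the average still favours class 0
                  [0.1, 0.2, 0.7],
                  [0.2, 0.1, 0.7]])
labels = np.array([0, 0, 2, 2])
print(oversample_accuracy(probs, labels, n_crops=2))  # → 1.0
```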

Again, any advice would be highly appreciated.

mattroos commented 7 years ago

Where is the 72.7% accuracy number for VGG-S (CNN-S) reported? In both the Chatfield paper and the website I only see Top-5 error for ILSVRC-2012-val reported (13.1%). What data set were you testing with? Was it ILSVRC-2012-val? I'd certainly like to know what performance has been measured by others with this network on the 2012 train, test, or validation sets.

mattroos commented 7 years ago

I got Top-1 error of 40.82% and Top-5 error of 18.21%. This was for the ILSVRC-2012 validation set. I resized all images to 224x224 without cropping. Resizing proportionally and then cropping just the height or width is likely to give slightly better scores, as is augmentation via horizontal flipping and multiple crops and then averaging softmax outputs for the augmented images.

Sunnydreamrain commented 7 years ago

Following is the paper. https://arxiv.org/pdf/1409.1556.pdf

I found that resizing all images to a fixed square size introduced the difference. Instead, the image should be resized so that its shortest side (height or width) is 224 while the original aspect ratio is kept unchanged, i.e. to 224×X or X×224.
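A minimal sketch of that preprocessing, assuming the standard shortest-side resize followed by a center crop (the helper names are mine, not from the thread):

```python
import numpy as np

def shortest_side_dims(w, h, target=224):
    """New (w, h) such that the shortest side equals `target` and the
    aspect ratio is preserved (rounded to the nearest pixel)."""
    if w <= h:
        return target, round(h * target / w)
    return round(w * target / h), target

def center_crop(img, size=224):
    """Center-crop an H x W x C array to size x size."""
    h, w = img.shape[:2]
    top = (h - size) // 2
    left = (w - size) // 2
    return img[top:top + size, left:left + size]

print(shortest_side_dims(500, 375))  # → (299, 224): shortest side becomes 224
```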

f0k commented 7 years ago

I found that resizing all images to a fixed size introduced the difference.

Thanks for reporting back! So this issue has been resolved and can be closed?

Sunnydreamrain commented 7 years ago

Yes. I forgot to close the issue. Sorry about this.