Evaluating the same .ckpt file gives different mAP

Describe the bug: I changed the backbone of SSD and trained a model with a final mAP of 70.8% (according to voc_eval.py). However, when I evaluate the same .ckpt file again, the mAP differs slightly from the result of the previous run.

At first I thought the difference was caused by restoring the .ckpt file from FLAGS.save_path in the __restore_model function of full_precision/learner, so I changed the path to FLAGS.save_path_eval as shown below. The mAP still changes between runs.

def __restore_model(self, is_train):
    """Restore a model from the latest checkpoint files.

    Args:
    * is_train: whether to restore a model for training
    """
    # save_path = tf.train.latest_checkpoint(os.path.dirname(FLAGS.save_path))
    save_path = tf.train.latest_checkpoint(os.path.dirname(FLAGS.save_path_eval))
    if is_train:
        self.saver_train.restore(self.sess_train, save_path)
    else:
        self.saver_eval.restore(self.sess_eval, save_path)
    tf.logging.info('model restored from ' + save_path)
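
To rule out the checkpoint itself changing between the two evaluations, one could fingerprint the checkpoint files before each run. Below is a minimal sketch using only the Python standard library. It assumes the TF1-style layout where a checkpoint prefix such as ./models_eval/model.ckpt is stored as several files sharing that prefix (.index, .data-*, .meta); the helper name checkpoint_fingerprint is hypothetical and not part of PocketFlow.

```python
import glob
import hashlib
import os


def checkpoint_fingerprint(prefix):
    """Return a SHA-256 hex digest over all files that share a checkpoint prefix.

    Hypothetical helper: assumes a TF1-style checkpoint such as
    ./models_eval/model.ckpt stored as model.ckpt.index,
    model.ckpt.data-*, model.ckpt.meta, ...
    """
    # Escape the prefix so characters like '[' in the path are not glob syntax.
    paths = sorted(glob.glob(glob.escape(prefix) + ".*"))
    if not paths:
        raise FileNotFoundError("no checkpoint files for prefix " + prefix)
    digest = hashlib.sha256()
    for path in paths:
        # Hash the file name too, so a renamed or missing shard changes the digest.
        digest.update(os.path.basename(path).encode("utf-8"))
        with open(path, "rb") as handle:
            # Read in 1 MiB chunks to avoid loading large weight files at once.
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Calling checkpoint_fingerprint("./models_eval/model.ckpt") before each of the two evaluations and getting the same digest would show that both runs read identical weights from disk, so the variation would have to come from somewhere after restoring.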

To Reproduce: Run python nets/xxxnet_at_pascalvoc_run.py --image_size xxx --data_dir_local /xxx/xxx --exec_mode eval twice. The lower the mAP, the more obvious the difference.
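
One possible (unconfirmed) mechanism for "lower mAP, larger difference": VOC-style AP is computed after ranking all detections of a class by confidence, so if inference emits the same detections in a different order between runs, detections with equal scores end up ranked differently and the precision/recall curve changes. A weak early-stage model plausibly produces many low, near-equal scores, which would make this effect larger. The sketch below is a simplified pure-Python version of the VOC07 11-point AP, not the repository's voc_eval.py; the detections are hypothetical toy data already matched to ground truth, given as (confidence, is_true_positive) pairs.

```python
def voc07_ap(detections, npos):
    """11-point interpolated AP (VOC2007 style) for a single class.

    detections: list of (confidence, is_true_positive) pairs, already matched
    to ground truth. npos: number of ground-truth objects for the class.
    Python's sort is stable, so tied confidences keep their input order.
    """
    ranked = sorted(detections, key=lambda d: -d[0])
    tp = fp = 0
    recalls, precisions = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / npos)
        precisions.append(tp / (tp + fp))
    ap = 0.0
    for i in range(11):
        t = i / 10  # recall thresholds 0.0, 0.1, ..., 1.0
        # Interpolated precision: best precision among points with recall >= t.
        candidates = [p for r, p in zip(recalls, precisions) if r >= t]
        ap += (max(candidates) if candidates else 0.0) / 11
    return ap


# Same three detections; only the order of the two tied 0.5 scores differs.
ap_a = voc07_ap([(0.9, True), (0.5, False), (0.5, True)], npos=2)
ap_b = voc07_ap([(0.9, True), (0.5, True), (0.5, False)], npos=2)
print(round(ap_a, 4), round(ap_b, 4))  # 0.8485 1.0
```

With identical scores and identical true/false-positive labels, the AP moves from about 0.85 to 1.0 purely because of tie order, which is consistent with deterministic evaluation of non-deterministically ordered inputs.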

Expected behavior: The mAP should be exactly the same when evaluating the same .ckpt file.

Screenshots

1. Using the final .ckpt file for evaluation

First evaluation:

INFO:tensorflow:building forward with is_train = False
INFO:tensorflow:Restoring parameters from ./models_eval/model.ckpt
INFO:tensorflow:model restored from ./models_eval/model.ckpt
VOC07 metric? Yes
Reading annotation for 1/4952
Reading annotation for 101/4952
...
Reading annotation for 4901/4952
Saving cached annotations to ./ssd_outputs/eval_cache/annots.pkl
AP for aeroplane = 0.7368
AP for bicycle = 0.8121
AP for bird = 0.6879
AP for boat = 0.6012
AP for bottle = 0.3778
AP for bus = 0.8022
AP for car = 0.7982
AP for cat = 0.8351
AP for chair = 0.5095
AP for cow = 0.6873
AP for diningtable = 0.7063
AP for dog = 0.8056
AP for horse = 0.8334
AP for motorbike = 0.8023
AP for person = 0.7378
AP for pottedplant = 0.4416
AP for sheep = 0.7069
AP for sofa = 0.7683
AP for train = 0.8461
AP for tvmonitor = 0.6730
Mean AP = 0.7085

Second evaluation:

INFO:tensorflow:building forward with is_train = False
INFO:tensorflow:Restoring parameters from ./models_eval/model.ckpt
INFO:tensorflow:model restored from ./models_eval/model.ckpt
VOC07 metric? Yes
Reading annotation for 1/4952
Reading annotation for 101/4952
Reading annotation for 201/4952
...
Reading annotation for 4901/4952
Saving cached annotations to ./ssd_outputs/eval_cache/annots.pkl
AP for aeroplane = 0.7377
AP for bicycle = 0.8124
AP for bird = 0.6896
AP for boat = 0.6014
AP for bottle = 0.3781
AP for bus = 0.8024
AP for car = 0.7973
AP for cat = 0.8349
AP for chair = 0.5099
AP for cow = 0.6873
AP for diningtable = 0.7062
AP for dog = 0.8055
AP for horse = 0.8333
AP for motorbike = 0.8021
AP for person = 0.7378
AP for pottedplant = 0.4411
AP for sheep = 0.7060
AP for sofa = 0.7682
AP for train = 0.8458
AP for tvmonitor = 0.6730
Mean AP = 0.7085

2. When the mAP is lower (early stage of training)

First evaluation:

INFO:tensorflow:building forward with is_train = False
INFO:tensorflow:Restoring parameters from ./models_eval/model.ckpt
INFO:tensorflow:model restored from ./models_eval/model.ckpt
VOC07 metric? Yes
Reading annotation for 1/4952
Reading annotation for 101/4952
...
Reading annotation for 4901/4952
Saving cached annotations to ./ssd_outputs/eval_cache/annots.pkl
AP for aeroplane = 0.0140
AP for bicycle = 0.0204
AP for bird = 0.0069
AP for boat = 0.0007
AP for bottle = 0.0000
AP for bus = 0.0273
AP for car = 0.1313
AP for cat = 0.1234
AP for chair = 0.0004
AP for cow = 0.0079
AP for diningtable = 0.0042
AP for dog = 0.0263
AP for horse = 0.0394
AP for motorbike = 0.0515
AP for person = 0.0633
AP for pottedplant = 0.0004
AP for sheep = 0.0011
AP for sofa = 0.0101
AP for train = 0.0279
AP for tvmonitor = 0.0131
Mean AP = 0.0285

Second evaluation:

INFO:tensorflow:building forward with is_train = False
INFO:tensorflow:Restoring parameters from ./models_eval/model.ckpt
INFO:tensorflow:model restored from ./models_eval/model.ckpt
VOC07 metric? Yes
Reading annotation for 1/4952
Reading annotation for 101/4952
...
Reading annotation for 4901/4952
Saving cached annotations to ./ssd_outputs/eval_cache/annots.pkl
AP for aeroplane = 0.0172
AP for bicycle = 0.0204
AP for bird = 0.0069
AP for boat = 0.0007
AP for bottle = 0.0000
AP for bus = 0.0273
AP for car = 0.1056
AP for cat = 0.1697
AP for chair = 0.0004
AP for cow = 0.0084
AP for diningtable = 0.0042
AP for dog = 0.0274
AP for horse = 0.0394
AP for motorbike = 0.0515
AP for person = 0.0575
AP for pottedplant = 0.0004
AP for sheep = 0.0011
AP for sofa = 0.0101
AP for train = 0.0279
AP for tvmonitor = 0.0131
Mean AP = 0.0295
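
To see exactly which classes move between runs, the two console logs can be diffed per class. A minimal sketch that parses lines of the form "AP for <class> = <value>" as printed above; the function names are hypothetical, and the sample strings are a three-class excerpt of the early-stage pair above.

```python
import re

# Matches lines such as "AP for car = 0.1313"; "Mean AP = ..." is ignored.
AP_LINE = re.compile(r"^AP for (\S+) = ([0-9.]+)$")


def parse_ap_log(text):
    """Map class name -> AP from the per-class lines of an evaluation log."""
    aps = {}
    for line in text.splitlines():
        match = AP_LINE.match(line.strip())
        if match:
            aps[match.group(1)] = float(match.group(2))
    return aps


def diff_ap_logs(first, second, tol=0.0):
    """Return {class: (ap_first, ap_second)} for classes whose AP differs by more than tol."""
    a, b = parse_ap_log(first), parse_ap_log(second)
    return {
        cls: (a[cls], b[cls])
        for cls in sorted(a.keys() & b.keys())
        if abs(a[cls] - b[cls]) > tol
    }


first_run = """AP for car = 0.1313
AP for cat = 0.1234
AP for bus = 0.0273
Mean AP = 0.0285"""
second_run = """AP for car = 0.1056
AP for cat = 0.1697
AP for bus = 0.0273
Mean AP = 0.0295"""
print(diff_ap_logs(first_run, second_run))
# {'car': (0.1313, 0.1056), 'cat': (0.1234, 0.1697)}
```

Run on the full logs, this would list only the classes whose AP changed, which may help relate the changes to classes with many low-confidence detections.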

P.S. If I run only voc_eval instead of python nets/xxxnet_at_pascalvoc_run.py --image_size xxx --data_dir_local /xxx/xxx --exec_mode eval, the results are exactly the same no matter how many times I run it, so the eval script itself appears to be correct.
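
Since voc_eval alone is deterministic, this suggests the variation enters before it, during inference or while the detections are written out. One way to narrow this down is to keep the per-class detection files from both runs and compare them. A minimal sketch, assuming the detections are dumped in the VOC devkit result format that voc_eval.py-style scripts read (one "<image_id> <score> <xmin> <ymin> <xmax> <ymax>" line per detection); whether PocketFlow writes exactly this format is an assumption, and the sample lines are toy data, not output from this model.

```python
def load_detections(text):
    """Parse VOC-style result lines: '<image_id> <score> <xmin> <ymin> <xmax> <ymax>'."""
    dets = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        dets.append((parts[0],) + tuple(float(v) for v in parts[1:]))
    return dets


def compare_detection_dumps(first, second):
    """Classify two dumps as 'identical', 'reordered', or 'different'."""
    a, b = load_detections(first), load_detections(second)
    if a == b:
        return "identical"
    # Same multiset of detections, only the line order changed.
    if sorted(a) == sorted(b):
        return "reordered"
    return "different"


run_1 = "000001 0.500 10 20 30 40\n000002 0.500 15 25 35 45\n"
run_2 = "000002 0.500 15 25 35 45\n000001 0.500 10 20 30 40\n"
print(compare_detection_dumps(run_1, run_2))  # reordered
```

A "reordered" result means the network produced the same boxes and scores, but AP can still shift because detections with tied scores are ranked after sorting by confidence. A "different" result means the scores or boxes themselves changed between runs, which points at nondeterministic inference rather than at voc_eval.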

Desktop:

OS: Ubuntu 16.04

I really have no idea what the problem is. I look forward to your reply.

Tencent / PocketFlow, issue #286