GeorgeSeif / Semantic-Segmentation-Suite

Semantic Segmentation Suite in TensorFlow. Implement, train, and test new Semantic Segmentation models easily!

Different results when checkpoints are restored #143

Open FSet89 opened 5 years ago

FSet89 commented 5 years ago

Describe the problem

While the validation images are correctly segmented at training time, the same images are not correctly segmented when I run the predict script (i.e. when the checkpoints are restored). I tried both the latest checkpoint and a previous one.

CJMenart commented 5 years ago

This almost certainly sounds like a batch norm issue. The default batch size is 1, and 256x256 isn't huge, so the statistics batch norm uses at test time may not closely match the per-batch normalization it performs during training. I don't think anything is wrong with the checkpoint.

I'm not the author, so take this with a grain of salt, but the first thing I would try is increasing the batch size. Since these are fully convolutional models, even a batch size of 4 or so might stabilize batch norm, bringing your training and testing results closer together.
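To make the batch size argument concrete, here is a rough pure-Python sketch (not TensorFlow code and not taken from this repo). It assumes each image carries its own random offset plus pixel noise, and measures how much the per-batch mean fluctuates for batch sizes 1 and 4. The function name, noise levels, and sizes are all made up for illustration.

```python
import random


def batch_mean_spread(batch_size, num_batches=1000, pixels_per_image=32, seed=0):
    """Standard deviation of the per-batch mean over many simulated batches."""
    rng = random.Random(seed)
    batch_means = []
    for _ in range(num_batches):
        values = []
        for _ in range(batch_size):
            # Assumption: every image has its own offset (e.g. overall brightness).
            image_offset = rng.gauss(0.0, 1.0)
            values.extend(
                image_offset + rng.gauss(0.0, 0.5) for _ in range(pixels_per_image)
            )
        batch_means.append(sum(values) / len(values))
    center = sum(batch_means) / len(batch_means)
    variance = sum((m - center) ** 2 for m in batch_means) / len(batch_means)
    return variance ** 0.5


spread_1 = batch_mean_spread(batch_size=1)
spread_4 = batch_mean_spread(batch_size=4)
print("spread of batch mean, batch size 1:", round(spread_1, 3))
print("spread of batch mean, batch size 4:", round(spread_4, 3))
```

In this toy setup the batch mean is noticeably less noisy at batch size 4, which is the effect the suggestion above relies on. Whether it is enough for a real model depends on the data.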

ryohachiuma commented 5 years ago

What do you mean by "correctly segmented"?

If the output image at the prediction stage is completely wrong, did you try adding

update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    loss = ...  # existing loss definition from train.py, indented under the with

so that the loss definition in train.py is wrapped by the with block? Ref: https://stackoverflow.com/questions/41666964/model-variables-in-tensorflows-batch-norm
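As a rough illustration of why the update ops matter, here is a pure-Python stand-in for a batch norm layer (not TensorFlow's implementation). TinyBatchNorm, run_update_ops, and the momentum value are hypothetical names and settings chosen for this sketch. If the moving averages are never updated during training, inference normalizes with the initial mean 0 and variance 1, whatever the data looked like.

```python
import random


class TinyBatchNorm:
    """Scalar batch norm sketch with moving averages (illustrative only)."""

    def __init__(self, momentum=0.9, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.moving_mean = 0.0  # initial values, used at inference time
        self.moving_var = 1.0

    def __call__(self, batch, is_training, run_update_ops=True):
        if is_training:
            # Training mode: normalize with the statistics of the current batch.
            mean = sum(batch) / len(batch)
            var = sum((x - mean) ** 2 for x in batch) / len(batch)
            if run_update_ops:
                # Stand-in for the ops collected in UPDATE_OPS.
                m = self.momentum
                self.moving_mean = m * self.moving_mean + (1 - m) * mean
                self.moving_var = m * self.moving_var + (1 - m) * var
        else:
            # Inference mode: normalize with the stored moving averages.
            mean, var = self.moving_mean, self.moving_var
        return [(x - mean) / (var + self.eps) ** 0.5 for x in batch]


def train(run_update_ops, steps=200, seed=0):
    rng = random.Random(seed)
    bn = TinyBatchNorm()
    for _ in range(steps):
        batch = [rng.gauss(5.0, 2.0) for _ in range(32)]
        bn(batch, is_training=True, run_update_ops=run_update_ops)
    return bn


with_updates = train(run_update_ops=True)
without_updates = train(run_update_ops=False)
print("with updates:   ", with_updates.moving_mean, with_updates.moving_var)
print("without updates:", without_updates.moving_mean, without_updates.moving_var)
```

With updates, the moving statistics drift toward the data's mean and variance. Without them they stay at their initial values, so outputs in inference mode would differ sharply from what the network saw during training.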

Alternatively, you can change is_training=False to is_training=True in predict.py, but I do not recommend this.
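One likely reason for that caution, sketched in plain Python rather than TensorFlow: in training mode the normalization uses the statistics of whatever is currently being fed in, so the result for a given input depends on the rest of the batch. The helper name below is hypothetical.

```python
def normalize_with_batch_stats(batch, eps=1e-5):
    """Normalize values using the mean and variance of the batch itself."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


# The first element is identical in both batches; only its neighbors differ.
out_a = normalize_with_batch_stats([1.0, 2.0, 3.0])
out_b = normalize_with_batch_stats([1.0, 2.0, 30.0])
print("first element, batch A:", out_a[0])
print("first element, batch B:", out_b[0])
```

The same input value is normalized differently in the two batches, so predictions made this way are not a function of the input image alone.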