Open FSet89 opened 5 years ago
This sounds like it's almost certainly a batch norm issue: the default batch size is 1, and 256x256 isn't huge, so the test-time batch norm statistics may not closely match the normalization applied during training. I don't think anything is wrong with the checkpoint.
I'm not the author, so take this with a grain of salt, but the very first thing I would try is increasing the batch size. Since these are fully convolutional models, even increasing the batch size to 4 or so might make the batch norm statistics stable enough to bring your training and testing results closer together.
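For example (assuming train.py exposes a --batch_size flag, which I believe this repo does; check its argument parser, and note the dataset path below is a placeholder):

```
python train.py --dataset /path/to/dataset --model FRRN-A \
    --crop_width 256 --crop_height 256 --batch_size 4
```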
What do you mean by "correctly segmented"? Is the output image at the prediction stage totally wrong, or only slightly off?
Did you try adding

```python
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    # define the train op inside this block so the batch norm
    # moving averages are updated on every training step
```

around the train op definition in train.py? Ref: https://stackoverflow.com/questions/41666964/model-variables-in-tensorflows-batch-norm
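For context, here is a minimal TF1 sketch of the pattern the StackOverflow answer describes (this is not the repo's actual train.py; the model, loss, and optimizer are stand-ins):

```python
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 256, 256, 3])
is_training = tf.placeholder(tf.bool)

# Batch norm registers its moving_mean/moving_variance updates in the
# UPDATE_OPS collection instead of running them automatically.
net = tf.layers.conv2d(images, 32, 3, padding='same')
net = tf.layers.batch_normalization(net, training=is_training)
logits = tf.layers.conv2d(net, 2, 1)
loss = tf.reduce_mean(tf.nn.relu(logits))  # stand-in loss

# Without this dependency the moving statistics are never updated, so a
# restored checkpoint normalizes with stale (initial) values at test time.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```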
Or you can just change is_training=False to True in predict.py (but I do not recommend doing this...)
What are your command line arguments?: python predict.py --image /path/to/image --checkpoint_path /path/to/checkpoint --model FRRN-A --dataset --crop_width 256 --crop_height 256
Have you written any custom code?: I added
is_training=is_training
to the batch_norm parameters in the model builder; it is True during training and False during prediction (a sketch of this change is at the end of this report).
What have you done to try and solve this issue?: I tried changing the image loading pipeline, without success.
TensorFlow version?: 1.7.0
Describe the problem
While the validation images are correctly segmented at training time, the same images are not correctly segmented when I run the predict script (i.e. when the checkpoints are restored). I tried both the latest checkpoint and a previous one.
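For reference, a minimal sketch of the kind of builder change described above, assuming the model builder uses tf.contrib.slim (the actual builder code may differ):

```python
import tensorflow as tf
import tensorflow.contrib.slim as slim

def conv_bn_relu(inputs, n_filters, is_training):
    # is_training must be threaded through from the top-level graph:
    # True when building the training graph, False in predict.py.
    net = slim.conv2d(inputs, n_filters, [3, 3], activation_fn=None)
    net = slim.batch_norm(net, is_training=is_training)
    return tf.nn.relu(net)
```

Note that with is_training=False, batch norm normalizes with the moving averages, which are only meaningful if the UPDATE_OPS control dependency discussed above was in place during training.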