alexgkendall / SegNet-Tutorial

Files for a tutorial to train SegNet for road scenes using the CamVid dataset
http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

Training and Testing does not perform the same #61

Open zylou opened 7 years ago

zylou commented 7 years ago

I am trying to train a model for a simple binary image segmentation task. I use the third command from the tutorial for training, which means fine-tuning from VGG, as follows:

./SegNet/caffe-segnet/build/tools/caffe train -gpu 0 -solver ./SegNet/models/segnet_solver.prototxt -weights ./SegNet/models/VGG_ILSVRC_16_layers.caffemodel

The train.prototxt looks like this:

name: "VGG_ILSVRC_16_layer"
layer {
  name: "data"
  type: "DenseImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  dense_image_data_param {
    source: "/home/deeplearning/BinarySegmentation/SegNet/images_train/Image_list_files/image_list_train.txt"   # Change this to the absolute path to your data file
    batch_size: 3               # Change this number to a batch size that will fit on your GPU
    shuffle: true
  }
}
layer {
  name: "data"
  type: "DenseImageData"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  dense_image_data_param {
    source: "/home/deeplearning/BinarySegmentation/SegNet/images_test/Image_list_files/image_list_test.txt" # Change this to the absolute path to your data file
    batch_size: 1               # Change this number to a batch size that will fit on your GPU
    shuffle: false
  }
}
...
...
...
...
...
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "conv1_1_D"
  bottom: "label"
  top: "loss"
  softmax_param {engine: CAFFE}
  loss_param: {
    weight_by_label_freqs:true
    ignore_label: 11
    class_weighting: 0.3
    class_weighting: 1.7
  }
  include {
    phase: TRAIN
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "conv1_1_D"
  bottom: "label"
  top: "accuracy"
  top: "per_class_accuracy"
  include {
    phase: TRAIN
  }
}
layer {
  name: "accuracy_test"
  type: "Accuracy"
  bottom: "conv1_1_D"
  bottom: "label"
  top: "accuracy_test"
  top: "per_class_accuracy_test"
  include {
    phase: TEST
  }
}

In image_list_train.txt and image_list_test.txt, each line contains the original image followed by its mask image, for example: /home/deeplearning/BinarySegmentation/SegNet/images_train/Resized_images/RGB_0001.png /media/nase452bd/_80_User/Lou/SemanticSegmentation/FlyingPallet/Resized_masks/RGB_0001_Mask.png
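For anyone assembling these list files by hand, a minimal sketch of a script that pairs images with their masks; the directory layout and the `_Mask` filename suffix are assumptions based on the example paths above, not something the tutorial prescribes:

```python
import os

def write_image_list(image_dir, mask_dir, out_path, mask_suffix="_Mask"):
    """Write one 'image mask' pair per line, the format DenseImageData expects.

    Assumes the mask for RGB_0001.png is named RGB_0001_Mask.png (adjust
    mask_suffix for your own naming scheme). Images without a matching mask
    are silently skipped.
    """
    with open(out_path, "w") as out:
        for name in sorted(os.listdir(image_dir)):
            stem, ext = os.path.splitext(name)
            mask_path = os.path.join(mask_dir, stem + mask_suffix + ext)
            if os.path.isfile(mask_path):
                out.write("%s %s\n" % (os.path.join(image_dir, name), mask_path))
```

Using absolute paths for `image_dir` and `mask_dir` keeps the list file valid regardless of where caffe is launched from.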

The training seems quite successful. At the end of training, the accuracy is very high on both the training and the test set:

I0830 16:40:19.594223  4132 solver.cpp:294] Iteration 40000, Testing net (#0)
I0830 16:40:20.516127  4132 solver.cpp:343]     Test net output #0: accuracy_test = 0.986257
I0830 16:40:20.516177  4132 solver.cpp:343]     Test net output #1: per_class_accuracy_test = 0.990675
I0830 16:40:20.516186  4132 solver.cpp:343]     Test net output #2: per_class_accuracy_test = 0.958876

However, when I use the trained model to predict on the test images (the same images used for testing during training), the model predicts every pixel as category 1. This is not consistent with the performance reported during training. Does anyone have a clue how this happens? Thanks a lot.

zylou commented 7 years ago

Does anyone have an idea why the accuracy is high for both train and test during training, but the model predicts all pixels as category 1 when I use the trained model after training?

zhenzhen1022 commented 7 years ago

I have the same problem as you! Have you solved it? When I use the .caffemodel I trained, every predicted colour is grey or red. Could anyone tell me whether this is a problem with class_weighting?
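For what it's worth, the class_weighting values in the loss layer of this tutorial are typically derived with median-frequency balancing, where weight(c) = median_freq / freq(c) and freq(c) is the pixel count of class c divided by the total pixels of the images in which c appears. A rough NumPy sketch (the function name and the epsilon guard are my own, not from the tutorial):

```python
import numpy as np

def median_freq_weights(masks, num_classes):
    """Median-frequency balancing over a list of integer label masks.

    freq(c)   = pixels of class c / total pixels of images containing c
    weight(c) = median of all freq(c) / freq(c)
    """
    pixel_counts = np.zeros(num_classes)
    image_pixels = np.zeros(num_classes)  # pixels of images where class c appears
    for mask in masks:
        counts = np.bincount(mask.ravel(), minlength=num_classes)
        present = counts > 0
        pixel_counts += counts
        image_pixels[present] += mask.size
    freq = pixel_counts / np.maximum(image_pixels, 1)
    median = np.median(freq[pixel_counts > 0])
    return median / np.maximum(freq, 1e-12)
```

Rare classes end up with weights above 1 and dominant classes below 1, which is the same shape as the 0.3 / 1.7 pair in the prototxt above; badly mismatched weights are one plausible way to push every pixel into a single class.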

Jinming-Su commented 7 years ago

Me too. Help!

Jinming-Su commented 7 years ago

@nk1001001001001 Thank you! But I think the median freq and freq(c) are not easy to compute. I got a result, although an ordinary one, only after computing the batch normalization statistics. I think compute_bn_statistics.py is important.
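To expand on that: the tutorial's inference step converts the per-batch batch-norm statistics used during training into fixed statistics before testing, using the compute_bn_statistics.py script that ships with the SegNet-Tutorial repo. The exact paths below are placeholders for this thread's setup, not taken from it:

```shell
# Paths are assumptions -- substitute your own train prototxt, training
# snapshot, and an output directory for the inference-ready weights.
python ./SegNet/Scripts/compute_bn_statistics.py \
    ./SegNet/models/segnet_train.prototxt \
    ./SegNet/models/Training/segnet_iter_40000.caffemodel \
    ./SegNet/models/Inference/
```

The weights written to the Inference directory, rather than the raw training snapshot, are what should be loaded at test time; loading the raw snapshot is a common way to get the single-class output described above.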

l0g1x commented 7 years ago

I don't have a solution to the issue; however, some time ago I did narrow down how to reproduce it. It originally worked fine on a Quadro Mxxxx GPU that is more or less 5 years old, and then on a GTX 1070 (also fine), until I switched to a 1080. That's when I got a solid single-class colour when visualizing the inference output. If I switched back to the 1070, it would work again.

anasgit commented 6 years ago

There are two possibilities for this kind of problem. First, verify the output layer in your inference model. Second, verify that your inference network model corresponds to your training one.
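One quick way to check the first point is to look at the argmax distribution of the inference net's output score map: if one class claims essentially 100% of the pixels, the collapse is happening in the network output itself, not in the colour-mapping or visualisation step. A small NumPy sketch (the function name is mine, and the score array stands in for whatever your inference code extracts from the output blob):

```python
import numpy as np

def predicted_class_fractions(scores):
    """scores: (num_classes, H, W) score/softmax map from the inference net.

    Returns the fraction of pixels assigned to each class by argmax, so a
    healthy binary model should show two nonzero entries rather than one.
    """
    pred = scores.argmax(axis=0)
    counts = np.bincount(pred.ravel(), minlength=scores.shape[0])
    return counts / float(pred.size)
```

Printing this for a few test images makes the "all pixels are category 1" symptom immediately visible as a fraction vector like [0.0, 1.0].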