Issue running with mut1ny dataset

Please fill out this issue template before submitting. Issues that do not fill out this template, or that are already answered in the FAQs, will simply be closed.

Please go to Stack Overflow for help and support. Also check past issues as many are repeats. Also check out the Frequently Asked Questions (FAQs) below in case your question has already been answered in an issue!

Issues should be one of the following:

Feature requests
Bug reports

Information

Please specify the following information when submitting an issue:

What are your command line arguments?:
python train.py --num_epochs 300 --crop_height 256 --crop_width 256 --model MobileUNet-Skip --frontend MobileNetV2 --dataset mut1ny
python test.py --checkpoint_path checkpoints/latest_model_MobileUNet-Skip_mut1ny.ckpt --dataset mut1ny --model MobileUNet-Skip --crop_height 256 --crop_width 256
Have you written any custom code?: no
What have you done to try and solve this issue?: change folder, check ground truth labels
TensorFlow version?: tensorflow-gpu 1.12.0

Describe the problem

We are trying to train a MobileUNet-Skip segmentation model on the mut1ny dataset. An example of an image and its ground truth label is shown here:

Image:

Label:

When we train we get spikes in the accuracy:

Accuracy: accuracy_vs_epochs

When we run a test, the ground truth label written out by the test script colors every labeled region as Eyes:

Test image:

Real data set label: image1313_l

Outputted ground truth from the test script: image1313_gt

Outputted segmentation from test script: image1313_pred

We can't figure out why the output from the test script colors all the labels as a single color (green).
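One possibility that may be worth ruling out is sketched below. It assumes, without having been verified against the suite's code, that label images are converted to class indices by exact RGB matching followed by an argmax over the class axis. Under that assumption, any pixel whose color matches no row of class_dict.csv (listed under Source code / logs) gets an all-zero one-hot vector, and argmax then returns index 0, which is Eyes (117,250,76, green) in this class list. The function names are hypothetical and only illustrate that fallback.

```python
import numpy as np

# First three rows of class_dict.csv, in file order (index 0 is Eyes).
CLASS_COLORS = [(117, 250, 76), (234, 51, 35), (0, 32, 245)]


def one_hot_by_exact_color(label_rgb, class_colors):
    """Hypothetical encoder: one boolean channel per class, True only on an exact RGB match."""
    channels = [np.all(label_rgb == np.array(c), axis=-1) for c in class_colors]
    return np.stack(channels, axis=-1)


def to_class_index(one_hot):
    """Argmax over the class axis; a pixel with no match (all False) yields index 0."""
    return np.argmax(one_hot, axis=-1)


# 1x3 label: exact Lips, exact Nose, and a Lips color that is off by one per channel
# (for example after lossy compression or interpolated resizing).
label = np.array([[(234, 51, 35), (0, 32, 245), (233, 52, 36)]], dtype=np.uint8)
indices = to_class_index(one_hot_by_exact_color(label, CLASS_COLORS))
print(indices.tolist())
```

If the real pipeline behaves this way, the third pixel would be drawn in the Eyes color even though it was meant to be Lips. This is only a hypothesis to test, not a confirmed diagnosis.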

Source code / logs

Here is class_dict.csv:

name,r,g,b
Eyes,117,250,76
Lips,234,51,35
Nose,0,32,245
Ears,116,251,253
Hair,255,253,84
Teeth,255,255,255
Eyebrows,234,61,247
Beard,245,194,193
Face,128,128,128
Void,0,0,0
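A quick way to check the ground truth labels against this file is to list every color that occurs in a label image but has no row in class_dict.csv. The sketch below is a minimal, assumed helper (both function names are hypothetical and not part of the suite); it runs on a tiny synthetic label and a shortened class list so that it stays self-contained.

```python
import csv
import io

import numpy as np


def parse_class_dict(csv_text):
    """Return {(r, g, b): name} from CSV text with a name,r,g,b header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(int(row["r"]), int(row["g"]), int(row["b"])): row["name"] for row in reader}


def count_unmatched_colors(label_rgb, class_colors):
    """Return {(r, g, b): pixel_count} for label colors that are absent from class_colors."""
    # Flatten to a list of pixels and drop any alpha channel.
    pixels = label_rgb.reshape(-1, label_rgb.shape[-1])[:, :3]
    colors, counts = np.unique(pixels, axis=0, return_counts=True)
    unmatched = {}
    for color, n in zip(colors, counts):
        key = tuple(int(v) for v in color)
        if key not in class_colors:
            unmatched[key] = int(n)
    return unmatched


# Shortened class list and a synthetic 2x2 label with one slightly-off Face pixel.
CLASS_DICT = "name,r,g,b\nEyes,117,250,76\nFace,128,128,128\nVoid,0,0,0\n"
classes = parse_class_dict(CLASS_DICT)
label = np.array(
    [[(128, 128, 128), (0, 0, 0)], [(127, 128, 128), (117, 250, 76)]],
    dtype=np.uint8,
)
unmatched = count_unmatched_colors(label, classes)
print(unmatched)
```

On a real label, the PNG would first be loaded into an H x W x 3 RGB array with whatever image library is at hand, and the full contents of class_dict.csv would be passed in. An empty result means every pixel color has a matching class row.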

Here is the output from the test command:

Retrieving dataset information ...
2019-02-11 16:27:49.644257: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-11 16:27:49.914821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:04:00.0
totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-02-11 16:27:49.914868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-11 16:27:50.551338: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-11 16:27:50.551394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-02-11 16:27:50.551402: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-02-11 16:27:50.551744: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10712 MB memory) -> physical GPU (device: 0, name: Tesla K80, pci bus id: 0000:04:00.0, compute capability: 3.7)
Preparing the model ...
Loading model checkpoint weights ...
Loading the data ...
Running test image 1 / 1
/HD1Data/obscurenet/on-env/lib/python3.5/site-packages/sklearn/metrics/classification.py:1145: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples.
'recall', 'true', average, warn_for)
/HD1Data/obscurenet/on-env/lib/python3.5/site-packages/sklearn/metrics/classification.py:1145: UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no true samples.
'recall', 'true', average, warn_for)
Average test accuracy = 0.684173583984375
Average per class test accuracies =

Eyes = 0.000000
Lips = 1.000000
Nose = 1.000000
Ears = 1.000000
Hair = 1.000000
Teeth = 1.000000
Eyebrows = 1.000000
Beard = 1.000000
Face = 0.874889
Void = 0.762414
Average precision = 0.8036244472448342
Average recall = 0.684173583984375
Average F1 score = 0.7325941845711267
Average mean IoU score = 0.3793640074619333
Average run time = 1.4668142795562744

FAQs

Question: I got an InvalidArgumentError saying that Dimensions of inputs should match Answer: See issue #17
Question: Can you upload pre-trained weights for these networks? Answer: See issue #57
Question: Do I need a GPU to train these models? Answer: Technically no, but I'd highly recommend it. I was able to train the models pretty well in about a day using a 1080Ti GPU. Training on CPU would take much longer than that.
Question: Will you be adding the FCN or U-Net models? Answer: No I won't be adding those simply because they're a few years old and state-of-the-art has moved past that.
Question: I got an invalid argument error when using the InceptionV4 model. Am I doing something wrong? Answer: No you're not! Due to the design of the InceptionV4 model, when you end up upsampling you do some rounding which creates a shape mismatch. This only happens when you end up having to use the end_points['pool5']. See the code for some of the models if you want to check whether the model will use end_points['pool5'].

GeorgeSeif / Semantic-Segmentation-Suite