Multiclass Object detection prediction always returns the same class

YassineElbouchaibi commented 4 years ago

Multiclass object detection prediction always returns the same class, although which class it is changes from one training run to the next. Let me clarify.

So I have a dataset and here is its head :

+--------------------+------------------------+-------------------------------+
|        path        |         image          |           annotation          |
+--------------------+------------------------+-------------------------------+
| new_data/00009.jpg | Height: 512 Width: 384 | [{'label': 'Clear plastic ... |
| new_data/00011.jpg | Height: 512 Width: 384 | [{'label': 'Glass bottle',... |
| new_data/00028.jpg | Height: 512 Width: 384 | [{'label': 'Clear plastic ... |
| new_data/00034.jpg | Height: 384 Width: 512 | [{'label': 'Food Can', 'co... |
| new_data/00090.jpg | Height: 384 Width: 512 | [{'label': 'Meal carton', ... |
| new_data/00095.jpg | Height: 512 Width: 384 | [{'label': 'Plastic bottle... |
| new_data/00097.jpg | Height: 310 Width: 512 | [{'label': 'Drink can', 'c... |
| new_data/00098.jpg | Height: 512 Width: 384 | [{'label': 'Styrofoam piec... |
| new_data/00104.jpg | Height: 233 Width: 512 | [{'label': 'Unlabeled litt... |
| new_data/00110.jpg | Height: 512 Width: 233 | [{'label': 'Plastic film',... |
+--------------------+------------------------+-------------------------------+

Here is one example of the annotations:

[{
    'coordinates': {
        'height': 85,
        'width': 88,
        'x': 283,
        'y': 208
    },
    'label': 'Meal carton'
},
{
    'coordinates': { 'height': 52, 'width': 109, 'x': 111, 'y': 108 },
    'label': 'Clear plastic bottle'
},
{
    'coordinates': { 'height': 15, 'width': 10, 'x': 161, 'y': 89 },
    'label': 'Plastic bottle cap'
},
{
    'coordinates': { 'height': 33, 'width': 72, 'x': 393, 'y': 136 },
    'label': 'Drink can'
},
{
    'coordinates': { 'height': 46, 'width': 44, 'x': 490, 'y': 47 },
    'label': 'Disposable food container'
},
{
    'coordinates': { 'height': 41, 'width': 65, 'x': 413, 'y': 153 },
    'label': 'Normal paper'
},
{
    'coordinates': { 'height': 13, 'width': 21, 'x': 438, 'y': 132 },
    'label': 'Unlabeled litter'
},
{
    'coordinates': { 'height': 18, 'width': 16, 'x': 448, 'y': 27 },
    'label': 'Unlabeled litter'
},
{
    'coordinates': { 'height': 7, 'width': 14, 'x': 81, 'y': 81 },
    'label': 'Cigarette'
},
{
    'coordinates': { 'height': 3, 'width': 6, 'x': 144, 'y': 56 },
    'label': 'Cigarette'
},
{
    'coordinates': { 'height': 4, 'width': 10, 'x': 44, 'y': 15 },
    'label': 'Cigarette'
},
{
    'coordinates': { 'height': 5, 'width': 9, 'x': 114, 'y': 247 },
    'label': 'Cigarette'
}]
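A quick sanity check of annotations in this shape can rule out malformed boxes before training. The sketch below is only an illustration: `find_bad_boxes` is a hypothetical helper, not part of Turi Create, and it assumes `x` and `y` give the box center (as in Turi Create's annotation format). Float coordinates are accepted as well as integers.

```python
from numbers import Real

REQUIRED_KEYS = ("x", "y", "width", "height")


def _is_number(value):
    """True for ints and floats, but not for booleans."""
    return isinstance(value, Real) and not isinstance(value, bool)


def find_bad_boxes(annotations, image_width, image_height):
    """Return (index, reason) pairs for annotations that look malformed.

    Hypothetical helper; assumes 'x' and 'y' are the center of the box.
    """
    problems = []
    for i, ann in enumerate(annotations):
        label = ann.get("label")
        coords = ann.get("coordinates", {})
        if not isinstance(label, str) or not label:
            problems.append((i, "missing or empty label"))
            continue
        if not all(_is_number(coords.get(key)) for key in REQUIRED_KEYS):
            problems.append((i, "missing or non-numeric coordinate"))
            continue
        width, height = coords["width"], coords["height"]
        if width <= 0 or height <= 0:
            problems.append((i, "non-positive width or height"))
            continue
        left, right = coords["x"] - width / 2, coords["x"] + width / 2
        top, bottom = coords["y"] - height / 2, coords["y"] + height / 2
        if left < 0 or top < 0 or right > image_width or bottom > image_height:
            problems.append((i, "box extends outside the image"))
    return problems
```

Calling it on one image's annotation list together with that image's width and height returns an empty list when nothing looks wrong.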

The corresponding image (labeled) is this one :

But after training an Object detection model here is what I get :

In fact, Broken glass is the only label I get : ...

So I verified model.classes and this is what I got : (the right thing)

['Aluminium foil',
 'Broken glass',
 'Cigarette',
 'Clear plastic bottle',
 'Corrugated carton',
 'Crisp packet',
 'Disposable food container',
 'Disposable plastic cup',
 'Drink can',
 'Drink carton',
 'Egg carton',
 'Foam cup',
 'Foam food container',
 'Food Can',
 'Garbage bag',
 'Glass bottle',
 'Meal carton',
 'Metal bottle cap',
 'Normal paper',
 'Other carton',
 'Other plastic',
 'Other plastic bottle',
 'Other plastic container',
 'Other plastic wrapper',
 'Paper cup',
 'Plastic bottle cap',
 'Plastic film',
 'Plastic lid',
 'Plastic straw',
 'Plastic utensils',
 'Pop tab',
 'Rope & strings',
 'Single-use carrier bag',
 'Styrofoam piece',
 'Tissues',
 'Tupperware',
 'Unlabeled litter',
 'Wrapping paper']

I also verified model.summary : (Everything seems legit)

<bound method Model.summary of Class                                    : ObjectDetector

Schema
------
Model                                    : darknet-yolo
Number of classes                        : 38
Input image shape                        : [3, 416, 416]

Training summary
----------------
Training time                            : 1h 1m
Training epochs                          : 810
Training iterations                      : 2000
Number of examples (images)              : 79
Number of bounding boxes (instances)     : 245
Final loss (specific to model)           : 1.2684

I would normally train for longer but limited the training because this problem happens even if I train for around 6-7 hours.

I work on Google Colab and here is part of my code:

# Colab cells: install Turi Create, then swap in GPU builds of MXNet and TensorFlow
!pip install turicreate
!pip uninstall -y mxnet
!pip install mxnet-cu100==1.5.1
!pip uninstall -y tensorflow
!pip install tensorflow-gpu==2.0.0

import coremltools
import os
import turicreate as tc
tc.config.set_num_gpus(-1)  # -1: use all available GPUs

# Load every image under new_data/ and attach the annotations to them
images = tc.load_images('new_data/', ignore_failure=True, recursive=True)
annotations = tc.SFrame("./annotations.csv")
data = images.join(annotations)  # joins on the column(s) both SFrames share
train_data, test_data = data.random_split(0.05) # 0.05 is for testing purposes
model = tc.object_detector.create(train_data, batch_size=32)

Here is the output I get before all the iterations :

Using 'image' as feature column
Using 'annotation' as annotations column
Downloading https://docs-assets.developer.apple.com/turicreate/models/darknet.params
Download completed: /var/tmp/model_cache/darknet.params
Downloading https://docs-assets.developer.apple.com/turicreate/models/darknet.mlmodel
Download completed: /var/tmp/model_cache/darknet.mlmodel
Using 1 GPU to create model (CUDA)
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Setting 'max_iterations' to 2000

In conclusion everything seems normal but I only get one class for all my predictions afterwards.

Thanks guys!

TobyRoseman commented 4 years ago

79 training images is a very small amount of data for training an object detector. This is particularly true if you have 38 classes. You're probably going to need a lot more training data to get an effective model. This is likely the reason for the issues you are having.

During training, what happens to the loss? Is it generally decreasing? Is it decreasing for a while and then going back up?

YassineElbouchaibi commented 4 years ago

@TobyRoseman I am training with only 79 images because I'm kind of debugging... I already trained with 1500 images, ~4500 bboxes, 59 classes for about 12 hours and the same thing was happening, so I want to save time while debugging.

Regarding the loss, it starts at ~14 and ends at ~1, so it is generally decreasing from start to end.

Is any character forbidden from being used as a label? Is it OK if my bbox x, y are floats with 1 decimal (e.g. 45.5)?

TobyRoseman commented 4 years ago

> @TobyRoseman I am training with only 79 images because I'm kind of debugging... I already trained with 1500 images, ~4500 bboxes, 59 classes for about 12 hours and the same thing was happening, so I want to save time while debugging.

Ok, that sounds good.

> Regarding the loss, it starts at ~14 and ends at ~1, so it is generally decreasing from start to end.

That sounds reasonable to me.

> Is any character forbidden from being used as a label?

There are no forbidden characters for the labels.

> Is it OK if my bbox x, y are floats with 1 decimal (e.g. 45.5)?

Floats should be fine.

A few questions:
1 - What operating system are you using to train your model?
2 - What is your label distribution like? Are there a lot more bounding boxes for one label? Is this the label that is always getting predicted?
3 - Try passing in max_iterations=1 into create. Does that model always predict the same label?

YassineElbouchaibi commented 4 years ago

Thanks for the support by the way!

> 1 - What operating system are you using to train your model?

I work on Google Colab and here is the OS info:

NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

> 2 - What is your label distribution like? Are there a lot more bounding boxes for one label? Is this the label that is always getting predicted?

As you can see from the graph below, the label distribution is not that great, however the one with the most bboxes is not the one that is always predicted. It seems to always be the one at index 1 of my class list (from model.classes).
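For readers without the graph, the label distribution can be tallied directly from the annotation lists. This is a minimal sketch using only the standard library; `label_counts` is a hypothetical helper and the sample data is made up, shaped like the `annotation` column shown earlier.

```python
from collections import Counter


def label_counts(annotation_lists):
    """Count bounding boxes per label across all images."""
    counts = Counter()
    for annotations in annotation_lists:
        counts.update(ann["label"] for ann in annotations)
    return counts


# Made-up sample: one inner list of annotations per image.
sample = [
    [{"label": "Cigarette"}, {"label": "Cigarette"}, {"label": "Drink can"}],
    [{"label": "Cigarette"}, {"label": "Meal carton"}],
]

# Print labels from most to least frequent.
for label, count in label_counts(sample).most_common():
    print(f"{label}: {count}")
```

Comparing the most frequent label here with the label that is always predicted shows whether class imbalance alone could explain the behavior.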

> 3 - Try passing in max_iterations=1 into create. Does that model always predict the same label?

max_iterations=1 was too low: the loss was still too high, so the model made no predictions. I ran it a couple of times with max_iterations=100 instead:

model = tc.object_detector.create(train_data, batch_size=16, max_iterations=100)

And the hypothesis mentioned above does seem to be what's happening. The predicted label always seems to be the one at index 1 of my class list (from model.classes). In this case I was always getting 'Aluminium blister pack' and model.classes was:

['Aerosol',
 'Aluminium blister pack',
 'Aluminium foil',
 'Battery',
 'Broken glass',
 'Carded blister pack',
 'Cigarette',
 'Clear plastic bottle',
 'Corrugated carton',
 'Crisp packet',
 'Disposable food container',
 'Disposable plastic cup',
 'Drink can',
 'Drink carton',
 'Egg carton',
 'Foam cup',
 'Foam food container',
 'Food Can',
 'Food waste',
 'Garbage bag',
 'Glass bottle',
 'Glass cup',
 'Glass jar',
 'Magazine paper',
 'Meal carton',
 'Metal bottle cap',
 'Metal lid',
 'Normal paper',
 'Other carton',
 'Other plastic',
 'Other plastic bottle',
 'Other plastic container',
 'Other plastic cup',
 'Other plastic wrapper',
 'Paper bag',
 'Paper cup',
 'Paper straw',
 'Pizza box',
 'Plastic bottle cap',
 'Plastic film',
 'Plastic glooves',
 'Plastic lid',
 'Plastic straw',
 'Plastic utensils',
 'Polypropylene bag',
 'Pop tab',
 'Rope and strings',
 'Scrap metal',
 'Shoe',
 'Single-use carrier bag',
 'Six pack rings',
 'Spread tub',
 'Squeezable tube',
 'Styrofoam piece',
 'Tissues',
 'Toilet tube',
 'Tupperware',
 'Unlabeled litter',
 'Wrapping paper']
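The symptom described above (every prediction landing on one entry of the class list) can be checked mechanically. The sketch below assumes predictions are available as plain Python lists of dicts with a `label` key, one list per image; the helper names and the sample confidence values are hypothetical.

```python
from collections import Counter


def predicted_class_indices(predictions, classes):
    """Count predictions by the index of their label in `classes`."""
    index_of = {label: i for i, label in enumerate(classes)}
    return Counter(
        index_of[pred["label"]] for preds in predictions for pred in preds
    )


def is_collapsed(predictions, classes, max_distinct=1):
    """True when predictions exist but use at most `max_distinct` classes."""
    counts = predicted_class_indices(predictions, classes)
    return 0 < len(counts) <= max_distinct


# Made-up predictions mimicking the behavior reported in this thread.
classes = ["Aerosol", "Aluminium blister pack", "Aluminium foil"]
predictions = [
    [{"label": "Aluminium blister pack", "confidence": 0.41}],
    [{"label": "Aluminium blister pack", "confidence": 0.35},
     {"label": "Aluminium blister pack", "confidence": 0.28}],
]

print(predicted_class_indices(predictions, classes))
print(is_collapsed(predictions, classes))
```

Passing `max_distinct=2` covers the variant where predictions are confined to two classes.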

YassineElbouchaibi commented 4 years ago

@TobyRoseman So I ran some tests. I limited myself to 10 images and their labels and trained 3 models: one on Turi Create 5.8, one on 6.0 and one on 6.1. This behavior seems to happen only on 6.x. However, on 5.8, if the iteration count is too high, the training freezes after a while and nothing gets printed in the output, so I need to kill the process and lower max_iterations.

YassineElbouchaibi commented 4 years ago

So I just ran a 3-hour training (batch_size=16, max_iterations=22000) with the full dataset, turicreate v5.8, tensorflow-gpu v2.0, same OS as before (Ubuntu 18.04.3 LTS (Bionic Beaver)) and the model works as intended (different classes are correctly predicted). Here are some examples of predictions:

So this is definitely the workaround I'll be using. Also, this may suggest tc v6.0 or one of its new dependencies has a breaking change.

TobyRoseman commented 4 years ago

@YassineElbouchaibi - thanks for the update. I'm glad 5.8 is working for you.

TuriCreate 6.0 was a major update. We moved the Object Detector (and all of our other deep learning toolkits) from using MXNet to TensorFlow.

I can't reproduce this issue. I used a two-class dataset and trained an object detector model on Ubuntu. Predictions from this model contain both classes.

YassineElbouchaibi commented 4 years ago

I strongly think this issue only happens when you have 3 classes or more... I’ll try testing tc v6.1 with a 3 class dataset I found and also with a 2 class dataset and see if my hypothesis is true. If it’s the case I’ll leave a link to an interactive python notebook for the reproduction.

TobyRoseman commented 4 years ago

@YassineElbouchaibi - My apologies for not responding sooner. I finally trained an object detection model, on Linux, with a dataset containing more than two classes. I agree there is a serious issue here.

On Linux, I trained an object detection model using a dataset with 6 classes, 24 images and 103 bounding boxes. After 2,000 iterations, I examined predictions on the train set. It produces roughly the right number of bounding box predictions, but the labels for all predictions are always only one of two classes, with most being from just one class. I get similar results with validation data.

On macOS, the same code and same dataset produce reasonable looking results (i.e. class predictions have roughly the right distribution).

We'll investigate this issue further.

TobyRoseman commented 4 years ago

With more than two labels, this issue seems to replicate every time on Linux. This is not an issue on macOS with more than three labels.

However if you take a model trained on Linux then predict on macOS, all predictions are for only two labels. This seems to be an issue with the training of our TensorFlow implementation.

TobyRoseman commented 4 years ago

Two quick updates here:
1 - At least with 2k iterations, the predicted labels always seem to be one of two classes. Those classes are always the first two entries in the classes member list of the trained model.
2 - Setting tc.config.set_num_gpus(0) allows us to reproduce the issue on macOS 10.15. So this isn't at all an issue related to Linux. It's an issue with our TensorFlow implementation.

TobyRoseman commented 4 years ago

I've verified that this bug is present in turicreate 6.0 and that it was not present with 5.8 (the version before 6.0). Clearly this bug was introduced when migrating the MXNet implementation to TensorFlow.

Looking at the TensorFlow implementation, and in particular the loss function, I don't fully understand all of it, but I'm not seeing anything that would cause this issue. I'll start comparing our current loss function to the previous implementation.

MaddyThakker commented 4 years ago

Is this issue fixed now after 85a8e44 commit?

TobyRoseman commented 4 years ago

> Is this issue fixed now after 85a8e44 commit?

@MaddyThakker - this issue is fixed with that commit. However we've identified another issue with our TensorFlow implementation of object detection. The performance has significantly degraded since our 6.1 release. I'm actively investigating that issue and will do a point release once it is resolved.

MaddyThakker commented 4 years ago

Thanks for the update @TobyRoseman. Would building turicreate from source include this fix?

TobyRoseman commented 4 years ago

> Thanks for the update @TobyRoseman. Would building turicreate from source include this fix?

That would fix the issue of only making predictions for two classes, but your accuracy will be quite poor.

TobyRoseman commented 4 years ago

This issue was fixed by #3160. We just released a point release (TuriCreate 6.2.2) with this fix. Upgrade your version of TuriCreate and you should be good to go.
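To decide whether an installed version needs the upgrade, a small version comparison is enough. The sketch below is an assumption-laden illustration: the thread only confirms the bug in 6.0 and 6.1, its absence in 5.8, and the fix in 6.2.2, so treating every release from 6.0 up to (but not including) 6.2.2 as affected is a guess. `may_be_affected` is a hypothetical helper.

```python
FIRST_AFFECTED = (6, 0)   # first release on the TensorFlow backend (per the thread)
FIRST_FIXED = (6, 2, 2)   # point release named in the thread


def parse_version(version):
    """Turn '6.2.2' into (6, 2, 2); only plain dotted integers are handled."""
    return tuple(int(part) for part in version.split("."))


def may_be_affected(version):
    """Assumption: every release in [6.0, 6.2.2) carries the bug."""
    return FIRST_AFFECTED <= parse_version(version) < FIRST_FIXED


for candidate in ("5.8", "6.0", "6.1", "6.2.2"):
    print(candidate, may_be_affected(candidate))
```

In a notebook, the installed package's version string could be passed in place of the literals, assuming it is a plain dotted version.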

apple / turicreate

Multiclass Object detection prediction always returns the same class #3031