YassineElbouchaibi closed this issue 4 years ago
79 training images is a very small amount of data for training an object detector. This is particularly true if you have 38 classes. You're probably going to need a lot more training data to get an effective model. This is likely the reason for the issues you are having.
During training, what happens to the loss? Is it generally decreasing? Is it decreasing for a while and then going back up?
@TobyRoseman I am training with only 79 images because I'm debugging. I already trained with 1500 images, ~4500 bounding boxes, and 59 classes for about 12 hours, and the same thing was happening, so I want to save time while debugging.
Regarding the loss, it starts at ~14 and ends at ~1, so it's generally decreasing from start to end.
Is any character forbidden from being used as a label? Is it OK if my bbox x, y values are floats with one decimal (e.g. 45.5)?
> @TobyRoseman I am training with only 79 images because I'm debugging. I already trained with 1500 images, ~4500 bounding boxes, and 59 classes for about 12 hours, and the same thing was happening, so I want to save time while debugging.
Ok, that sounds good.
> Regarding the loss, it starts at ~14 and ends at ~1, so it's generally decreasing from start to end.
That sounds reasonable to me.
> Is any character forbidden from being used as a label?
There are no forbidden characters for the labels.
> Is it OK if my bbox x, y values are floats with one decimal (e.g. 45.5)?
Floats should be fine.
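For reference, here is a minimal sketch of the bounding-box annotation format Turi Create's object detector expects (one dict per box, with center-based float coordinates); the specific label and values are made up for illustration:

```python
# One bounding-box annotation: a dict with a free-form string label and
# center-based coordinates. Float values like 45.5 are acceptable.
annotation = {
    "label": "Drink can",      # any string label; no forbidden characters
    "coordinates": {
        "x": 45.5,             # box center x (floats are fine)
        "y": 120.0,            # box center y
        "width": 30.5,
        "height": 62.0,
    },
}

# A training row pairs an image with a list of such annotation dicts.
annotations_for_image = [annotation]
print(annotations_for_image[0]["coordinates"]["x"])  # 45.5
```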
A few questions:
1 - What operating system are you using to train your model?
2 - What is your label distribution like? Are there a lot more bounding boxes for one label? Is this the label that is always getting predicted?
3 - Try passing `max_iterations=1` into `create`. Does that model always predict the same label?
Thanks for the support by the way!
> 1 - What operating system are you using to train your model?
I work on Google Colab, and here is the OS info:
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
> 2 - What is your label distribution like? Are there a lot more bounding boxes for one label? Is this the label that is always getting predicted?
As you can see from the graph below, the label distribution is not that great; however, the label with the most bounding boxes is not the one that is always predicted. The predicted label always seems to be the one at index 1 of my class list (from model.classes).
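To inspect the distribution question above, the per-label box counts can be tallied like this (a sketch, assuming the annotations follow Turi Create's list-of-dicts format; the helper name and sample data are mine):

```python
from collections import Counter

# Hypothetical helper: given the annotations column of the training data
# (a list of bounding-box dicts per image), tally boxes per label to see
# how skewed the distribution is.
def label_distribution(annotations_column):
    counts = Counter()
    for boxes in annotations_column:
        for box in boxes:
            counts[box["label"]] += 1
    return counts

# Toy data standing in for train_data['annotations']:
sample = [
    [{"label": "Drink can"}, {"label": "Cigarette"}],
    [{"label": "Drink can"}],
]
print(label_distribution(sample))  # Counter({'Drink can': 2, 'Cigarette': 1})
```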
> 3 - Try passing `max_iterations=1` into `create`. Does that model always predict the same label?
So max_iterations=1 was too low: the loss was too high, which led to no predictions at all. I ran it a couple of times with max_iterations=100 instead:
model = tc.object_detector.create(train_data, batch_size=16, max_iterations=100)
And the hypothesis mentioned above does seem to be what's happening: the predicted label is always the one at index 1 of my class list (from model.classes). In this case I always got 'Aluminium blister pack', and model.classes was:
['Aerosol',
'Aluminium blister pack',
'Aluminium foil',
'Battery',
'Broken glass',
'Carded blister pack',
'Cigarette',
'Clear plastic bottle',
'Corrugated carton',
'Crisp packet',
'Disposable food container',
'Disposable plastic cup',
'Drink can',
'Drink carton',
'Egg carton',
'Foam cup',
'Foam food container',
'Food Can',
'Food waste',
'Garbage bag',
'Glass bottle',
'Glass cup',
'Glass jar',
'Magazine paper',
'Meal carton',
'Metal bottle cap',
'Metal lid',
'Normal paper',
'Other carton',
'Other plastic',
'Other plastic bottle',
'Other plastic container',
'Other plastic cup',
'Other plastic wrapper',
'Paper bag',
'Paper cup',
'Paper straw',
'Pizza box',
'Plastic bottle cap',
'Plastic film',
'Plastic glooves',
'Plastic lid',
'Plastic straw',
'Plastic utensils',
'Polypropylene bag',
'Pop tab',
'Rope and strings',
'Scrap metal',
'Shoe',
'Single-use carrier bag',
'Six pack rings',
'Spread tub',
'Squeezable tube',
'Styrofoam piece',
'Tissues',
'Toilet tube',
'Tupperware',
'Unlabeled litter',
'Wrapping paper']
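A quick way to confirm this "always classes[1]" symptom is to tally the labels across all predicted boxes and compare the dominant one against index 1 of the class list; this is a sketch with made-up stand-in data (the real predictions column comes from model.predict()):

```python
from collections import Counter

# Truncated stand-in for model.classes; only the first entries matter here.
classes = ['Aerosol', 'Aluminium blister pack', 'Aluminium foil']

# Stand-in for the per-image prediction lists; each box dict has a 'label'.
sample_predictions = [
    [{"label": "Aluminium blister pack"}],
    [{"label": "Aluminium blister pack"}, {"label": "Aluminium blister pack"}],
]

counts = Counter(
    box["label"] for boxes in sample_predictions for box in boxes
)
dominant_label, _ = counts.most_common(1)[0]

# The reported symptom: every prediction collapses onto classes[1].
print(dominant_label == classes[1])  # True
```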
@TobyRoseman So I ran some tests: I limited myself to 10 images and their labels and trained three models, one on Turi Create 5.8, one on 6.0, and one on 6.1. This behavior seems to happen only on 6.x. However, on 5.8, if the iteration count is too high, training freezes after a while and nothing gets printed to the output, so I need to kill the process and lower max_iterations.
So I just ran a 3-hour training (batch_size=16, max_iterations=22000) with the full dataset, Turi Create v5.8, tensorflow-gpu v2.0, and the same OS as before (Ubuntu 18.04.3 LTS (Bionic Beaver)), and the model works as intended (different classes are correctly predicted). Here are some example predictions:
So this is definitely the workaround I'll be using. This also suggests that tc v6.0, or one of its new dependencies, has a breaking change.
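As a sketch of the pinning logic behind this workaround (the helper name is mine, not Turi Create's): the 5.x line still uses the MXNet backend, while 6.x is where the TensorFlow regression lives.

```python
# Hypothetical guard reflecting the workaround above: treat any release
# before 6.0 (the MXNet-backed line) as safe for multiclass training.
def is_pre_tensorflow_release(version: str) -> bool:
    major = int(version.split(".")[0])
    return major < 6

print(is_pre_tensorflow_release("5.8"))  # True: unaffected by this bug
print(is_pre_tensorflow_release("6.1"))  # False: affected by this bug
```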
@YassineElbouchaibi - thanks for the update. I'm glad 5.8 is working for you.
TuriCreate 6.0 was a major update. We moved the Object Detector (and all of our other deep learning toolkits) from using MXNet to TensorFlow.
I can't reproduce this issue. I used a two-class dataset and trained an object detector model on Ubuntu. Predictions from this model contain both classes.
I strongly suspect this issue only happens with 3 classes or more. I'll test tc v6.1 with a 3-class dataset I found, and also with a 2-class dataset, to see whether my hypothesis holds. If it does, I'll leave a link to an interactive Python notebook for reproduction.
@YassineElbouchaibi - My apologies for not responding sooner. I finally trained an object detection model, on Linux, with a dataset containing more than two classes. I agree there is a serious issue here.
On Linux, I trained an object detection model using a dataset with 6 classes, 24 images, and 103 bounding boxes. After 2,000 iterations, I examined predictions on the train set. The model produces roughly the right number of bounding-box predictions, but the labels for all predictions are always only one of two classes, with most being from just one class. I get similar results with validation data.
On macOS, the same code and same dataset produce reasonable looking results (i.e. class predictions have roughly the right distribution).
We'll investigate this issue further.
With more than two labels, this issue seems to reproduce every time on Linux. It is not an issue on macOS, even with more than three labels.
However, if you take a model trained on Linux and predict on macOS, all predictions are still for only two labels. This seems to be an issue with the training side of our TensorFlow implementation.
Two quick updates here:
1 - At least with 2k iterations, the predicted labels always seem to be one of two classes. Those classes are always the first two entries in the `classes` member list of the trained model.
2 - Setting `tc.config.set_num_gpus(0)` allows us to reproduce the issue on macOS 10.15. So this isn't a Linux-specific issue at all; it's an issue with our TensorFlow implementation.
I've verified that this bug is present in turicreate 6.0 and that it was not present with 5.8 (the version before 6.0). Clearly this bug was introduced when migrating the MXNet implementation to TensorFlow.
Looking at the TensorFlow implementation, and in particular the loss function, I don't yet fully understand it all, but I'm not seeing anything that would cause this issue. I'll start comparing our current loss function to the previous implementation.
Is this issue fixed now, after commit 85a8e44?
> Is this issue fixed now, after commit 85a8e44?
@MaddyThakker - this issue is fixed with that commit. However, we've identified another issue with our TensorFlow implementation of object detection: performance has significantly degraded since our 6.1 release. I'm actively investigating that issue and will do a point release once it is resolved.
Thanks for the update @TobyRoseman. Would building turicreate from source include this fix?
> Thanks for the update @TobyRoseman. Would building turicreate from source include this fix?
That would fix the issue of only making predictions for two classes, but your accuracy would be quite poor.
This issue was fixed by #3160. We just released a point release (TuriCreate 6.2.2) with the fix. Upgrade your version of TuriCreate and you should be good to go.
Multiclass object detection prediction always returns the same class, though a different one after each training. Let me clarify.
So I have a dataset and here is its head :
Here is one example of annotations:
The corresponding image (labeled) is this one :
But after training an Object detection model here is what I get :
In fact, Broken glass is the only label I get: ...
So I verified model.classes, and this is what I got (the right thing):
I also verified model.summary (everything seems legit):
I would normally train for longer, but I limited the training because this problem happens even if I train for around 6-7 hours.
I work on Google Colab, and here is a bit of my code:
Here is the output I get before all the iterations :
In conclusion, everything seems normal, but I only get one class for all my predictions afterwards.
Thanks guys!