apple / turicreate

Turi Create simplifies the development of custom machine learning models.
BSD 3-Clause "New" or "Revised" License

Maximum number of classes for object detection #968

Closed by sejersbol 6 years ago

sejersbol commented 6 years ago

Hi,

I was just wondering, is there a recommended maximum number of classes when doing object detection with Turi Create? (I have around 50.) I understand that it is running YOLO (v2?) on Darknet, but I can't find any general recommendations.

Thanks in advance!

/Anders

sejersbol commented 6 years ago

Hi,

Just a comment. I have successfully trained an object detection model with 56 classes, i.e. there is no problem in training 50-60 classes. Many of my classes are very similar, but that did not become a problem. I used about 38,000 images with bounding boxes (after augmentation), and training took about 7 hours on an EC2 p2.xlarge machine, but it all went well.

gustavla commented 6 years ago

Hi @sejersbol, sorry for the really long delay here!

Unfortunately I do not know where the breaking point is, and of course it will depend on acceptable evaluation metrics and training data size.

From a technical point of view there is no hard limit, although if you go to extremes there could be Core ML model size issues and memory issues during inference. However, that will only happen for an extremely large number of classes.
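To give a rough feel for why the size only becomes an issue at extreme class counts, here is a minimal sketch assuming a YOLOv2-style output layer, where each anchor box predicts 4 coordinates, 1 objectness score, and one score per class. The function names are hypothetical, and the anchor count and input channel count used below are the values from the YOLOv2 paper, not necessarily what Turi Create uses.

```python
def yolo_v2_output_channels(num_classes, num_anchors):
    """Output depth of a YOLOv2-style detection head.

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class.
    """
    return num_anchors * (5 + num_classes)


def head_weight_count(num_classes, num_anchors, in_channels):
    """Parameter count of a final 1x1 convolution feeding that output.

    One weight per (input channel, output channel) pair,
    plus one bias per output channel.
    """
    out_channels = yolo_v2_output_channels(num_classes, num_anchors)
    return in_channels * out_channels + out_channels


# Assumed values for illustration only: 5 anchors and 1024 input channels,
# as in the YOLOv2 paper. Turi Create's actual settings may differ.
for classes in (20, 56, 1000, 9000):
    print(classes, head_weight_count(classes, num_anchors=5, in_channels=1024))
```

Under these assumptions the final layer grows linearly with the number of classes, so going from 20 to 56 classes adds little compared with the rest of the network, while thousands of classes would start to matter.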

From a modeling perspective (where problems will show up much earlier than any technical limitation), it is not as clear. As you increase the number of classes, you increase the risk of making classification mistakes. However, the severity of many of those mistakes should go down at the same time, since you will have more and more classes that are naturally similar (breeds of dogs, etc.). The original YOLO9000 paper (https://arxiv.org/pdf/1612.08242.pdf) trained a model using 9000+ classes with reasonable results (lots of mistakes of course, but still impressive). They trained it on a combination of detection and classification data, so if they had actually had detection data for all 9000 classes, the results would presumably have been even better.

In your experiment it sounds like 50-60 was OK (thanks for giving us a sample point!). Anything below 100 is definitely tried and true, as long as you have the data. However, will 300 do OK? Will 1000 do OK? Theoretically I would say yes, if you are able to provide enough training data and you adjust your expectation of what a good evaluation metric is, since you know you'll make more mistakes. For instance, for classification with 1000 classes, it is common to report top-5 accuracy (that is, the correct label is in your top-5 classes for a sample).
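As an illustration of the top-5 idea, here is a minimal sketch of a top-k accuracy computation in plain Python. The function name, the input layout (one dict of class scores per sample), and the toy data are assumptions made for this example; this is a classification metric and is not taken from Turi Create's evaluation code.

```python
def top_k_accuracy(scores, true_labels, k=5):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: list of dicts mapping class label -> score, one dict per sample
    true_labels: list of ground-truth labels, same length as scores
    """
    if len(scores) != len(true_labels):
        raise ValueError("scores and true_labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")

    hits = 0
    for sample_scores, truth in zip(scores, true_labels):
        # Rank class labels from highest to lowest score
        ranked = sorted(sample_scores, key=sample_scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy data, made up for illustration
scores = [
    {"beagle": 0.5, "pug": 0.3, "cat": 0.2},
    {"beagle": 0.1, "pug": 0.6, "cat": 0.3},
]
labels = ["pug", "cat"]

print(top_k_accuracy(scores, labels, k=1))
print(top_k_accuracy(scores, labels, k=2))
```

In this toy data the correct label is never the top prediction but is always the runner-up, which is the kind of "near miss" that a top-k metric gives credit for.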

sejersbol commented 6 years ago

Thanks @gustavla for the really informative answer :-)

I did not expect that there was a simple answer to my question. You have presented some very nice guidelines for what to expect, and I will be looking into the YOLO9000 paper. However, the first rule of thumb must be: anything under 100 classes will work if you have enough data. In my own experiment I have found that 56 classes with approximately 1000 bounding boxes per class works out fine. Maybe fewer bounding boxes will work, but around 1000 per class definitely works (I used image augmentation to get from 250 to 1000, mirroring the images and bounding boxes). Thanks!
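For anyone wanting to try the mirroring trick, here is a minimal sketch of the bounding-box half of it. It assumes annotations are dicts with a `label` and center-based `coordinates` (`x`, `y`, `width`, `height`) in pixels, which is my understanding of the format Turi Create expects; the function name is hypothetical, and the image pixels themselves must be flipped separately with an image library.

```python
def mirror_annotations(annotations, image_width):
    """Return a horizontally mirrored copy of a list of bounding-box annotations.

    Assumes each annotation looks like:
        {"label": "dog",
         "coordinates": {"x": cx, "y": cy, "width": w, "height": h}}
    where (cx, cy) is the box center in pixels.
    """
    mirrored = []
    for ann in annotations:
        coords = ann["coordinates"]
        mirrored.append({
            "label": ann["label"],
            "coordinates": {
                # Only the horizontal center moves; size and vertical position stay the same
                "x": image_width - coords["x"],
                "y": coords["y"],
                "width": coords["width"],
                "height": coords["height"],
            },
        })
    return mirrored


# Made-up example: one box in a 640-pixel-wide image
original = [
    {"label": "dog", "coordinates": {"x": 100, "y": 50, "width": 40, "height": 30}},
]
flipped = mirror_annotations(original, image_width=640)
print(flipped[0]["coordinates"]["x"])
```

Mirroring the result a second time should give back the original annotations, which is a handy sanity check before training on the augmented data.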