What is the effect of an increasing number of classes?

AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )

http://pjreddie.com/darknet/

What is the effect of an increasing number of classes? #5087

Open d3-worgan opened 4 years ago

d3-worgan commented 4 years ago

Hi,

Can anybody tell me if there is an upper limit to the number of classes that an object detector can learn? What are the effects on accuracy and efficiency as the number of classes increases?

For example, I have trained an object detector with 2 classes which works well. A model that can detect 80 classes, like yolov3, also performs well. But the Open Images pre-trained model from pjreddie hardly detects even a human. Another Open Images model I found seems to perform better, but it still hardly detects anything unless it is close to the camera.

Is this a problem with the dataset? Am I using the models wrong? Or does it demonstrate an issue with detecting a large number of classes? I am struggling to find information on this, so I would appreciate it if anyone can provide an explanation.

Thanks

AlexeyAB commented 4 years ago

OpenImages is a badly labeled dataset.
The more classes you use, the lower the accuracy will be (given the same dataset, the same model and the same number of training iterations).
The larger the model's weights, the more classes it can remember.
The deeper the network, the higher the initial network resolution you can use, and the higher the accuracy you will get.
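
One way to see how the class count enters the model: in Darknet's YOLO cfg files, the convolutional layer before each [yolo] layer is sized as filters = (classes + 5) * 3. The sketch below only evaluates that formula. The function name is hypothetical, and the 601-class value is an illustrative stand-in for a large dataset, not a figure taken from this thread.

```python
def yolo_head_filters(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Filters in the conv layer feeding a [yolo] layer.

    Each anchor predicts 4 box coordinates, 1 objectness score and
    one score per class, hence (classes + 5) values per anchor.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return (num_classes + 5) * anchors_per_scale


# Illustrative class counts: a small custom model, an 80-class model,
# and a hypothetical several-hundred-class model.
for n in (2, 80, 601):
    print(f"classes={n:4d} -> filters={yolo_head_filters(n)}")
```

Only this prediction layer grows with the class count; the backbone stays the same size, which is consistent with the point above that a fixed model has to spread the same capacity over more classes.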

d3-worgan commented 4 years ago

Thanks, that seems to make sense. But can I ask these as well?

Does the accuracy decrease because separating and classifying the features becomes harder, i.e. there is more confusion between similar classes?
What do you mean by resolution? Do you mean that a deeper network with more nodes would be needed to handle the additional classes?

Thanks

AlexeyAB commented 4 years ago

Yes, the network has to remember more features.
A deeper network (more layers) -> a larger receptive field for each final activation -> a higher network resolution you can use to increase accuracy -> more detail is visible to distinguish between classes. Network resolution is the input width and height in the cfg file: https://github.com/AlexeyAB/darknet/blob/5c2ddd301e51d60a700797d58bb2e8e686b4bd47/cfg/csresnext50-panet-spp-original-optimal.cfg#L8-L9
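
As a rough illustration of what a higher network resolution buys, the sketch below computes the size of the detection grids for a given input width and height. It assumes three detection scales with strides 8, 16 and 32, as in typical YOLOv3-style configs, and an input size that is a multiple of 32; the function name is hypothetical and the exact strides depend on the cfg in use.

```python
def grid_sizes(width: int, height: int, strides=(8, 16, 32)):
    """Return the (columns, rows) of each detection grid.

    Assumes the input size is a multiple of 32 and that each
    detection scale downsamples the input by the given stride.
    """
    if width % 32 or height % 32:
        raise ValueError("width and height should be multiples of 32")
    return [(width // s, height // s) for s in strides]


# Compare two common input sizes.
for size in (416, 608):
    print(size, grid_sizes(size, size))
```

Under these assumptions, going from 416x416 to 608x608 raises the finest grid from 52x52 to 76x76 cells, so small or distant objects are covered by more cells, at the cost of more computation per image.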