ModelTC / United-Perception


Is a sigmoid classifier suitable for multi-class problems (num of categories > 1000, as in LVIS)? #26

Closed Icecream-blue-sky closed 2 years ago

Icecream-blue-sky commented 2 years ago

Since your method is based on a sigmoid classifier, I am curious whether your detection results on LVIS contain many false positives at the same location but with different categories. The reason I ask is that I used to train a one-stage detector on a dataset similar to LVIS (a long-tailed logo dataset with 352 categories), and I got many FPs with different categories at the same location. I'm wondering if you have encountered the same situation. Thanks! I think it may be due to the use of a sigmoid classifier, which consists of multiple independent binary classifiers and may not be suitable for multi-class problems (num of categories > 1000). Of course this is just my conjecture; any advice is welcome...
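For illustration, here is a minimal sketch (PyTorch, not from this repo; the logit values are made up) of the difference I mean: independent sigmoid outputs can all be high for the same box, while softmax forces the classes to compete:

```python
import torch

# Toy classification logits for a single predicted box over 5 classes
# (values are invented purely for illustration).
logits = torch.tensor([3.0, 2.8, 2.5, -1.0, -2.0])

# Sigmoid treats each class as an independent binary classifier,
# so several classes can score high for the same box at once.
print(torch.sigmoid(logits))         # ~[0.95, 0.94, 0.92, 0.27, 0.12]

# Softmax normalizes across classes, so the scores compete.
print(torch.softmax(logits, dim=0))  # ~[0.41, 0.33, 0.25, 0.01, 0.00]
```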

waveboo commented 2 years ago

Hi @Icecream-blue-sky , thanks for your good question~ Actually, the false-positive problem is not specific to the sigmoid loss. Since the NMS in modern detection frameworks is class-related, objects of different categories at the same location will not suppress each other. The problem also appears with the softmax loss, even though its design enforces strong competition between categories. What's more, there are indeed many categories that are semantically similar, like mirrors and rear mirrors, or dresses and costumes... I presume your task has a similar problem, because different logos may not look very different. To tackle this problem, on the one hand you could apply class-agnostic NMS to your detection results during the inference phase to filter out some low-confidence boxes. On the other hand, adding a larger penalty term for false positives during training is also a good option.
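For reference, here is a minimal sketch of class-agnostic vs. class-aware NMS using torchvision ops (the boxes, scores, labels, and threshold are illustrative, not from this repo):

```python
import torch
from torchvision.ops import nms, batched_nms

# Two heavily overlapping toy boxes at the same location,
# assigned to different classes (all values are illustrative).
boxes = torch.tensor([[10., 10., 50., 50.],
                      [12., 11., 51., 52.]])
scores = torch.tensor([0.9, 0.6])
labels = torch.tensor([1, 7])

# Class-aware NMS (the usual default): boxes of different classes
# never suppress each other, so both detections survive.
print(batched_nms(boxes, scores, labels, iou_threshold=0.5))  # tensor([0, 1])

# Class-agnostic NMS: labels are ignored, so the lower-scoring
# duplicate at the same location is suppressed.
print(nms(boxes, scores, iou_threshold=0.5))  # tensor([0])
```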

Icecream-blue-sky commented 2 years ago

Thanks for your kind advice! Now I think part of the reason is the imbalance of the training data (long-tailed) across categories, which makes it difficult for the classifiers of rare classes to classify accurately. Maybe adding the EFL loss would help; what do you think?

waveboo commented 2 years ago

In the long-tailed case, the model is biased towards the frequent categories, which may cause the FP problem. EFL pushes the model to focus more on the rare categories and hard instances, so the correct (rare) category of an object is more likely to be predicted with high confidence. So feel free to have a try with EFL. However, objects of different categories appearing at the same location may not change much, because the detection framework does not have an argmax mechanism like classification does. So after using EFL, class-agnostic NMS or raising the confidence threshold of the displayed boxes is still needed.
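To make the idea concrete, here is a heavily simplified sketch of an EFL-style loss (sigmoid focal loss with a per-category focusing factor), based on my reading of the Equalized Focal Loss paper. In the real EFL the per-category term is derived from accumulated gradient statistics during training; here it is reduced to a fixed tensor, so this is only an illustration, not this repo's implementation:

```python
import torch
import torch.nn.functional as F

def efl_like_loss(logits, targets, gamma_b=2.0, gamma_v=None):
    """Simplified EFL-style sketch (hypothetical helper).

    logits, targets: (N, C) tensors; targets are 0/1 floats per class.
    gamma_b: base focusing factor, as in focal loss.
    gamma_v: (C,) per-category extra focusing factor; in real EFL this
        comes from gradient statistics, here it is just a fixed tensor.
    """
    num_classes = logits.shape[1]
    if gamma_v is None:
        gamma_v = torch.zeros(num_classes)   # degenerates to focal loss
    gamma_j = gamma_b + gamma_v              # (C,) category-wise gamma

    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # prob of the true side
    ce = F.binary_cross_entropy_with_logits(logits, targets,
                                            reduction="none")
    # Rare categories get a larger gamma_j; the gamma_j / gamma_b factor
    # re-weights them so their loss is not overly down-scaled.
    loss = (gamma_j / gamma_b) * (1 - p_t) ** gamma_j * ce
    return loss.sum() / targets.sum().clamp(min=1.0)
```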

Icecream-blue-sky commented 2 years ago

> In the long-tailed case, the model is biased towards the frequent categories, which may cause the FP problem. EFL pushes the model to focus more on the rare categories and hard instances, so the correct (rare) category of an object is more likely to be predicted with high confidence. So feel free to have a try with EFL. However, objects of different categories appearing at the same location may not change much, because the detection framework does not have an argmax mechanism like classification does. So after using EFL, class-agnostic NMS or raising the confidence threshold of the displayed boxes is still needed.


If the model can classify the rare classes better, then the max-score class of each predicted bbox will be the ground-truth class. At that point, class-agnostic NMS or raising the confidence threshold of the displayed boxes will work better. Thanks!