facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Multi-label classification #350

Open Fpooyan opened 3 years ago

Fpooyan commented 3 years ago

Hi

I am currently working on a multi-label classification task, and I was wondering if I could test it with the idea of transformers, too.

@alcinos In an older issue, you mentioned that this implementation can be used for multi-label classification tasks, so I would appreciate it if you could give me a hint on how to adapt this repo for my own task.

Thanks in advance,

alcinos commented 3 years ago

Hi @Fpooyan

In my opinion, the most straightforward way to test what you have in mind with this codebase would be:

  1. Convert the multi-label classification dataset into a COCO-style dataset. You would need one annotation for each class present in the image. For the box, just provide a dummy one, but make sure it is not empty and fits within the image bounds.
  2. Disable the losses related to boxes (GIoU and L1), both in the loss and in the matcher (you can simply set their coefficients to 0).
  3. You'll most likely have to write your own evaluation procedure, since detection AP will not make any sense in your case.
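A minimal sketch of step 1 (names like `to_coco_style` and `labels_per_image` are illustrative, not part of the DETR code base): one annotation per class present in the image, each with a dummy non-empty box clipped to the image bounds.

```python
def to_coco_style(labels_per_image, image_sizes, class_names):
    """Build a COCO-style dict from image-level multi-label annotations.

    labels_per_image: {image_id: [class_id, ...]} -- classes present in each image
    image_sizes:      {image_id: (width, height)}
    class_names:      list of class names, indexed by class_id
    """
    images, annotations = [], []
    ann_id = 0
    for img_id, (w, h) in image_sizes.items():
        images.append({"id": img_id, "width": w, "height": h})
        for cls in labels_per_image[img_id]:
            # Dummy box in COCO [x, y, w, h] format: non-empty and
            # guaranteed to fit inside the image, as described above.
            bw, bh = min(32, w), min(32, h)
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cls,
                "bbox": [0, 0, bw, bh],
                "area": bw * bh,
                "iscrowd": 0,
            })
            ann_id += 1
    categories = [{"id": i, "name": n} for i, n in enumerate(class_names)]
    return {"images": images, "annotations": annotations, "categories": categories}
```

For step 2, if I remember correctly the coefficients are already exposed as command-line flags in `main.py` (`--bbox_loss_coef`, `--giou_loss_coef` for the loss, and `--set_cost_bbox`, `--set_cost_giou` for the matcher), so setting them all to 0 should not require code changes.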

That being said, I have genuinely no idea how well this scheme will perform :p

Alternatively, I would recommend looking at this paper https://arxiv.org/abs/1904.05709, which does multi-label classification, including with transformers. It is interesting to note that in general they found transformers to perform worse than the alternatives (LSTM, ...). Their prediction method is different from ours, since they predict the set auto-regressively (vs in parallel in DETR). I'd argue that this is not optimal for transformer-based architectures, and may explain the relatively poor performance in their experiments.