facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Multi-label classification #350

Open Fpooyan opened 3 years ago

Fpooyan commented 3 years ago

Hi

I am currently working on a multi-label classification task, and I was wondering if I could test it with the idea of transformers, too.

@alcinos In an older issue, you mentioned that this implementation can be used for multi-label classification tasks, so I would appreciate it if you could give me a hint on how to adapt this repo for my own task.

Thanks in advance,

alcinos commented 3 years ago

Hi @Fpooyan

In my opinion, the most straightforward way to test what you have in mind with this codebase would be:

  1. Convert the multi-label classification dataset into a COCO-style dataset. You would need one annotation for each class present in the image. For the box, just provide a dummy one, but make sure it is not empty and fits within the image bounds.
  2. Disable the losses related to boxes (GIoU and L1), both in the loss and in the matcher (you can simply set their coefficients to 0).
  3. You'll most likely have to write your own evaluation procedure, since detection AP will not make any sense in your case.
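A minimal sketch of step 1 (names like `to_coco_style` and `labels_per_image` are illustrative, not part of the DETR code base): one annotation per class present in the image, each with a dummy non-empty box clipped to the image bounds.

```python
def to_coco_style(labels_per_image, image_sizes, class_names):
    """Build a COCO-style dict from image-level multi-label annotations.

    labels_per_image: {image_id: [class_id, ...]} -- classes present in each image
    image_sizes:      {image_id: (width, height)}
    class_names:      list of class names, indexed by class_id
    """
    images, annotations = [], []
    ann_id = 0
    for img_id, (w, h) in image_sizes.items():
        images.append({"id": img_id, "width": w, "height": h})
        for cls in labels_per_image[img_id]:
            # Dummy box in COCO [x, y, w, h] format: non-empty and
            # guaranteed to fit inside the image, as described above.
            bw, bh = min(32, w), min(32, h)
            annotations.append({
                "id": ann_id,
                "image_id": img_id,
                "category_id": cls,
                "bbox": [0, 0, bw, bh],
                "area": bw * bh,
                "iscrowd": 0,
            })
            ann_id += 1
    categories = [{"id": i, "name": n} for i, n in enumerate(class_names)]
    return {"images": images, "annotations": annotations, "categories": categories}
```

For step 2, if I remember correctly the coefficients are already exposed as command-line flags in `main.py` (`--bbox_loss_coef`, `--giou_loss_coef` for the loss, and `--set_cost_bbox`, `--set_cost_giou` for the matcher), so setting them all to 0 should not require code changes.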

That being said, I have genuinely no idea how well this scheme will perform :p

Alternatively, I would recommend looking at this paper https://arxiv.org/abs/1904.05709, which does multi-label classification, including with transformers. It is interesting to note that in general they found transformers to perform worse than the alternatives (LSTM, ...). Their prediction method is different from ours, since they predict the set auto-regressively (vs in parallel in DETR). I'd argue that this is not optimal for transformer-based architectures, and may explain the relatively poor performance in their experiments.