facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

How does the model expect no object class? #411

Open amirhesamyazdi opened 3 years ago

amirhesamyazdi commented 3 years ago

I want to use DETR to detect airplanes in the sky, so each image contains either airplanes or nothing: a binary object detection task with no other classes. I know how to do this with other models, but I am confused about these DETR-specific concerns:

1) When preparing the training and test sets, do I need to provide images that contain no airplanes as negative (no-object) samples? Or does DETR automatically treat the parts of an image outside the bounding boxes as negative examples during training? In other words, do I need to explicitly include images with no bounding boxes (negative samples, no airplane) in the training and test sets?

Basically, my question is how the no-object class is handled in data preparation and training, especially in binary object detection. In typical deep learning classification you would provide roughly balanced training and test sets with similar numbers of class 0 and class 1 samples, but I don't know how this is handled here.

2) In my entire dataset, the maximum number of planes in any image is 2, so I expect at most 2 airplanes in any real-world test image, and I do not want my model to detect any other objects (just airplanes). In that case, how should I set the number of queries? Should I set it to 2? Does setting it very low (based on the maximum number of objects expected) affect the model's ability to detect airplanes at different locations in test images, for example at the extreme ends or the periphery? These are my most important questions.

ghy0324 commented 3 years ago

Hi, I ran into a similar problem. Have you solved it?

amirhesamyazdi commented 3 years ago

I haven't really solved my questions (especially the second one), but for the first: the bipartite matching takes care of learning the no-object class, so we don't need to provide any no-object images. Queries that are not matched to a ground-truth box are trained to predict no-object. You just have to set the number of classes correctly in the build() function in models/detr.py. Note that num_classes there must be one more than the largest category id in your dataset (max_obj_id + 1); the model then adds one extra output for no-object, which is the last class index, not category 0. Category ids do not have to be contiguous. For example, COCO has only 80 object categories, but its ids run from 1 to 90 with some ids skipped, so num_classes is set to 91.
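To make the matching step concrete, here is a minimal sketch of how per-query class targets could be built for one image. It assumes a single airplane category with id 1 (so num_classes = 2 and the no-object label is the last index, 2), and it uses a brute-force search as a stand-in for the Hungarian algorithm. The function names and the cost values are hypothetical, not taken from the repository.

```python
from itertools import permutations


def match_queries(cost):
    """Brute-force min-cost one-to-one matching (stand-in for the Hungarian algorithm).

    cost[q][t] is the cost of assigning query q to ground-truth object t.
    Returns a list of (query_index, target_index) pairs.
    """
    num_queries = len(cost)
    num_targets = len(cost[0]) if num_queries else 0
    if num_targets == 0:
        return []  # image without objects: nothing to match
    if num_targets > num_queries:
        raise ValueError("need at least as many queries as ground-truth objects")
    best_pairs, best_total = [], float("inf")
    for queries in permutations(range(num_queries), num_targets):
        total = sum(cost[q][t] for t, q in enumerate(queries))
        if total < best_total:
            best_total = total
            best_pairs = [(q, t) for t, q in enumerate(queries)]
    return best_pairs


def build_class_targets(cost, target_labels, num_classes):
    """Every query starts as no-object; matched queries get the object's label."""
    no_object = num_classes  # assumption: no-object is the last class index
    targets = [no_object] * len(cost)
    for q, t in match_queries(cost):
        targets[q] = target_labels[t]
    return targets


# Hypothetical costs for 4 queries and 2 airplanes (category id 1).
cost = [[0.9, 0.8], [0.1, 0.7], [0.6, 0.2], [0.5, 0.5]]
targets = build_class_targets(cost, target_labels=[1, 1], num_classes=2)

# An image with no airplanes: every query is supervised as no-object.
empty_targets = build_class_targets([[] for _ in range(4)], [], num_classes=2)
```

In this sketch the two unmatched queries act as the negative examples, which is why images without boxes are not strictly required for the model to see no-object targets.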

Regarding my second question, I was hoping for an insightful response from one of the contributors, but none came, so after trying the model, getting OK results, and reading the paper once more, I came to the conclusion above. For the number of queries, I set it very low, only 3 or 4, since I have at most 2 objects per image. I noticed that increasing it can improve accuracy to some extent, but that might be overfitting, and this is something I would also like to hear about from the contributors.
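Having more queries than objects is harmless at inference time as long as the surplus queries predict no-object and are filtered out. A minimal sketch of that filtering, assuming per-query logits laid out as [unused id 0, airplane, no-object] and a confidence threshold of 0.7 (both the logits and the threshold are illustrative assumptions):

```python
import math


def softmax(row):
    """Numerically stable softmax over one query's class logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]


def keep_confident(pred_logits, threshold=0.7):
    """Return (query_index, class_index, score) for confident detections.

    The last logit of each query is assumed to be no-object and is dropped
    before taking the best remaining class.
    """
    kept = []
    for q, row in enumerate(pred_logits):
        probs = softmax(row)[:-1]  # drop the no-object probability
        score = max(probs)
        if score > threshold:
            kept.append((q, probs.index(score), score))
    return kept


# Hypothetical logits for 4 queries.
pred_logits = [
    [-2.0, 4.0, 0.0],  # confident airplane
    [-2.0, 0.0, 4.0],  # confident no-object
    [-2.0, 3.5, 0.5],  # confident airplane
    [-2.0, 1.0, 1.2],  # uncertain, falls below the threshold
]
detections = keep_confident(pred_logits)
```

With 4 queries and 2 airplanes, two queries survive the filter and the other two are discarded.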

GeJintian commented 3 years ago

I think you could refer to https://huggingface.co/transformers/model_doc/detr.html. It says:

Note that it’s good to have some slack (in COCO, the authors used 100, while the maximum number of objects in a COCO image is ~70).

Meanwhile, I am confused about how the authors train the no-object class. Do they just ignore the parts of the image that correspond to no object? Could you please explain this more clearly?
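As far as I can tell from the loss code in the repository, unmatched queries are not ignored: they contribute to the classification loss with no-object as their target, but with a smaller class weight so that the many no-object queries do not dominate. A minimal pure-Python sketch of such a weighted cross-entropy, assuming the no-object class is the last index and a weight of 0.1 (which I believe matches the repository's default `eos_coef`); the function name and the logits are hypothetical:

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Class-weighted cross-entropy, normalised by the sum of the weights used.

    logits: one row of class scores per query.
    targets: one class index per query (no-object for unmatched queries).
    """
    total, weight_sum = 0.0, 0.0
    for row, y in zip(logits, targets):
        m = max(row)
        log_norm = m + math.log(sum(math.exp(v - m) for v in row))
        nll = log_norm - row[y]  # negative log-likelihood of the target class
        total += class_weights[y] * nll
        weight_sum += class_weights[y]
    return total / weight_sum


num_classes = 2                                  # max category id (1) + 1
class_weights = [1.0] * num_classes + [0.1]      # last entry: no-object weight
targets = [2, 1, 1, 2]                           # two airplanes, two no-object

uniform_logits = [[0.0, 0.0, 0.0] for _ in targets]
loss = weighted_cross_entropy(uniform_logits, targets, class_weights)

# Share of the total loss weight carried by no-object queries when
# 100 queries face 2 objects: 98 * 0.1 versus 2 * 1.0.
no_object_share = (98 * 0.1) / (98 * 0.1 + 2 * 1.0)
```

So the background is learned through the unmatched queries rather than through image regions outside the boxes, and the down-weighting handles the imbalance between object and no-object targets.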