ashkamath / mdetr

Apache License 2.0

How to generate "tokens_negative" and "tokens_positive" when we convert our own dataset into mdetr annotations? #89

Closed QiuHeqian closed 1 year ago

QiuHeqian commented 1 year ago

Hi, thanks for the open-source code and annotations.

I am confused about how to generate "tokens_negative" and "tokens_positive" in the annotation. For example,

in 'images': {'file_name': 'COCO_train2014_000000580957.jpg', 'height': 428, 'width': 640, 'id': 120624, 'original_id': 580957, 'caption': 'bowl behind the others can only see part', 'dataset_name': 'refcoco', 'tokens_negative': [[0, 4], [5, 11], [23, 26], [27, 31], [32, 35], [36, 40]]}

I couldn't understand the meaning of "[[0, 4], [5, 11], [23, 26], [27, 31], [32, 35], [36, 40]]".

in 'annotations': {'area': 17770.195949999998, 'iscrowd': 0, 'image_id': 120624, 'category_id': 51, 'id': 120624, 'bbox': [468.3, 0.91, 171.7, 116.12], 'original_id': 1537681, 'tokens_positive': [[36, 40]]}

I couldn't understand the meaning of "[[36, 40]]".

I will be very grateful if you could help me to understand!

ashkamath commented 1 year ago

Hey! You don't need tokens_negative; it was an ablation we ran during prototyping. tokens_positive holds the character spans in the full caption that are aligned to a particular object. Each span is [start, end], with 0-based character offsets and an exclusive end, so caption[start:end] gives the aligned text. For example, if the sentence is "the cat on the table" and you have one box for the cat and one for the table, the annotations will look as follows:

for box1: tokens_positive: [[4, 7]] and for box2: [[15, 20]]

Note that a box can also have two or more spans aligned to it, for example in the sentence "the cat on the table is scratching the table". In this case, box2 would have tokens_positive as follows: [[15, 20], [39, 44]].
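If you are converting your own dataset, the spans can be computed directly from the caption string. Below is a minimal sketch, not code from the MDETR repository: `find_spans` is a hypothetical helper, and it assumes 0-based offsets with an exclusive end, which matches the refcoco annotation quoted in the question (where [36, 40] covers "part" in a 40-character caption).

```python
import re


def find_spans(caption, phrase):
    """Return [start, end) character spans for every whole-word match of phrase.

    Hypothetical helper for illustration; offsets are 0-based, end exclusive.
    """
    # \b keeps "cat" from matching inside a longer word such as "concatenate".
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return [[m.start(), m.end()] for m in re.finditer(pattern, caption)]


caption = "the cat on the table is scratching the table"
print(find_spans(caption, "cat"))    # [[4, 7]]
print(find_spans(caption, "table"))  # [[15, 20], [39, 44]]
```

Matching every occurrence is a simplification. If the same word refers to two different objects in one caption, you would need to assign the spans to boxes by hand or with your own rules.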

In referring expression datasets, we do not have dense grounding for all the noun phrases in the sentence (the task is to provide one box for the whole sentence). To make this data more similar to the rest of the grounding data in our dataset, we align the box with the root word of the sentence (extracted using spaCy). You can choose other ways to do this.
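Whichever word you choose to align with the box, it can help to slice the caption back out of the spans to confirm they point at the text you intended. The sketch below does that for the refcoco annotation from the question. `spans_to_text` is a hypothetical helper, not part of MDETR, and it assumes 0-based offsets with an exclusive end.

```python
def spans_to_text(caption, spans):
    """Return the substring covered by each [start, end) span.

    Hypothetical helper for illustration. Raises ValueError if a span is
    empty or falls outside the caption.
    """
    texts = []
    for start, end in spans:
        if not (0 <= start < end <= len(caption)):
            raise ValueError(f"span [{start}, {end}] is outside the caption")
        texts.append(caption[start:end])
    return texts


caption = "bowl behind the others can only see part"
tokens_positive = [[36, 40]]
print(spans_to_text(caption, tokens_positive))  # ['part']
```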

Hope this helps! :)