cv516Buaa / OV-VG


Error in proposed dataset! #2

Open zetaodu opened 11 months ago

zetaodu commented 11 months ago

ov_phrase_loc_val_base.json and ov_phrase_loc_val_novel.json have incorrect "tokens_positive_eval" entries; most of them are [0:5] and [14:25]. (screenshot attached)
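For reference, a minimal check looks like the sketch below, assuming an MDETR/GLIP-style COCO layout in which each entry under "images" carries a "caption" string and "tokens_positive_eval" as lists of [start, end] character spans; the field names are assumptions and may need adjusting to the actual OV-PL schema:

```python
import json
from collections import Counter

# Sanity-check sketch: verify that each tokens_positive_eval span actually
# selects a phrase from its caption. Field names are assumptions.
with open("ov_phrase_loc_val_base.json") as f:
    data = json.load(f)

span_counts = Counter()
for img in data["images"]:
    caption = img.get("caption", "")
    for spans in img.get("tokens_positive_eval", []):
        for start, end in spans:
            span_counts[(start, end)] += 1
            # A span that runs past the caption or selects only whitespace
            # cannot point at a real noun phrase.
            if end > len(caption) or not caption[start:end].strip():
                print(f"bad span [{start}:{end}] in: {caption!r}")

# If most spans collapse onto a couple of values (e.g. (0, 5) and (14, 25)),
# they were likely copied rather than derived from each phrase.
print(span_counts.most_common(5))
```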

zetaodu commented 11 months ago

The Annotations phrase_id values also mismatch the Sentences phrase_id values!
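A quick cross-check sketch for this, assuming the release follows the Flickr30k Entities layout (Sentences/*.txt with [/EN#id/type phrase] markup and Annotations/*.xml whose object name tags hold the same ids); the file names below are hypothetical:

```python
import re
import xml.etree.ElementTree as ET

def sentence_phrase_ids(txt_path):
    # Phrase ids appear as EN#<digits> inside the bracketed phrase markup.
    with open(txt_path) as f:
        return set(re.findall(r"EN#(\d+)", f.read()))

def annotation_phrase_ids(xml_path):
    # Each <object> lists its phrase id(s) in <name> tags.
    root = ET.parse(xml_path).getroot()
    return {name.text for obj in root.iter("object") for name in obj.iter("name")}

sent_ids = sentence_phrase_ids("Sentences/36979.txt")   # hypothetical path
anno_ids = annotation_phrase_ids("Annotations/36979.xml")  # hypothetical path
print("in Sentences but not Annotations:", sent_ids - anno_ids)
print("in Annotations but not Sentences:", anno_ids - sent_ids)
```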

cv516Buaa commented 11 months ago

> The Annotations phrase_id values also mismatch the Sentences phrase_id values!

Thank you very much for paying attention to our algorithm. The question is valuable. Our OV-PL dataset is different from Flickr30k Entities: this part is a setting from when we built the dataset, and it does not affect the final results. By the way, have you tried running the provided OV-PL dataset under GLIP or other frameworks according to the settings in the article? As far as we know, the problem you raise does not affect the testing results of the dataset under the PL framework.

cv516Buaa commented 11 months ago

> I have tried it on GLIP, and the code reported a bug

Can you provide a screenshot of the bug?

zetaodu commented 11 months ago

> I have tried it on GLIP, and the code reported a bug

> Can you provide a screenshot of the bug?

I appreciate your instant reply. I borrowed the eval code from flickr_eval. (screenshot attached)

cv516Buaa commented 11 months ago

> I have tried it on GLIP, and the code reported a bug

> Can you provide a screenshot of the bug?

> I appreciate your instant reply. I borrowed the eval code from flickr_eval. (screenshot attached)

You should modify this part of the code. In the Flickr30k dataset, each image has five sentences, which are used simultaneously to compute a single unified result. In our OV-PL dataset, however, each image has only two sentences: one describes the base classes and the other describes the base+novel classes, and the two are not evaluated at the same time. The base-class results use only the base sentence; the novel-class results use only the base+novel sentence.
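Schematically, the separate evaluation looks like the sketch below; recall_at_k and the sample fields ("split", "pred_boxes", "scores", "gt_boxes") are illustrative placeholders, not the repo's or GLIP's actual API:

```python
import numpy as np

def iou(a, b):
    # Boxes as [x1, y1, x2, y2].
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / (union + 1e-9)

def recall_at_k(pred_boxes, scores, gt_boxes, k, thresh=0.5):
    # Hit if any top-k prediction overlaps any GT box at IoU >= thresh.
    top = np.argsort(scores)[::-1][:k]
    return float(any(iou(pred_boxes[i], g) >= thresh
                     for i in top for g in gt_boxes))

def evaluate(samples, k_list=(1, 5, 10)):
    # Base and base+novel sentences are scored independently, never pooled.
    hits = {"base": {k: [] for k in k_list},
            "base_novel": {k: [] for k in k_list}}
    for s in samples:
        for k in k_list:
            hits[s["split"]][k].append(
                recall_at_k(s["pred_boxes"], s["scores"], s["gt_boxes"], k))
    return {split: {k: float(np.mean(v)) if v else 0.0 for k, v in ks.items()}
            for split, ks in hits.items()}
```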

zetaodu commented 11 months ago

> I have tried it on GLIP, and the code reported a bug

> Can you provide a screenshot of the bug?

> I appreciate your instant reply. I borrowed the eval code from flickr_eval. (screenshot attached)

> You should modify this part of the code. In the Flickr30k dataset, each image has five sentences, which are used simultaneously to compute a single unified result. In our OV-PL dataset, however, each image has only two sentences: one describes the base classes and the other describes the base+novel classes, and the two are not evaluated at the same time. The base-class results use only the base sentence; the novel-class results use only the base+novel sentence.

The number of sentences is not the point; the main problem is that some noun phrases marked by tokens_positive_eval do not exist in the corresponding annotations. Sorry to bother you, but could you share the verification code for GLIP? I would really appreciate it, because I get R@1=0.77, R@5=0.89, R@10=0.90 after filtering OV-PL.
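The filtering amounts to something like the following sketch (same assumed MDETR/GLIP-style field names as in the earlier sketch; this is illustrative, not the exact code used):

```python
import json

# Drop every image entry whose tokens_positive_eval spans do not select a
# real phrase from the caption. Field names are assumptions.
with open("ov_phrase_loc_val_base.json") as f:
    data = json.load(f)

def spans_ok(img):
    caption = img.get("caption", "")
    return all(
        end <= len(caption) and caption[start:end].strip()
        for spans in img.get("tokens_positive_eval", [])
        for start, end in spans)

kept = [img for img in data["images"] if spans_ok(img)]
print(f"kept {len(kept)} / {len(data['images'])} entries")

# Note: the matching entries under "annotations" would need the same
# filtering to keep the file self-consistent.
data["images"] = kept
with open("ov_phrase_loc_val_base.filtered.json", "w") as f:
    json.dump(data, f)
```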

zetaodu commented 11 months ago

(screenshots attached) I looked at the dataset very closely. In the first sample I can find only one correct bounding box, "a television" (1); "A cabinet" (0) gets the wrong box (it should be "a woman"); the noun phrase "room" has no box; and the boxes labeled (3) have no corresponding phrase. Can you resolve my doubts? Thank you very much.

JierunChen commented 10 months ago

> The Annotations phrase_id values also mismatch the Sentences phrase_id values!

> Thank you very much for paying attention to our algorithm. The question is valuable. Our OV-PL dataset is different from Flickr30k Entities: this part is a setting from when we built the dataset, and it does not affect the final results. By the way, have you tried running the provided OV-PL dataset under GLIP or other frameworks according to the settings in the article? As far as we know, the problem you raise does not affect the testing results of the dataset under the PL framework.

Dear authors, the data in ov_phrase_loc_val_base.json and ov_phrase_loc_val_novel.json seems to be wrong. Could you please provide the correct version, or a detailed README.md explaining how to correctly load the annotations? Thanks.

JierunChen commented 10 months ago

> (screenshots attached) I looked at the dataset very closely. In the first sample I can find only one correct bounding box, "a television" (1); "A cabinet" (0) gets the wrong box (it should be "a woman"); the noun phrase "room" has no box; and the boxes labeled (3) have no corresponding phrase. Can you resolve my doubts? Thank you very much.

Hi, did you resolve the issue?

zetaodu commented 10 months ago

> (screenshots attached) I looked at the dataset very closely. In the first sample I can find only one correct bounding box, "a television" (1); "A cabinet" (0) gets the wrong box (it should be "a woman"); the noun phrase "room" has no box; and the boxes labeled (3) have no corresponding phrase. Can you resolve my doubts? Thank you very much.

> Hi, did you resolve the issue?

No, the issue still exists. I filtered them out, but I got fewer samples.