djiajunustc / TransVG


Input of Linguistic Branch #17

Open JJ-res101 opened 2 years ago

JJ-res101 commented 2 years ago

Thank you for your excellent work! How does the model get the box of a certain phrase in a sentence? Right now it seems to me that the model can't do that. Is that right?

djiajunustc commented 2 years ago

The box is annotated to match the whole sentence, not a particular phrase within it.

JJ-res101 commented 2 years ago

I think each phrase is annotated with its own box in the Flickr30K Entities data. As said in your paper, "Flickr30K Entities [38] augments the original Flickr30K [58] with short region phrase correspondence annotations." Maybe the 'Flickr' dataset you use has one box annotation per sentence. Is that right? :)

jianghaojun commented 2 years ago

Just as you cited, "Flickr30K Entities [38] augments the original Flickr30K [58] with short region phrase correspondence annotations." This means the original sentences of Flickr30K are split into short phrases, and each phrase is annotated with a bbox. When training on Flickr30K Entities, each sample consists of a phrase and a bbox.
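
To make the "one sample = one phrase + one bbox" idea concrete, here is a minimal sketch of how a sentence with phrase-level annotations could be flattened into training samples. This is not the TransVG data loader: the `Sample` class, the `sentence_to_samples` helper, the image id, and the box coordinates are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[int, int, int, int]


@dataclass
class Sample:
    """One training sample: a single phrase paired with a single box."""
    image_id: str
    phrase: str
    box: Box


def sentence_to_samples(image_id: str,
                        phrase_boxes: List[Tuple[str, Box]]) -> List[Sample]:
    """Flatten the phrase-level annotations of one sentence into samples.

    Each (phrase, box) pair becomes its own sample, so the language input
    is the short phrase rather than the full sentence.
    """
    return [Sample(image_id, phrase, box) for phrase, box in phrase_boxes]


# Made-up annotations for the sentence "a man throws a red frisbee".
annotated = [
    ("a man", (10, 20, 110, 220)),
    ("a red frisbee", (150, 40, 200, 90)),
]
samples = sentence_to_samples("img_0001", annotated)
for s in samples:
    print(s.phrase, s.box)
```

Under this assumption, the text passed to the linguistic branch for each sample is the phrase itself, and the box is the target for that phrase.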