Did you evaluate with the original val split protocol (3,817, 6,346 and 5,373 images) or with the protocol from the paper
"Image Search With Text Feedback by Visiolinguistic Attention Learning", which uses the union of the reference and target
images with duplicates removed? The two protocols can produce substantially different recall scores.