Vocabulary and single image-question pair prediction

ChenRocks / UNITER

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"

https://arxiv.org/abs/1909.11740


Open foxm79 opened 3 years ago

foxm79 commented 3 years ago

Is the vocabulary that maps the words of a question to 'input_ids' available?
Is there a function that does this conversion for an input question?
Is there code that takes a single image-question pair and predicts the answer?
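
To make the first two questions concrete, here is a minimal sketch of the kind of lookup being asked about. It assumes a BERT-style WordPiece vocabulary, lowercased input, and the usual [CLS]/[SEP] wrapping. The toy `VOCAB` and the function names `wordpiece` and `question_to_input_ids` are hypothetical and are not taken from the UNITER code.

```python
# Toy vocabulary (hypothetical). A real BERT-style vocab file has tens of
# thousands of entries; the index of each token is its id.
VOCAB = {tok: i for i, tok in enumerate(
    ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "what", "color", "is", "the",
     "cat", "?", "play", "##ing"])}


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while end > start:
            cand = word[start:end]
            if start > 0:
                cand = "##" + cand  # continuation pieces carry a ## prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matched: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


def question_to_input_ids(question, vocab=VOCAB):
    """Convert a question string to a list of token ids."""
    # Assumption: uncased vocab. Only '?' is split off here; a real
    # tokenizer handles punctuation more generally.
    words = question.lower().replace("?", " ?").split()
    tokens = ["[CLS]"]
    for w in words:
        tokens.extend(wordpiece(w, vocab))
    tokens.append("[SEP]")
    return [vocab[t] for t in tokens]


if __name__ == "__main__":
    print(question_to_input_ids("What color is the cat?"))
```

In practice the vocabulary and the tokenizer would come from whatever pretrained tokenizer the model was trained with, so the ids above are only meaningful for the toy vocabulary.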

tjulyz commented 3 years ago

foxm79 commented 3 years ago

Yes, that is what I eventually followed. Thanks for replying!