airsplay / lxmert

PyTorch code for EMNLP 2019 paper "LXMERT: Learning Cross-Modality Encoder Representations from Transformers".
MIT License

Visual genome features #25

Closed JizeCao closed 5 years ago

JizeCao commented 5 years ago

Are the Visual Genome visual features extracted from a bottom-up model fine-tuned on the VG dataset, or from the pretrained one in the bottom-up repo?

airsplay commented 5 years ago

Sorry... I did not fully understand your question. Could you clarify it a bit more?

Let me try to answer it first but I am not sure whether it is what you want. The Faster R-CNN in bottom-up repo is trained on Visual Genome (excluding COCO minival images). The features I used are extracted from this Faster R-CNN as described here.
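For reference, the precomputed Faster R-CNN features are distributed as TSV files in the style of the bottom-up-attention repo. Below is a minimal sketch of how such a file can be parsed; the exact field names (`img_id`, `num_boxes`, `boxes`, `features`, etc.) and the base64/float32 encoding are assumptions based on that repo's conventions, so verify them against the actual files.

```python
import base64
import csv
import sys

import numpy as np

# Assumed field layout of the bottom-up-attention style TSV dumps.
FIELDNAMES = ["img_id", "img_h", "img_w", "objects_id", "objects_conf",
              "attrs_id", "attrs_conf", "num_boxes", "boxes", "features"]


def load_obj_tsv(fname):
    """Load precomputed Faster R-CNN features from a TSV file.

    Each row stores per-image metadata plus base64-encoded float32
    buffers holding the 4-d boxes and the RoI feature vectors.
    """
    csv.field_size_limit(sys.maxsize)  # rows with encoded features are long
    data = []
    with open(fname) as f:
        reader = csv.DictReader(f, fieldnames=FIELDNAMES, delimiter="\t")
        for item in reader:
            num_boxes = int(item["num_boxes"])
            boxes = np.frombuffer(
                base64.b64decode(item["boxes"]), dtype=np.float32
            ).reshape(num_boxes, 4)
            feats = np.frombuffer(
                base64.b64decode(item["features"]), dtype=np.float32
            ).reshape(num_boxes, -1)
            data.append({"img_id": item["img_id"],
                         "boxes": boxes,
                         "features": feats})
    return data
```

Each returned entry then carries a `(num_boxes, 4)` box array and a `(num_boxes, feat_dim)` feature array ready to feed into the model's visual input.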

JizeCao commented 5 years ago

Yeah, my mistake. I had misunderstood that the feature extractor was trained on Visual Genome but not MSCOCO...