ChenRocks / UNITER

Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Representation Learning"
https://arxiv.org/abs/1909.11740
777 stars, 109 forks

Any faster feature extraction tools (better than the one in bottom-up-detection and lxmert)? #60

Open yezhengli-Mr9 opened 3 years ago

yezhengli-Mr9 commented 3 years ago

Hi @ChenRocks @linjieli222, suggestions are welcome even for tools that are not used by VisualBERT, LXMERT, or UNITER.

As a point of comparison: on one GPU (CPU-only is presumably too slow to be tolerable), LXMERT takes 5-6 hours to extract Faster R-CNN features for the 107,292 NLVR2 images with this Caffe implementation. I have also followed VisualBERT's issue #1 and issue #10, as well as LXMERT and transformers-VQA.
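To put those numbers in perspective, here is a rough back-of-the-envelope sketch that converts the quoted figures (107,292 images, 5-6 hours on one GPU) into average throughput. The helper names and the 10 images/s target are hypothetical, chosen only for illustration; they are not measurements of any tool.

```python
# Back-of-the-envelope throughput from the figures quoted in this issue.
NUM_IMAGES = 107_292      # NLVR2 images (from the post)
HOURS_RANGE = (5, 6)      # reported extraction time on one GPU (from the post)


def images_per_second(num_images: int, hours: float) -> float:
    """Average extraction throughput in images per second."""
    return num_images / (hours * 3600)


def hours_needed(num_images: int, rate: float) -> float:
    """Wall-clock hours needed at a given rate (images per second)."""
    return num_images / (rate * 3600)


for hours in HOURS_RANGE:
    rate = images_per_second(NUM_IMAGES, hours)
    print(f"{hours} h -> {rate:.2f} images/s ({1000 / rate:.0f} ms per image)")

# Hypothetical target rate, NOT a measured number for any tool.
TARGET_RATE = 10.0
print(f"At {TARGET_RATE:.0f} images/s: {hours_needed(NUM_IMAGES, TARGET_RATE):.1f} h")
```

So the baseline works out to roughly 5-6 images per second, and a tool would need to sustain about twice that to bring the job down to around 3 hours.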

yezhengli-Mr9 commented 3 years ago

I think Detectron is one possible solution -- it is used by VisualBERT and mentioned in issue #48 of LXMERT.

If you have any better suggestions, please let me know.

yezhengli-Mr9 commented 3 years ago

@YIKUAN8 gave me some insights in transformers-VQA issue #5, "Comparison of speed of extracting features". If you have a different point of view, please let me know.