This repository contains the code for the ssbaseline model. We also provide OCR results, features, and pretrained models. The code is built on top of M4C; see the M4C documentation for more detailed information.
If you use ssbaseline in your work, please cite:
@article{zhu2020simple,
title={Simple is not Easy: A Simple Strong Baseline for TextVQA and TextCaps},
author={Zhu, Qi and Gao, Chenyu and Wang, Peng and Wu, Qi},
journal={arXiv preprint arXiv:2012.05153},
year={2020}
}
First, install the repo:
git clone https://github.com/ZephyrZhuQi/ssbaseline.git ~/ssbaseline
cd ~/ssbaseline
python setup.py build develop
We provide SBD-Trans OCR results for the TextVQA and ST-VQA datasets. The corresponding OCR Faster R-CNN features and Recog-CNN features are also released.
Datasets | ImDBs | Object Faster R-CNN Features | OCR Faster R-CNN Features | OCR Recog-CNN Features |
---|---|---|---|---|
TextVQA | TextVQA ImDB | Open Images | TextVQA SBD-Trans OCRs | TextVQA SBD-Trans OCRs |
ST-VQA | ST-VQA ImDB | ST-VQA Objects | ST-VQA SBD-Trans OCRs | ST-VQA SBD-Trans OCRs |
We release the following pretrained model for ssbaseline on TextVQA: ssbaseline trained with SBD-Trans OCRs and with ST-VQA as additional data (our best model).
Datasets | Config Files (under configs/vqa/) | Pretrained Models | Metrics | Notes |
---|---|---|---|---|
TextVQA (m4c_textvqa) | m4c_textvqa/m4c_sbd.yml (needs modification: add the ST-VQA ImDB and feature files; see m4c_with_stvqa.yml for reference) | ssbaseline_with_stvqa | val accuracy - 45.53%; test accuracy - 45.66% | SBD-Trans OCRs; ST-VQA as additional data |
Please follow the M4C README for the training and evaluation of the M4C model on each dataset.
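
The config change noted in the table (adding the ST-VQA ImDB and feature files to the TextVQA training data) can be pictured with the rough Python sketch below. It only illustrates the idea of appending extra entries to the training lists. The key names (`dataset_attributes`, `image_features`, `imdb_files`), the comma-separated pairing of object and OCR features, the helper `add_extra_training_data`, and all `<...>` paths are assumptions made for illustration; check m4c_with_stvqa.yml for the actual layout and file names.

```python
import copy


def add_extra_training_data(config, dataset_key, extra_features, extra_imdb):
    """Return a copy of `config` with one extra feature entry and one extra
    ImDB entry appended to the training split of `dataset_key`.

    Hypothetical helper: the nested key names are assumed, not taken
    from the actual config files.
    """
    merged = copy.deepcopy(config)  # leave the original config untouched
    attrs = merged["dataset_attributes"][dataset_key]
    attrs["image_features"]["train"].append(extra_features)
    attrs["imdb_files"]["train"].append(extra_imdb)
    return merged


# Placeholder config standing in for m4c_textvqa/m4c_sbd.yml (paths are not real).
base_config = {
    "dataset_attributes": {
        "m4c_textvqa": {
            "image_features": {
                "train": ["<textvqa_object_features>,<textvqa_ocr_features>"],
            },
            "imdb_files": {
                "train": ["<textvqa_train_imdb>"],
            },
        }
    }
}

# Append the ST-VQA features and ImDB as additional training data.
config_with_stvqa = add_extra_training_data(
    base_config,
    "m4c_textvqa",
    extra_features="<stvqa_object_features>,<stvqa_ocr_features>",
    extra_imdb="<stvqa_train_imdb>",
)
```

In practice the same edit would be made by hand in the YAML file; the sketch only shows that the ST-VQA entries are added alongside the TextVQA training entries rather than replacing them.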
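Question: Feature Extraction (Is the code for extracting each type of feature in the paper open source? I would like to extract the features myself so that I can use them on other data.)
Answer: There is quite a lot of feature extraction code, one repository per feature type. The ones I used are shown below: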