FormulaNet is a new large-scale Mathematical Formula Detection dataset. It consists of 46,672 pages of STEM documents from arXiv and has 13 types of labels. The dataset is split into a train set of 44,338 pages and a validation set of 2,334 pages. For copyright reasons, we can only provide the list of papers; the pages themselves must be downloaded and processed locally.
Prerequisites
git clone https://github.com/felix-schmitt/FormulaNet.git
The file structure should look like this:
.
├── ...
├── Dataset
│   ├── train
│   │   ├── img # empty folder
│   │   └── train_coco.json
│   └── test
│       ├── img # empty folder
│       └── test_coco.json
└── ...
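Before building or downloading anything, it can help to confirm that the clone matches the tree shown here. The following is a minimal sketch, not part of the repository: the helper name `check_layout` and the two path lists are hypothetical and simply mirror the folder and file names listed in the tree.

```python
from pathlib import Path

# Expected entries relative to the repository root, copied from the tree above.
EXPECTED_DIRS = ["Dataset/train/img", "Dataset/test/img"]
EXPECTED_FILES = ["Dataset/train/train_coco.json", "Dataset/test/test_coco.json"]


def check_layout(repo_root):
    """Return the expected paths that are missing under repo_root."""
    root = Path(repo_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # "." assumes the script is run from inside the cloned FormulaNet folder.
    problems = check_layout(".")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Layout looks as expected.")
```

An empty result means all four entries were found; anything else lists the paths that still need to be created or restored.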
Build the Docker image (amd64 and arm64 are supported):
docker build -t formulanet --build-arg Platform='amd64' .
Run the container, mounting the Dataset folder of your FormulaNet clone:
docker run -v ~/<path to FormulaNet folder>/Dataset:/FormulaNet/Dataset formulanet
To run without Docker instead, start from the same clone and file structure as above.
Install the Python dependencies (Python 3.8 recommended):
pip install -r requirements.txt
Run the download script:
python download.py
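Once the download has finished, a quick sanity check is to compare the annotation file with the contents of the `img` folder. The sketch below is an illustration only and makes several assumptions: that `train_coco.json` follows the standard COCO layout (`images`, `annotations`, and `categories` keys, with a `file_name` per image), and that the script stores each page image under `Dataset/train/img` using that `file_name`. The helpers `summarize_coco` and `missing_images` are hypothetical names, not part of the repository.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_coco(coco):
    """Return (number of images, annotation count per category name)."""
    names = {c["id"]: c["name"] for c in coco.get("categories", [])}
    per_label = Counter(
        names.get(a["category_id"], str(a["category_id"]))
        for a in coco.get("annotations", [])
    )
    return len(coco.get("images", [])), dict(per_label)


def missing_images(coco, img_dir):
    """List file names referenced in the annotations but absent from img_dir."""
    img_dir = Path(img_dir)
    return [
        im["file_name"]
        for im in coco.get("images", [])
        if not (img_dir / im["file_name"]).is_file()
    ]


if __name__ == "__main__":
    # Assumed location, relative to the cloned FormulaNet folder.
    split = Path("Dataset/train")
    ann_file = split / "train_coco.json"
    if ann_file.is_file():
        with open(ann_file, encoding="utf-8") as fh:
            coco = json.load(fh)
        n_images, per_label = summarize_coco(coco)
        print(n_images, "images;", len(per_label), "label types with annotations")
        print(len(missing_images(coco, split / "img")), "images not found")
    else:
        print("Annotation file not found:", ann_file)
```

If the assumptions hold, the image count for the train split should be close to the 44,338 pages stated above, and a non-zero number of missing images points to pages that failed to download.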
Model | mAP | mAP@50 | mAP@75 | mAP@inline | mAP@display |
---|---|---|---|---|---|
FCOS-50 | 0.754±0.03 | 0.921±0.02 | 0.84±0.02 | 0.752±0.02 | 0.755±0.02 |
FCOS-101 | 0.755±0.03 | 0.920±0.02 | 0.841±0.02 | 0.756±0.02 | 0.749±0.03 |
The results can be reproduced using these config files (FCOS-50, FCOS-101) together with the GitHub repository Yuxiang1995/ICDAR2021_MFD.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Felix M. Schmitt-Koopmann, Elaine M. Huang, Hans-Peter Hutter, Thilo Stadelmann, Alireza Darvishy
https://ieeexplore.ieee.org/document/9869643
@ARTICLE{9869643,
author={Schmitt-Koopmann, Felix M. and Huang, Elaine M. and Hutter, Hans-Peter and
Stadelmann, Thilo and Darvishy, Alireza},
journal={IEEE Access},
title={FormulaNet: A Benchmark Dataset for Mathematical Formula Detection},
year={2022},
volume={10},
number={},
pages={91588-91596},
doi={10.1109/ACCESS.2022.3202639}}