Category | Training Set | Validating Set | Testing Set |
---|---|---|---|
Number of Images | 20365 | 500 | 499 |
Percentage (approx.) | 95% | 2.5% | 2.5% |
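
The percentages in the table are rounded. A minimal sketch that recomputes them from the image counts above; the variable names are illustrative and do not come from the repository:

```python
# Image counts copied from the split table above.
split_sizes = {"train": 20365, "val": 500, "test": 499}

total_images = sum(split_sizes.values())

# Share of each split, as a percentage of all images.
split_percent = {
    name: 100.0 * count / total_images for name, count in split_sizes.items()
}

for name, pct in split_percent.items():
    print(f"{name}: {split_sizes[name]} images ({pct:.1f}%)")
```

With these counts the exact shares come out to about 95.3%, 2.3%, and 2.3% of 21364 images.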
Training Set:
category | #instances | category | #instances | category | #instances | category | #instances |
---|---|---|---|---|---|---|---|
chapter | 11312 | section | 17471 | clause | 106931 | total | 135714 |
Validating Set:
category | #instances | category | #instances | category | #instances | category | #instances |
---|---|---|---|---|---|---|---|
chapter | 151 | section | 246 | clause | 3096 | total | 3493 |
Testing Set:
category | #instances | category | #instances | category | #instances | category | #instances |
---|---|---|---|---|---|---|---|
chapter | 151 | section | 249 | clause | 2947 | total | 3347 |
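
The totals in the three tables can be cross-checked, and the class balance made explicit, with a few lines of Python. This is only a sketch over the numbers listed above; `class_shares` is a hypothetical helper, not part of the repository:

```python
# Per-class instance counts copied from the three tables above.
instances = {
    "train": {"chapter": 11312, "section": 17471, "clause": 106931},
    "val": {"chapter": 151, "section": 246, "clause": 3096},
    "test": {"chapter": 151, "section": 249, "clause": 2947},
}


def class_shares(counts):
    """Return each class's fraction of the split's total instances."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for split, counts in instances.items():
    total = sum(counts.values())
    shares = class_shares(counts)
    summary = ", ".join(f"{k} {100 * v:.1f}%" for k, v in shares.items())
    print(f"{split}: total {total} ({summary})")
```

The `clause` class dominates every split, which is worth keeping in mind when comparing per-class AP values.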
Images
Annotation
Beihang Pan:
Google Drive:
!pip install pyyaml==5.1
!pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
!git clone https://github.com/noba1anc3/Publaynet.git
cd Publaynet
Once the dependencies above are installed and gcc & g++ ≥ 5 are available, run:
!git clone https://github.com/facebookresearch/detectron2.git
cd detectron2
!python -m pip install -e .
cd ..
# Or if you are on macOS
# CC=clang CXX=clang++ python -m pip install -e .
from google.colab import drive
drive.mount('/content/drive/')
mkdir data
cp -rf ../drive/'My Drive'/train.zip ./data/
cp -rf ../drive/'My Drive'/val.zip ./data/
cd data
!unzip train.zip
!unzip val.zip
cd ..
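
Outside a notebook, the same copy-and-unzip step can be done from Python with the standard library. The following is a sketch under assumptions: `stage_archives` is a hypothetical helper, and the Drive path in the example call is assumed to match the mount point used above.

```python
import shutil
import zipfile
from pathlib import Path


def stage_archives(src_dir, data_dir, names=("train.zip", "val.zip")):
    """Copy each archive from src_dir into data_dir and extract it there."""
    src_dir, data_dir = Path(src_dir), Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for name in names:
        archive = data_dir / name
        shutil.copy(src_dir / name, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
    return extracted


# Example call (paths are assumptions based on the commands above):
# stage_archives("/content/drive/My Drive", "data")
```

The function returns the names of all extracted members, which makes it easy to confirm that the archives were not empty.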
!python train.py -f False
mkdir output
cp -rf ../drive/'My Drive'/model_final.pth ./output/
!python train.py -f True
chapter AP | section AP | clause AP | mAP |
---|---|---|---|
85.180 | 86.641 | 93.367 | 88.396 |
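
The mAP column is the unweighted mean of the three per-class AP values. A short check, using only the numbers from the table above:

```python
# Per-class AP values copied from the table above.
per_class_ap = {"chapter": 85.180, "section": 86.641, "clause": 93.367}

# mAP is the plain (unweighted) mean over classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)

print(f"mAP = {mean_ap:.3f}")
```

Because the mean is unweighted, the much larger number of `clause` instances does not give that class extra influence on the final score.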
AP | AP50 | AP75 | APs | APm | APl |
---|---|---|---|---|---|
88.396 | 99.037 | 98.956 | NaN | 80.382 | 88.964 |
AR1 | AR10 | AR100 | ARs | ARm | ARl |
---|---|---|---|---|---|
57.0 | 91.4 | 92.0 | NaN | 84.8 | 92.1 |
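
The `s`, `m`, and `l` suffixes follow the COCO convention of bucketing objects by pixel area: small below 32², medium between 32² and 96², large above 96². A NaN in the small bucket most likely means the evaluated split contains no small instances, so there is nothing to average; this is an inference from how COCO-style evaluation behaves, not something the results state. The sketch below shows the bucketing rule; `coco_size_bucket` and the example boxes are illustrative only.

```python
from collections import Counter

# COCO area thresholds, in square pixels.
SMALL_MAX_AREA = 32 ** 2
MEDIUM_MAX_AREA = 96 ** 2


def coco_size_bucket(width, height):
    """Classify a box as small, medium, or large by its pixel area.

    Simplification: boxes exactly on a threshold are assigned to the
    larger bucket only.
    """
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Hypothetical (width, height) boxes, not taken from the dataset.
example_boxes = [(50, 50), (400, 60), (600, 300)]
bucket_counts = Counter(coco_size_bucket(w, h) for w, h in example_boxes)
print(dict(bucket_counts))
```

Running the same count over the real annotation file would confirm whether any small instances exist.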