doc-analysis / DocBank

DocBank: A Benchmark Dataset for Document Layout Analysis
Apache License 2.0

How to do inference with LayoutLM? #23

Open louisabraham opened 3 years ago

louisabraham commented 3 years ago

Hi, I'm trying to use run_seq_labeling.py from https://github.com/microsoft/unilm/tree/master/layoutlm on your data. However, it expects a different input format from the one your data uses.

Their script reads _box.txt files, each of which contains all samples.
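A minimal sketch of bridging the two formats, under assumptions that should be checked against both repos before use. It assumes each DocBank page file has tab-separated lines of the form `token x0 y0 x1 y1 R G B font label`, with boxes already on LayoutLM's 0-1000 scale. It also assumes the LayoutLM reader wants `{split}.txt` lines of `token<TAB>label` and `{split}_box.txt` lines of `token<TAB>x0 y0 x1 y1`, with a blank line between documents. The helper names `read_docbank_page`, `to_layoutlm_lines` and `write_split` are hypothetical. The LayoutLM reader may also expect a companion `_image.txt` file, which this sketch does not produce.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# (token, [x0, y0, x1, y1], label)
Row = Tuple[str, List[int], str]


def read_docbank_page(lines: Iterable[str]) -> List[Row]:
    """Parse one DocBank page (ASSUMED 10 tab-separated columns per line)."""
    rows: List[Row] = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 10 or not parts[0].strip():
            continue  # skip empty or malformed lines
        token, label = parts[0], parts[-1]
        box = [int(float(v)) for v in parts[1:5]]  # ASSUMED already 0-1000
        rows.append((token, box, label))
    return rows


def to_layoutlm_lines(pages: Iterable[List[Row]]) -> Tuple[List[str], List[str]]:
    """Flatten pages into the ASSUMED LayoutLM text/box line formats."""
    text_lines: List[str] = []
    box_lines: List[str] = []
    for rows in pages:
        for token, box, label in rows:
            text_lines.append(f"{token}\t{label}")
            box_lines.append(f"{token}\t{' '.join(map(str, box))}")
        # A blank line marks the end of one document (page).
        text_lines.append("")
        box_lines.append("")
    return text_lines, box_lines


def write_split(page_paths: Iterable[str], out_dir: str, split: str = "train") -> None:
    """Write {split}.txt and {split}_box.txt, each holding all samples."""
    pages = [
        read_docbank_page(Path(p).read_text(encoding="utf-8").splitlines())
        for p in page_paths
    ]
    text_lines, box_lines = to_layoutlm_lines(pages)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / f"{split}.txt").write_text("\n".join(text_lines) + "\n", encoding="utf-8")
    (out / f"{split}_box.txt").write_text("\n".join(box_lines) + "\n", encoding="utf-8")
```

Keeping parsing and flattening as pure functions over lines makes it easy to compare their output against a few lines of the real `_box.txt` files before converting a whole split.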

I also noticed that run_seq_labeling.py adds an "O" label, but your labels.txt file already has 13 classes, matching the pretrained models you provide. This makes me doubt that you used run_seq_labeling.py to train your model. Could you provide your training/evaluation code?
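A small sketch of why the extra label matters when loading a checkpoint. It assumes the script's label loader behaves as described above, prepending "O" when it is missing. If so, a labels file that already lists as many classes as the checkpoint's classification head ends up one entry too long. `build_label_list` and `matches_checkpoint` are hypothetical helpers, and the `add_o` switch is an illustrative workaround, not an option of run_seq_labeling.py.

```python
from typing import List


def build_label_list(lines: List[str], add_o: bool = True) -> List[str]:
    """Read label names; optionally prepend "O" the way the script is ASSUMED to."""
    labels = [line.strip() for line in lines if line.strip()]
    if add_o and "O" not in labels:
        labels = ["O"] + labels
    return labels


def matches_checkpoint(labels: List[str], num_labels_in_checkpoint: int) -> bool:
    """True if the label list fits the checkpoint's classification head."""
    return len(labels) == num_labels_in_checkpoint


# Hypothetical 13-entry labels file (placeholder names, not DocBank's real ones).
names = [f"class_{i}" for i in range(13)]
print(len(build_label_list(names)))               # 14 once "O" is prepended
print(len(build_label_list(names, add_o=False)))  # 13, same size as the head
```

For inference with a 13-class checkpoint, the label list would need to be built without the extra "O" (and in the same order used during training) so that predicted indices map back to the right class names.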

liminghao1630 commented 3 years ago

Our training code is also based on LayoutLM, so there is no plan to provide it.