doc-analysis / XFUND

XFUND: A Multilingual Form Understanding Benchmark
https://arxiv.org/abs/2104.08836
format of zh and ja #4

Open bakhbyergyen opened 2 years ago

bakhbyergyen commented 2 years ago

Hi, I wanted to ask why the zh and ja datasets are split by character rather than word by word. When building a dataset, couldn't the sentences be split into words instead of characters? Thank you.
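
To make the question concrete, here is a minimal sketch of the two splitting styles. The sample string and the function names are made up for illustration and are not taken from the XFUND data or code. Chinese and Japanese text has no spaces between words, so a plain whitespace split returns the whole string as one token, and a real word-level split would need a separate segmenter (not shown here).

```python
# Hypothetical sample text, not taken from the XFUND data.
text = "姓名和出生日期"


def split_by_character(s):
    """Character-level split: one token per non-whitespace character."""
    return [ch for ch in s if not ch.isspace()]


def split_by_whitespace(s):
    """Whitespace split, the usual way English text is broken into words."""
    return s.split()


char_tokens = split_by_character(text)
space_tokens = split_by_whitespace(text)

print(char_tokens)   # one entry per character
print(space_tokens)  # the whole string as a single entry, since it has no spaces
```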