doc-analysis / XFUND

XFUND: A Multilingual Form Understanding Benchmark

format of zh and ja #4

Open bakhbyergyen opened 2 years ago

bakhbyergyen commented 2 years ago

Hi, I wanted to ask why the zh and ja datasets are split character by character rather than word by word. When building a dataset, couldn't the sentences be split into words instead of characters? Thank you.