allenai / vila

Incorporating VIsual LAyout Structures for Scientific Text Classification
Apache License 2.0
167 stars, 17 forks

Find all the Unicode chars with zero encoding length #30

Open · lolipopshock opened this issue 2 years ago

lolipopshock commented 2 years ago

See https://github.com/huggingface/tokenizers/issues/1077
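A minimal sketch of the search the title describes. It assumes that "zero encoding length" means a single character for which the tokenizer returns no token ids once special tokens are disabled. That matters for token classification, because a word made only of such characters gets no token to carry its label. The helper names `iter_code_points` and `find_zero_length_chars` are hypothetical, and `encode` is any callable that maps a string to a sequence of token ids. Surrogates are skipped because they are not valid Unicode scalar values and cannot be encoded as UTF-8.

```python
import unicodedata
from typing import Callable, Iterable, Iterator, List, Sized

# Hypothetical signature: any callable mapping a string to a sized sequence
# of tokens/ids (e.g. a wrapper around a Hugging Face tokenizer).
Encoder = Callable[[str], Sized]


def iter_code_points(skip_unassigned: bool = True) -> Iterator[str]:
    """Yield every Unicode scalar value as a one-character string."""
    for cp in range(0x110000):
        # U+D800..U+DFFF are surrogates: not scalar values, not UTF-8 encodable.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        ch = chr(cp)
        # "Cn" marks code points unassigned in Python's Unicode database.
        if skip_unassigned and unicodedata.category(ch) == "Cn":
            continue
        yield ch


def find_zero_length_chars(encode: Encoder, chars: Iterable[str]) -> List[str]:
    """Return the characters whose encoding contains no tokens at all."""
    return [ch for ch in chars if len(encode(ch)) == 0]
```

With a Hugging Face tokenizer, one possible call (the model name is only an example) is `tok = AutoTokenizer.from_pretrained("bert-base-uncased")` followed by `find_zero_length_chars(lambda s: tok(s, add_special_tokens=False)["input_ids"], iter_code_points())`. Checking one character at a time keeps the result easy to interpret. It does not cover characters that vanish only in context, such as combining marks after a base character.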