Closed (lvwerra closed this 2 years ago)
Adds a dedicated sentence tokenizer for Vietnamese using `underthesea`.
These are the affected files; all other `vi` files should not apply sentence splitting:
- lm_vi_wiktionary_filtered
- lm_vi_wikibooks_filtered
- lm_vi_wikiquote_filtered
- lm_vi_wikivoyage_filtered
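
For illustration only, the selective behaviour described above could look roughly like the sketch below. It is not the code from this PR: the names `VI_SENTENCE_SPLIT_DATASETS` and `split_vi_sentences` and the optional `tokenizer` parameter are hypothetical, and the only library call assumed is `underthesea.sent_tokenize`, which takes a string and returns a list of sentences.

```python
from typing import Callable, List, Optional

# Dataset names copied from the list above; only these get sentence splitting.
VI_SENTENCE_SPLIT_DATASETS = frozenset({
    "lm_vi_wiktionary_filtered",
    "lm_vi_wikibooks_filtered",
    "lm_vi_wikiquote_filtered",
    "lm_vi_wikivoyage_filtered",
})


def _underthesea_sent_tokenize(text: str) -> List[str]:
    # Imported lazily so this module still loads when underthesea is not installed.
    from underthesea import sent_tokenize
    return sent_tokenize(text)


def split_vi_sentences(
    dataset_name: str,
    text: str,
    tokenizer: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Split `text` into sentences only for the listed Vietnamese datasets.

    Hypothetical helper: any other dataset gets its text back as a single
    unsplit element.
    """
    if dataset_name not in VI_SENTENCE_SPLIT_DATASETS:
        return [text]
    tokenize = tokenizer if tokenizer is not None else _underthesea_sent_tokenize
    return list(tokenize(text))
```

Passing the tokenizer as a parameter is a design choice of this sketch: it lets the dataset gate be exercised without `underthesea` installed, while the default path still delegates to the library.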