Caucasus-Rosetta / Lingua-Corpus

Multilingual and monolingual corpora of Caucasus languages for Natural Language Processing (NLP)
Apache License 2.0
33 stars, 6 forks

[Common Voice] Split text into smaller pieces to digest for editing and translating #88

Closed: danielinux7 closed this 2 years ago

danielinux7 commented 3 years ago

Discussion

The text needs to be split into smaller, more digestible pieces for editing and translating.

Difficulties

The sentences need cleaning up, and the files need to be split.

Solutions

Use shell scripts to split and clean the files.
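
As a rough illustration of what such scripts might look like, here is a minimal sketch. It is not the script used in this repository: the function names `clean_sentences` and `split_file`, the file names, and the chunk size are all hypothetical, and it assumes one sentence per line in a UTF-8 text file. The cleaning step only normalizes whitespace and drops empty and duplicate lines; real cleanup rules for Common Voice text would likely need more than this.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: names, file layout and chunk size are assumptions,
# not taken from the repository.

# Print a cleaned copy of the file given as $1:
#   - collapse runs of whitespace into a single space
#   - strip leading and trailing spaces
#   - drop empty lines and repeated sentences (first occurrence is kept)
clean_sentences() {
  sed -e 's/[[:space:]]\{1,\}/ /g' -e 's/^ //' -e 's/ $//' "$1" \
    | awk 'NF && !seen[$0]++'
}

# Split file $1 into pieces of at most $2 lines each.
# Pieces are named <prefix>aaa, <prefix>aab, ... where <prefix> is $3.
split_file() {
  local input="$1" lines="$2" prefix="$3"
  split -l "$lines" -a 3 "$input" "$prefix"
}

# Example usage (hypothetical file names):
#   clean_sentences sentences.txt > sentences.clean.txt
#   split_file sentences.clean.txt 500 part_
```

Keeping cleaning and splitting as separate steps means the cleaned file can be reviewed before it is cut into pieces, and the chunk size can be changed without re-running the cleanup.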