Closed (tenzin3 closed this 2 months ago)
Sentence-tokenize Tibetan text and keep only valid sentences

- If an invalid token is present: exclude the sentence.
- If another language is present: exclude the sentence.
- If a symbol is present: filter out the symbols and keep the sentence.
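
The three filtering rules could be sketched roughly as follows. This is a minimal illustration and not the project's actual implementation. It makes several assumptions: sentences end at a shad (།, U+0F0D) or nyis shad (༎, U+0F0E), tokens are syllables separated by tsheg (་, U+0F0B), "another language" means any letter outside the Tibetan Unicode block (U+0F00 to U+0FFF), and a "symbol" is any other non-Tibetan, non-whitespace character. All function names are hypothetical, and `default_is_valid_token` is only a placeholder check that a real tokenizer-based validity test would replace.

```python
import re
import unicodedata

# Assumption: a sentence is a run of text closed by one or more shad marks.
SENTENCE_RE = re.compile(r"[^\u0f0d\u0f0e]+[\u0f0d\u0f0e]*")
# Assumption: tokens (syllables) are separated by tsheg, shad, or whitespace.
TOKEN_SPLIT_RE = re.compile(r"[\u0f0b\u0f0d\u0f0e\s]+")


def is_tibetan(ch):
    """True if the character lies in the Tibetan Unicode block."""
    return "\u0f00" <= ch <= "\u0fff"


def has_other_language(sentence):
    """True if any letter from a non-Tibetan script appears."""
    return any(
        unicodedata.category(ch).startswith("L") and not is_tibetan(ch)
        for ch in sentence
    )


def strip_symbols(sentence):
    """Drop every character that is neither Tibetan nor whitespace."""
    return "".join(ch for ch in sentence if is_tibetan(ch) or ch.isspace())


def default_is_valid_token(token):
    """Placeholder rule: a syllable must start with a Tibetan base letter."""
    return "\u0f40" <= token[0] <= "\u0f6c"


def valid_sentences(text, is_valid_token=default_is_valid_token):
    """Split text into sentences and keep only those that pass all rules."""
    kept = []
    for raw in SENTENCE_RE.findall(text):
        # Rule: another language present -> exclude the sentence.
        if has_other_language(raw):
            continue
        # Rule: symbol present -> filter out symbols, keep the sentence.
        cleaned = strip_symbols(raw).strip()
        tokens = [t for t in TOKEN_SPLIT_RE.split(cleaned) if t]
        # Rule: invalid token present -> exclude the sentence.
        if not tokens or not all(is_valid_token(t) for t in tokens):
            continue
        kept.append(cleaned)
    return kept


if __name__ == "__main__":
    sample = "བཀྲ་ཤིས། abc བོད། བོད#།"
    for sentence in valid_sentences(sample):
        print(sentence)
```

The language check runs before symbol stripping so that foreign letters are not silently removed and the sentence kept by mistake. Passing a different `is_valid_token` callable is one way to plug in a dictionary or tokenizer-based check without changing the rest of the pipeline.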