Cleanup: Remove U+2063 (INVISIBLE SEPARATOR), which appears in Thai text copied and pasted from some editors (such as MS Word and iOS Notes), to reduce duplicated text.
Validation: Adjust STRUCTURE_REGEX to prevent “ ”, ‘ ’ and ` from appearing in long running text (55 or more characters).
Note: The plan is to remove U+2063 in the Sentence Extractor as well, same as the zero-width characters U+200B and U+200C.
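
For illustration only, here is a minimal Python sketch of the two changes described above. It is not the project's actual code. The names `INVISIBLE_CHARS`, `QUOTE_IN_LONG_RUN`, `clean_sentence` and `violates_structure` are hypothetical, and the real STRUCTURE_REGEX pattern is not shown in this description. The validation part assumes one possible reading of the rule: a sentence is rejected when a whitespace-free run of 55 or more characters contains one of “ ” ‘ ’ or `.

```python
import re

# Invisible characters named in the description:
# U+200B ZERO WIDTH SPACE, U+200C ZERO WIDTH NON-JOINER, U+2063 INVISIBLE SEPARATOR.
INVISIBLE_CHARS = re.compile("[\u200b\u200c\u2063]")

# Hypothetical stand-in for STRUCTURE_REGEX (assumption, not the real pattern).
# The lookahead requires a run of 55+ non-whitespace characters; the rest
# requires one of the listed quote characters somewhere inside that run.
QUOTE_IN_LONG_RUN = re.compile(r"(?=\S{55,})\S*[“”‘’`]")


def clean_sentence(text: str) -> str:
    """Strip the invisible characters so visually identical sentences compare equal."""
    return INVISIBLE_CHARS.sub("", text)


def violates_structure(text: str) -> bool:
    """Return True if a long unbroken run contains one of the quote characters."""
    return QUOTE_IN_LONG_RUN.search(text) is not None
```

Under these assumptions, a sentence pasted with an embedded U+2063 and the same sentence typed directly become identical after `clean_sentence`, which is how the cleanup reduces duplicates. Short quoted phrases separated by spaces would still pass `violates_structure`.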