OpenPecha / Tib-word-gathering

MIT License

MT0031: Validate Word Segmentation Using the Unique Word List. #6

Open jim-gyas opened 1 week ago

jim-gyas commented 1 week ago

Description:

Create a series of scripts to validate word segmentation by checking each word in the target field against the unique word list.

The process checks for both oversegmentation and undersegmentation, then merges the validated data to produce a final dataset that is free of segmentation errors.
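A minimal sketch of the two checks, assuming the target field is a string of space-separated tokens and the unique word list is loaded into a Python set. The function names `find_oversegmented` and `find_undersegmented` are hypothetical and are not part of the repository. An oversegmentation candidate is a pair of adjacent tokens whose concatenation is itself a listed word. An undersegmentation candidate is a token that is not in the list but can be split entirely into listed words.

```python
def find_oversegmented(tokens, vocab):
    """Return (index, merged) pairs where two adjacent tokens join into a listed word."""
    hits = []
    for i in range(len(tokens) - 1):
        merged = tokens[i] + tokens[i + 1]
        if merged in vocab:
            hits.append((i, merged))
    return hits


def find_undersegmented(tokens, vocab):
    """Return (index, token) pairs for tokens that are not listed words
    but can be split entirely into listed words."""
    hits = []
    for i, tok in enumerate(tokens):
        if not tok or tok in vocab:
            continue
        # reachable[k] is True when tok[:k] can be covered by listed words.
        reachable = [True] + [False] * len(tok)
        for end in range(1, len(tok) + 1):
            reachable[end] = any(
                reachable[start] and tok[start:end] in vocab
                for start in range(end)
            )
        if reachable[-1]:
            hits.append((i, tok))
    return hits


# Hypothetical placeholder data, not taken from the project dataset.
vocab = {"a", "b", "ab", "c"}
over = find_oversegmented("a b c".split(), vocab)
under = find_undersegmented("abc c".split(), vocab)
print(over)   # candidate merges
print(under)  # candidate splits
```

Both functions only flag candidates. When the parts and the merged form are all listed words, the word list alone cannot decide which segmentation is correct, so flagged entries still need review before the merge step.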

Completion Criteria:

Implementation:

[Image: Screenshot 2024-10-08 at 9.38.59 AM]

Generated Data Entries Info:

Subtasks:

kaldan007 commented 1 week ago

སྐུ་ ཞབས་ ལགས །

a c b c [a, b, ab ,c] ab c a b c ང་ཚོ་ མི་ཚེ འི་ ང་ཚོ་ ལ ས་ ཆུང་ཆུང་ཞིག་ གི་ མཚོན་བྱེད