Divergent-Discourses / TibNorm

Normalising Tibetan Text

Re-write to search before and after the Da log and Ta log characters #16

Open: ykyogoku opened this issue 9 months ago

ykyogoku commented 9 months ago

Re-write to search before and after the Da log and Ta log characters. Only a Da log or Ta log that is a final consonant should be replaced by -gs! The replacement is done only if both conditions apply:
Condition 1: the Da log or Ta log is followed by either a tsheg or a shad.
Condition 2: the Da log or Ta log is preceded by a Tibetan consonant or stack.
If the character stands alone or is in any other position, no replacement should take place. In particular, if it is preceded by a tsheg or whitespace, there is NO replacement!
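The two conditions could be expressed as one regular expression with a one-character lookbehind and lookahead. This is a minimal sketch, not the TibNorm code: the name `normalise_final_ta_da` is hypothetical, and it assumes that Ta log is ཊ (U+0F4A), Da log is ཌ (U+0F4C), that "consonant or stack" means the preceding code point is a Tibetan letter (U+0F40 to U+0F6C) or a subjoined letter (U+0F90 to U+0FBC), and that only the ordinary tsheg (U+0F0B) and shad (U+0F0D) count as following marks.

```python
import re

TA_LOG = "\u0F4A"     # ཊ
DA_LOG = "\u0F4C"     # ཌ
TSHEG = "\u0F0B"      # ་
SHAD = "\u0F0D"       # །
GS = "\u0F42\u0F66"   # གས (-gs)

# Condition 2 (assumed ranges): the preceding code point is a Tibetan consonant
# (U+0F40-U+0F6C) or a subjoined consonant closing a stack (U+0F90-U+0FBC).
_PRECEDED_BY = "[\u0F40-\u0F6C\u0F90-\u0FBC]"
# Condition 1: the following code point is a tsheg or a shad.
_FOLLOWED_BY = f"[{TSHEG}{SHAD}]"

# Lookbehind and lookahead test the neighbours without consuming them,
# so only the Ta log / Da log itself is replaced.
FINAL_TA_DA = re.compile(f"(?<={_PRECEDED_BY})[{TA_LOG}{DA_LOG}](?={_FOLLOWED_BY})")


def normalise_final_ta_da(text: str) -> str:
    """Replace a final Ta log or Da log with -gs when both conditions hold."""
    return FINAL_TA_DA.sub(GS, text)
```

Because the lookbehind inspects a single code point, a preceding tsheg, space, tab or line break (or the start of the text) blocks the replacement, as the issue requires. Under these rules alone, པཊ་ཊི་ still becomes པགས་ཊི་.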

ykyogoku commented 8 months ago

I wonder how to define པཊ་ཊི་ as an exception. In the current implementation, ཊ་ is replaced by གས་ except when ཊ་ is preceded by a tsheg (་), a tab (\t) or a line break (\n). With this implementation, however, པཊ་ཊི་ becomes པགས་ཊི་, because the first ཊ is preceded by the consonant པ rather than by a tsheg, tab or line break.
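One possible way to handle པཊ་ཊི་ is an explicit exception list matched in the same pass as the Ta log / Da log rule, so that an exception consumes its characters before the rule can fire on them. This is a hypothetical sketch, not the repository's implementation: `EXCEPTIONS` and `normalise_with_exceptions` are illustrative names, it assumes Ta log is ཊ (U+0F4A), Da log is ཌ (U+0F4C) and that a consonant or stack is any code point in U+0F40 to U+0F6C or U+0F90 to U+0FBC, and exceptions are matched as plain substrings without checking syllable boundaries.

```python
import re

# Hypothetical exception list (must be non-empty): words whose Ta log / Da log
# must stay unchanged.
EXCEPTIONS = ["པཊ་ཊི་"]

# Ta log (U+0F4A) or Da log (U+0F4C) preceded by a consonant or subjoined
# consonant and followed by a tsheg (U+0F0B) or shad (U+0F0D).
_TARGET = "(?<=[\u0F40-\u0F6C\u0F90-\u0FBC])[\u0F4A\u0F4C](?=[\u0F0B\u0F0D])"

# Longest exceptions first, so a longer word wins over a shorter prefix of it.
_EXCEPTION_ALT = "|".join(
    re.escape(word) for word in sorted(EXCEPTIONS, key=len, reverse=True)
)
# Exceptions are tried first at each position; a matched exception is skipped
# over as a whole, so the rule never sees the Ta log / Da log inside it.
_PATTERN = re.compile(f"(?P<exception>{_EXCEPTION_ALT})|{_TARGET}")


def _replace(match: re.Match) -> str:
    # Return an exception unchanged; otherwise the matched letter becomes -gs.
    if match.group("exception") is not None:
        return match.group("exception")
    return "\u0F42\u0F66"  # གས


def normalise_with_exceptions(text: str) -> str:
    return _PATTERN.sub(_replace, text)
```

Because the lookbehind in `re.sub` still sees the full string, the rule keeps its context across an exception boundary. Each new exceptional word would have to be added to the list by hand, which is why a growing list may be a sign that the conditions themselves need revisiting.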

ykyogoku commented 8 months ago

The implementation is finished, but the problem I described above remains. If there are exceptional cases other than པཊ་ཊི་, we should reconsider the conditions.