Divergent-Discourses / TibNorm

Normalising Tibetan Text

add missing tsheg (་) #9

Closed fxerhard closed 9 months ago

fxerhard commented 9 months ago

Missing tshegs (་) are transcribed as whitespace ( ) or, at the end of a line, as a line break. Examples will be added later or on request.

ykyogoku commented 9 months ago

I found other cases where whitespace occurs, so replacing whitespace with a tsheg is not as simple as we thought.

For Chinese characters, I would remove the spaces in between. For the other cases, I would make exceptions so that spaces before and after numbers, alphabetic characters, and ༄ remain. What do you think?

fxerhard commented 9 months ago

This sounds like a good idea!

ykyogoku commented 9 months ago

For a technical reason, I will keep the spaces between Chinese characters after all.
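
The rule discussed in this thread could be sketched roughly as follows. This is only an illustration, not the TibNorm implementation: the function name `add_missing_tsheg` is hypothetical, and treating "Tibetan letter" as the code point range U+0F40 to U+0FBC is an assumption. Whitespace (including a line break) is replaced by a tsheg only when it sits directly between two Tibetan letters, so spaces next to numbers, alphabetic characters, ༄, and Chinese characters are left alone.

```python
import re

TSHEG = "\u0F0B"  # the tsheg character

# Assumption: a "Tibetan letter" is anything in U+0F40..U+0FBC
# (base letters, vowel signs, subjoined letters). This range excludes
# the mark U+0F04, the shad U+0F0D, and Tibetan digits U+0F20..U+0F29,
# so whitespace next to those is never touched.
_TIB = "[\u0F40-\u0FBC]"

# A run of whitespace (spaces or line breaks) with a Tibetan letter
# immediately before and immediately after it.
_GAP = re.compile(f"(?<={_TIB})\\s+(?={_TIB})")


def add_missing_tsheg(text: str) -> str:
    """Replace whitespace between two Tibetan letters with a tsheg.

    Hypothetical helper: whitespace adjacent to anything else
    (digits, Latin letters, Chinese characters, U+0F04, punctuation)
    is kept unchanged.
    """
    return _GAP.sub(TSHEG, text)
```

Because the pattern requires a Tibetan letter on both sides, the exceptions do not need to be listed one by one; any neighbour outside the assumed range keeps its space. Whether that range matches the project's data would need checking against real transcriptions.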