Esukhia / bophono

Tibetan phonetics engine in Python
MIT License

updating KVP rules #22

Open eroux opened 2 months ago

eroux commented 2 months ago

The currently implemented KVP rules are outdated and need to be updated to match the following document:

KVP-phon.pdf

See also these reported examples (some, but not all, have been fixed):

KVP_Phonetics_tool_corrections.csv

eroux commented 2 months ago

Answer from KVP on some ambiguities:

1) dba goes to wa (like dbang mo > wang mo), but dbu goes to ü (like dbu chen > u chen or dbus gtsang > ü tsang), and dby goes to y (like dbyangs > yang).

2) The guidelines should be more clear that we effectively have two phonetics systems: a) used in whole lines of verse, split out into separate syllables, that includes ü , and b) used in the body of the text or parentheses, with conjoined syllables, that does not include ü. The latter is something that translators should be able to do on their own, here and there, relatively rarely. The former is something that editors need to be able to generate mass quantities of, and that's really what this tool is designed for. As far as I can remember, the ü umlaut and the syllables (paired or not) are the only differences, so all other rules in the guidelines should apply. I noted that the tool already generates an umlauted ü (see khrus), so it's really only the ül that is needed.
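To make point 2 concrete, here is a minimal sketch of the two output modes. Everything in it is hypothetical and not part of bophono: the `render` function, the mode names `"verse"` and `"body"`, and the assumptions that "conjoined" means plain concatenation and that dropping the umlaut means writing `u` for `ü`.

```python
import unicodedata

def render(syllables, mode="verse"):
    """Join already-transcribed syllables according to the two systems.

    Hypothetical helper: the mode names and the exact joining behaviour
    are assumptions made for illustration only.
    """
    # Normalize so that 'u' + combining diaeresis is treated like a precomposed ü.
    syllables = [unicodedata.normalize("NFC", s) for s in syllables]
    if mode == "verse":
        # System (a): separate syllables, ü is kept.
        return " ".join(syllables)
    if mode == "body":
        # System (b): conjoined syllables, ü is written as plain u (assumption).
        return "".join(syllables).replace("\u00fc", "u")
    raise ValueError(f"unknown mode: {mode!r}")

print(render(["\u00fc", "tsang"], "verse"))
print(render(["\u00fc", "tsang"], "body"))
```

Only ü is touched in body mode, since the answer above says the umlaut and the syllable joining are the only differences between the two systems.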

eroux commented 2 months ago

For the first point, let's add w for dba, dbö and dbo.

(just for reference, Manual of Standard Tibetan p. 444)
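A rough sketch of how the db- onset could be decided from the Wylie syllable, based only on the cases discussed in this thread. The function name `db_onset` is hypothetical, and it deliberately returns only the onset: whether the vowel comes out as u, ü or ö is assumed to be handled by other rules. Combinations not mentioned here (dbi, dbe, dbr...) are left undecided.

```python
def db_onset(wylie):
    """Return the KVP onset for a Wylie syllable starting with 'db'.

    Hypothetical helper. Returns None when the syllable does not start
    with 'db' or when the case is not covered in this thread.
    """
    if not wylie.startswith("db"):
        return None
    rest = wylie[2:]
    if rest.startswith("y"):
        return "y"   # dbyangs > yang
    if rest[:1] in ("a", "o"):
        return "w"   # dbang > wang; dbo (and its fronted form) also takes w
    if rest.startswith("u"):
        return ""    # dbu / dbus: no initial consonant, vowel decided elsewhere
    return None      # not specified in this thread

for syllable in ("dbang", "dbyangs", "dbus", "dbon", "rdo"):
    print(syllable, repr(db_onset(syllable)))
```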

eroux commented 1 month ago

One last rule clarification:

དྲ goes to 'dra', like in Dorje Drak (Wyl. rdo rje drag, KVP dor je drak). དྲུ follows the same rule, like in 'six' (Wyl. drug, KVP druk).

Our rule of གྲ going to 'tra' at the beginning of a word will be shifted in our guidelines. It should be 'dra' also. As you know, some verbs are interchangeably spelled with 'g' or 'd' (like Wyl. gras / dras), so consistent voicing is the best.

Finally, བྲ should also go to 'dra', since it is voiced.
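A minimal sketch of the unified 'dra' rule, working on Tibetan Unicode rather than Wylie so that a subjoined ra (U+0FB2) is not confused with a b prefix followed by a ra-headed stack. The name `ra_btags_onset` is hypothetical, and the check only looks at the start of a syllable; prefixed or superscribed forms and non-initial syllables are outside what this thread specifies.

```python
SUBJOINED_RA = "\u0fb2"                         # ra-btags (subjoined ra)
RA_BTAGS_ROOTS = {"\u0f42", "\u0f51", "\u0f56"}  # ga, da, ba

def ra_btags_onset(syllable):
    """Return 'dr' if the syllable starts with ga, da or ba plus subjoined ra.

    Hypothetical helper. Returns None for anything else.
    """
    if (len(syllable) >= 2
            and syllable[0] in RA_BTAGS_ROOTS
            and syllable[1] == SUBJOINED_RA):
        return "dr"
    return None

# drag (as in rdo rje drag), drug, gras, and brgya as a counter-example
print(ra_btags_onset("\u0f51\u0fb2\u0f42"))
print(ra_btags_onset("\u0f51\u0fb2\u0f74\u0f42"))
print(ra_btags_onset("\u0f42\u0fb2\u0f66"))
print(ra_btags_onset("\u0f56\u0f62\u0f92\u0fb1"))
```

The last call returns `None` because in brgya the ba is a prefix and the ra is the head of the stack, not a subjoined ra.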