LinguList closed this issue 7 years ago
I am not sure which strategy is the best, but an alias th > t seems to be quite helpful.
Also, for things like p' there is a nice phonetic cover term for sounds that are "ejective", "implosive", or "creaky/laryngealized": treat them as "glottalic" sounds, i.e., a phonation type and airstream mechanism essentially different from modal voice, voiceless, or aspirated.
This is now solved by the following pragmatic approach:
We have huge problems with Unicode lookalikes (especially implosive things, written with '-like graphemes). What is the best workflow here: aliases or multicode?
Aliases have the advantage of allowing us to define "th" as a valid sound representation that is converted to t+superscript_h (although the latest PNY orthoprofile revealed that this is not correct for Australian languages, where th means "dental t" as opposed to "alveolar t", or something similar). They offer great flexibility and let us include our experience of how people write stuff, but they may at times simply fail.
A further disadvantage is that, if we use them all over the place, they will drastically increase the number of segments present in BIPA.
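To make the alias idea concrete, here is a minimal sketch, not the actual BIPA implementation. The table `ALIASES`, the function `resolve`, and the per-dataset override `AUSTRALIAN` are all hypothetical names, and mapping the dental reading to t plus COMBINING BRIDGE BELOW is my assumption about what "dental t" would look like.

```python
# Hypothetical alias table: written form -> target segment.
ALIASES = {
    "th": "t\u02b0",  # t + MODIFIER LETTER SMALL H (aspirated reading)
}

# Hypothetical per-dataset override for sources where "th" is a dental stop.
AUSTRALIAN = {
    "th": "t\u032a",  # t + COMBINING BRIDGE BELOW (assumed dental reading)
}


def resolve(grapheme, aliases=ALIASES, overrides=None):
    """Return the target segment for a grapheme, or None if it is unknown."""
    table = dict(aliases)
    if overrides:
        # Dataset-specific readings win over the global defaults.
        table.update(overrides)
    return table.get(grapheme)


default_reading = resolve("th")
dental_reading = resolve("th", overrides=AUSTRALIAN)
```

The sketch also shows the cost: every written variant needs its own entry, so the table grows with each new source.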
Multicode has the disadvantage that the workflow involves an additional step: instead of just adding a new alias when I encounter a new bad lookalike, I'll have to check explicitly which Unicode characters we are using in the aliases (which will still be needed, as they handle non-lookalike practices like the "th" problem).
But maybe that's even the most consistent way to go: reduce the string by normalizing it with multicode, and only define aliases for cases where (a) ordering is involved, or (b) graphemes are misused, as in "th".
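The two-step workflow could look roughly like the sketch below. It does not use multicode's real API: `LOOKALIKES` and `normalize` are hypothetical stand-ins for the normalization step, and picking MODIFIER LETTER APOSTROPHE as the single target character is an assumption.

```python
import unicodedata

# Hypothetical lookalike table: stray characters -> the one we standardize on.
LOOKALIKES = {
    "'": "\u02bc",       # APOSTROPHE -> MODIFIER LETTER APOSTROPHE
    "\u2019": "\u02bc",  # RIGHT SINGLE QUOTATION MARK -> MODIFIER LETTER APOSTROPHE
}

# Aliases are kept only for misused graphemes, not for lookalikes.
ALIASES = {
    "th": "t\u02b0",  # t + MODIFIER LETTER SMALL H
}


def normalize(text):
    """Step 1: decompose the string and replace known lookalikes."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(LOOKALIKES.get(ch, ch) for ch in decomposed)


def to_segment(grapheme):
    """Step 2: look up an alias on the already normalized string."""
    cleaned = normalize(grapheme)
    return ALIASES.get(cleaned, cleaned)
```

With this split, `to_segment("p'")` and `to_segment("p\u2019")` give the same result without either spelling needing its own alias, while "th" is still handled by the alias table.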