cormacanderson closed this 3 years ago.
This is fine, but it will cause a LOT of trouble, as many people use the damn ł in our data, so this HAS side effects. I'd be inclined to keep the normalization: if something has been normalized, this can always be retrieved from the code, and we even throw warnings now when this happens.
But an alias is a good idea: this way you retain the original character and can check from there that it is different. Is that a good compromise?
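The difference between the two options can be sketched in a few lines. This is a minimal illustration, not the pyclts implementation: the tables `NORMALIZATIONS` and `ALIASES` and both helper functions are hypothetical, and they show two competing treatments of ł side by side rather than a configuration anyone would actually use.

```python
import warnings

# Hypothetical tables for illustration only. The two entries for "ł" are
# alternative treatments shown side by side, not a real CLTS configuration.
NORMALIZATIONS = {"ł": "ɬ", ":": "ː"}  # rewrite the input before lookup
ALIASES = {"ł": "ɫ"}                    # keep the input, point it at a sound

def normalize(form: str) -> str:
    """Replace lookalikes before lookup; only a warning records the change."""
    chars = []
    for ch in form:
        target = NORMALIZATIONS.get(ch, ch)
        if target != ch:
            warnings.warn(f"normalized {ch!r} to {target!r}")
        chars.append(target)
    return "".join(chars)

def resolve_alias(grapheme: str) -> dict:
    """Keep the original grapheme and record whether it was resolved as an alias."""
    canonical = ALIASES.get(grapheme, grapheme)
    return {"source": grapheme, "grapheme": canonical, "alias": canonical != grapheme}

print(normalize("ła"))     # ɬa  (the original ł is gone from the output)
print(resolve_alias("ł"))  # {'source': 'ł', 'grapheme': 'ɫ', 'alias': True}
```

With normalization, the output no longer contains ł at all; with an alias, the source character travels along with the resolved sound, so a later check can still see that the two differ.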
I get that and it's annoying. But if people sometimes use ł to mean ɫ and sometimes to mean ɬ, I don't see how either normalisation or an alias is likely to work. Is this not rather something that should be dealt with by an orthography profile?
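An orthography profile sidesteps the global choice by mapping graphemes per dataset. The sketch below assumes a profile is simply a dict from source graphemes to BIPA values and segments forms by greedy longest match; real profiles are TSV files processed by tooling such as the `segments` package, and `tokenize`, `PROFILE_A` and `PROFILE_B` are hypothetical names made up for this illustration.

```python
from __future__ import annotations

def tokenize(form: str, profile: dict[str, str]) -> list[str]:
    """Split `form` by greedy longest match against the profile's graphemes
    and return the BIPA value of each match; unknown characters are marked."""
    longest = max(map(len, profile), default=1)
    segments, i = [], 0
    while i < len(form):
        for size in range(min(longest, len(form) - i), 0, -1):
            chunk = form[i:i + size]
            if chunk in profile:
                segments.append(profile[chunk])
                i += size
                break
        else:
            # No grapheme matched: flag the character instead of guessing.
            segments.append(f"<{form[i]}>")
            i += 1
    return segments

# Hypothetical profiles: ł stands for ɫ in source A but for ɬ in source B.
PROFILE_A = {"ł": "ɫ", "l": "l", "lh": "l\u0325", "a": "a", "p": "p"}  # "lh" -> l̥
PROFILE_B = {"ł": "ɬ", "a": "a", "p": "p"}

print(tokenize("łapa", PROFILE_A))  # ['ɫ', 'a', 'p', 'a']
print(tokenize("łapa", PROFILE_B))  # ['ɬ', 'a', 'p', 'a']
```

Because the mapping lives in each dataset's profile, the same character can be resolved differently per source without any global normalization deciding for everyone.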
Leaving it in as a normalisation seems very risky to me. ɫ and ɬ are quite different beasts...
So make an alias, as I said.
If you use orthography profiles now, they WILL point you to the use of aliases,
and also to normalizations, so you can trace this as a researcher.
E.g., this is the output we now get (review pending) from a profile check in CLDF:
Grapheme | BIPA | Segments | Graphemes | Count |
---|---|---|---|---|
eu | eu | k i 5/⁵ + f eu 1/¹ | ki5_feu1 | 1 |
ɯɘ | ɯɘ | ts ɯɘ ŋ *5/⁵ | tsɯɘŋ5 | 1 |
ɛi | ɛi | n ɛi *3/³ | nɛi3 | 2 |
oːi | oːi | ɬ oːi *1/¹ | ɬo:i1 | 3 |
oi | oi | f oi l *1/¹ | foil1 | 3 |
iu | iu | kʰ iu *1/¹ | khiu1 | 5 |
ɔi | ɔi | tθ ɔi *1/¹ | tθɔi1 | 6 |
iːu | iːu | r iːu *1/¹ | ri:u1 | 7 |
uːi | uːi | b uːi 3/³ + b uːi 3/³ | bu:i3_bu:i3 | 10 |
ui | ui | x ui *3/³ | xui3 | 11 |
aːi | aːi | b aːi 3/³ + b aːi 3/³ | ba:i3_ba:i3 | 25 |
ou | ou | h ou *1/¹ | hou1 | 29 |
ai | ai | ʔ ai 1/¹ + l a 3/³ | Ɂai1_la3 | 37 |
ei | ei | n ei *2/² | nei2 | 43 |
eɯ | eɯ | m eɯ *1/¹ | meɯ1 | 45 |
au | au | f au *1/¹ | fau1 | 59 |
aːu | aːu | t aːu *3/³ | ta:u3 | 63 |

Grapheme | BIPA | Segments | Graphemes | Count |
---|---|---|---|---|
lh/l̥ | l̥ | d u ŋ 2/² + lh/l̥ a n 4/⁴ | duŋ2_lhan4 | 1 |
*9/⁵⁴ | ⁵⁴ | t i k *9/⁵⁴ | tik9 | 4 |
ɘ/ə | ə | k ɘ/ə *5/⁵ | kɘ5 | 5 |
i/j | j | m ɔ *5/⁵ + m i/j ɘ/ə | mɔ5_miɘ | 12 |
u/w | w | ɬ u/w ai *1/¹ | ɬuai1 | 12 |
*6/⁵¹ | ⁵¹ | ʔ a 1/¹ + r a 6/⁵¹ | Ɂa1_ra6 | 33 |
*8/⁵³ | ⁵³ | m eɯ ʔ *8/⁵³ | meɯɁ8 | 34 |
*5/⁵ | ⁵ | k ɘ/ə *5/⁵ | kɘ5 | 54 |
*4/⁴ | ⁴ | m eɯ *4/⁴ | meɯ4 | 83 |
*2/² | ² | ʔ a 2/² + r ou 1/¹ | Ɂa2_rou1 | 85 |
*7/⁵² | ⁵² | h ou ʔ *7/⁵² | houɁ7 | 96 |
*3/³ | ³ | d e *3/³ | de3 | 203 |
*1/¹ | ¹ | h ou *1/¹ | hou1 | 432 |
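The counting behind a report like this can be approximated in a few lines. This is a hypothetical sketch, not the CLDF/lexibank checker: the tables and the `trace` function are made up for illustration (the ƀ entry borrows the β reading discussed in this thread), and the `source/target` keys only loosely mirror the notation in the tables above.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical tables for illustration; the real CLTS data differ.
NORMALIZATIONS = {":": "ː", "ł": "ɬ"}
ALIASES = {"ƀ": "β"}

def trace(forms: list[str]) -> dict[str, Counter]:
    """Count, per source character, how often it was normalized or aliased."""
    report = {"normalized": Counter(), "alias": Counter()}
    for form in forms:
        for ch in form:
            if ch in NORMALIZATIONS:
                report["normalized"][f"{ch}/{NORMALIZATIONS[ch]}"] += 1
            elif ch in ALIASES:
                report["alias"][f"{ch}/{ALIASES[ch]}"] += 1
    return report

report = trace(["ła:", "ƀa", "ła"])
print(report["normalized"].most_common())  # [('ł/ɬ', 2), (':/ː', 1)]
print(report["alias"].most_common())       # [('ƀ/β', 1)]
```

Counting per character like this is what lets a researcher see, dataset by dataset, how much of the data silently depends on a normalization.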
Okay, can you clarify for me what exactly you are saying I should add as an alias? You mean ł as an alias of lˠ, right, alongside ɫ? Or do you mean I should rather add it as an alias of ɬ?
I thought normalisation was for lookalikes, like : for ː, not for cases like this...
Sorry, I never saw your reply, @cormacanderson. I'll have to look into this tomorrow!
So for me, the ł is a lookalike. It is less clear what KIND of lookalike it is, but you find it frequently in the literature for both of the cases you mention. It may indeed be useful to say: let us drop our normalization, as it arbitrarily normalizes in only one direction, and let us force people to be more exact.
So yes, @cormacanderson, just merge the PR if you still agree with the decision. For the update, this may force us to adjust some tests, but it is for the sake of clarity.
Removing potentially incorrect normalisations: ł is sometimes used for ɫ (i.e. lˠ), not ɬ, and ƀ is sometimes used for β, not v. (I am currently checking IE-CoR transcriptions and have found such cases.)
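In terms of data, the change amounts to shrinking the normalization table to unambiguous lookalikes. A minimal sketch using Python's built-in `str.maketrans`/`str.translate`; the tables contain only the entries mentioned in this thread and are not the actual CLTS normalization data.

```python
# Normalization table before the change: only entries mentioned in this
# thread (the real CLTS table contains more).
BEFORE = str.maketrans({":": "ː", "ł": "ɬ", "ƀ": "v"})
# After the change: ambiguous lookalikes are dropped, unambiguous ones stay.
AFTER = str.maketrans({":": "ː"})

form = "ła:ƀa"
print(form.translate(BEFORE))  # ɬaːva
print(form.translate(AFTER))   # łaːƀa
```

With the ambiguous entries gone, ł and ƀ pass through untouched, so a dataset's orthography profile (or an explicit alias) has to decide what they stand for.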