LinguList closed this issue 3 years ago
Examples for lapsyd:
ɖɽ | 1
ʈɽ | 1
ɳɖɽ | 1
'ɤ̃' | 1
'ɵ' | 1
'ə̃' | 1
'ə˞' | 1
'rrʲ' | 1
'əː' | 1
d̪n̪ | 1
ɟɲ | 1
dl | 1
ɟʎ | 1
ɖɳ | 1
ɖɭ | 1
dn | 1
'ẽː' | 1
'ɤ' | 2
'õː' | 2
pʲʰ | 3
t̪ʲʰ | 3
'ẽ' | 6
'õ' | 7
'oː' | 13
'eː' | 13
'rr' | 14
'ə' | 18
'e' | 43
'o' | 45
So when exporting the data for lapsyd, these were clearly missed!
And in JIPA, we have clear cases that point to errors in the data:
ai au | 1
w̆ r | 1
n̪d̪ʲ ntʲ | 1
aʔ iʔ | 1
| 1
ǁ’ | 1
eː | 1
ɛːɒ̯ˤ; ɪːɒ̯ˤ | 1
əʁ̞ | 1
l | 1
ld | 1
l ʎ | 1
ɔoũ | 1
tʃʰ dʒ | 1
r | 1
v s | 1
ɔ̤ | 1
əɪa | 1
əʊɪ | 1
eəɪ | 1
iəɪ | 1
aʊɪ | 1
oəɪ | 1
(ɯ) | 1
(y) | 1
u ɚ | 1
uai | 1
iou | 1
uei | 1
ɛ ɛː | 1
ɒ ɒː | 1
øː ɑː | 1
jai̯ | 1
jau̯ | 1
jeu̯ | 1
wei̯ | 1
wai̯ | 1
z̻ | 1
iau | 2
E.g., all the entries that contain spaces.
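A quick way to flag entries like these automatically could look like the sketch below. It is only an illustration: the function name `suspicious` and the set of checks (empty value, whitespace, `;` separators, parentheses) are my own assumptions based on the examples listed above, not something that exists in CLTS.

```python
def suspicious(grapheme: str) -> list[str]:
    """Return the reasons why a grapheme entry looks malformed.

    Hypothetical helper: the checks mirror the problem cases in the
    JIPA list above (spaces, ';' separators, parentheses, empty value).
    """
    reasons = []
    if not grapheme.strip():
        # Nothing but whitespace, or nothing at all.
        reasons.append("empty")
    elif any(ch.isspace() for ch in grapheme):
        # Several sounds in one cell, or a stray leading/trailing space.
        reasons.append("whitespace")
    if ";" in grapheme:
        reasons.append("separator")
    if "(" in grapheme or ")" in grapheme:
        reasons.append("parentheses")
    return reasons


if __name__ == "__main__":
    for entry in ["ai au", "(y)", "", "l", "l "]:
        print(repr(entry), "|", ", ".join(suspicious(entry)) or "ok")
```

Entries that come back with an empty list would still need a manual check, since a well-formed grapheme can be missing from the mapping for other reasons.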
I thought I flagged some of the JIPA ones. A few of them I also wrote to @SimonGreenhill about. Others I may have missed.
As for the LAPSyD ones, it's a bit of a mystery to me what the problem is with some of them, e.g. t̪ʲʰ and pʲʰ, which should be fine. Checking https://github.com/cldf-clts/clts/tree/master/sources/lapsyd/graphemes, I see that the labels for these in the BIPA column point to something wrong.
In all, the fact that these things are not sorted out leads me to think that there may be a few things left to do with CLTS and that we should also check some of the other datasets. I don't have time to look at this today and probably not tomorrow either, but should have time this week.
@LinguList if you tell me what I can do here, I'll do it on Monday. Is this a case of remapping these in https://github.com/cldf-clts/clts/tree/master/sources?
@cormacanderson, what I think happened is that the graphemes.tsv lists we compiled for phoible, lapsyd, eurasian, and jipa do not actually show all the symbols that we find in the original datasets (!). This means that the lists of graphemes should be recompiled from the datasets (by this, I mean https://github.com/cldf-datasets/jipa and the like).
The fact that even the l is missing in JIPA is a bit alarming. But what I also need to check is whether there's a space or something attached to it. So the procedure would be:
All in all, this can be done automatically by myself up to the point where it comes to checking the last elements.
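For the automatic part, a minimal sketch of the comparison could look like this. The column names (`Value`, `GRAPHEME`, `BIPA`), the inline sample data, and the helper names are assumptions for illustration only, and the NFD normalization is my guess at what is needed for a fair comparison; the real files would be read from the dataset repositories and from the CLTS sources.

```python
import csv
import io
import unicodedata
from collections import Counter


def read_column(tsv_text, column):
    """Read one column from TSV text, normalizing each value to NFD.

    The normalization step is an assumption: it avoids treating the
    same sound as two graphemes because of composed vs. decomposed
    Unicode forms.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [unicodedata.normalize("NFD", row[column]) for row in reader]


def unlisted(dataset_values, listed_graphemes):
    """Return (grapheme, count) pairs found in the dataset but not listed."""
    counts = Counter(dataset_values)
    listed = set(listed_graphemes)
    missing = [(g, n) for g, n in counts.items() if g not in listed]
    # Sort by frequency, then by grapheme, like the lists in this issue.
    return sorted(missing, key=lambda item: (item[1], item[0]))


# Made-up sample data; note the trailing space in the second value.
DATASET = "ID\tValue\n1\tl\n2\tl \n3\to\n4\to\n5\te\n"
LISTED = "GRAPHEME\tBIPA\nl\tl\ne\te\n"

result = unlisted(read_column(DATASET, "Value"), read_column(LISTED, "GRAPHEME"))

if __name__ == "__main__":
    for grapheme, count in result:
        print(f"{grapheme!r} | {count}")
```

Printing with `repr` makes stray spaces visible, which is the kind of thing that would explain a plain l showing up as missing.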
In the meantime, @cormacanderson, if you have time, it would be nice if you could already look at the results that I computed, as I'd like to know whether I should compute more or whether this is okay. A list of individual differences can and will also be output for you to inspect.
I have identified all sounds by tweaking cldf-datasets/lapsyd/ and the data is now completely covered.
There is one sound
We can resolve this sound in the same way as I have dealt with other unspecified coronals in LAPSyD, i.e. by using the symbol without diacritic. However, that means adding a sound to consonants.csv. I've put in a PR for this.
Nice work on resolving this @LinguList
There are inconsistencies in our transcriptiondata, resulting from:
These should be solved before releasing.