cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

V13a #39

Closed LinguList closed 3 years ago

LinguList commented 3 years ago
DATA              STATS    PERC
----------------  -------  --------------------
Unique graphemes  12384
different sounds  8754
singletons        8802
multiples         3582
consonants        5512     0.6296550148503541
vowels            1844     0.21064656157185288
diphthongs        707      0.0807630797349783
clusters          559      0.06385652273246516
tones             132      0.015078821110349555
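
For readers checking the figures: the PERC column appears to be each sound class's share of the 8754 distinct sounds. A minimal sketch that recomputes it from the counts above, assuming that reading; the variable names are illustrative and not taken from the CLTS code:

```python
# Counts copied from the stats table; the names are illustrative only.
counts = {
    "consonants": 5512,
    "vowels": 1844,
    "diphthongs": 707,
    "clusters": 559,
    "tones": 132,
}

# The five classes add up to the "different sounds" figure.
total = sum(counts.values())

# Share of each class among all distinct sounds.
shares = {name: n / total for name, n in counts.items()}

for name, share in shares.items():
    print(f"{name:<12}{counts[name]:>6}  {share:.4f}")
```

The same kind of check works for the grapheme counts: singletons and multiples (8802 + 3582) add up to the 12384 unique graphemes.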
LinguList commented 3 years ago
id            valid    total      percent
------------  -------  -------  ---------
apics         177      177           1.00
bdpa          1328     1466          0.91
bdproto       734      794           0.92
beijingdaxue  124      124           1.00
chomsky       45       45            1.00
diachronica   552      652           0.85
eurasian      1347     1562          0.86
jipa          894      967           0.92
lapsyd        696      795           0.88
multimedia    132      138           0.96
nidaba        1864     1936          0.96
panphon       6219     6334          0.98
pbase         810      1068          0.76
phoible       2574     3183          0.81
powoco        369      378           0.98
ruhlen        434      701           0.62
saphon        343      357           0.96
segbo         215      219           0.98
wiki          166      184           0.90
18                                   0.90
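
The per-dataset percent column is consistent with valid divided by total, rounded to two decimals. A small sketch under that assumption, using a few rows from the table; the `coverage` helper is hypothetical and is not the function the CLTS workflow uses:

```python
def coverage(valid: int, total: int) -> float:
    """Share of a dataset's graphemes that map to a valid sound (hypothetical helper)."""
    return round(valid / total, 2)


# (valid, total) pairs copied from the table above.
rows = {
    "bdproto": (734, 794),
    "saphon": (343, 357),
    "segbo": (215, 219),
    "ruhlen": (434, 701),
}

for name, (valid, total) in rows.items():
    print(f"{name:<12}  {valid:<7}  {total:<7}  {coverage(valid, total):>9.2f}")
```

Datasets with a low ratio, such as ruhlen, are the ones where refining the mapping should pay off most.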
LinguList commented 3 years ago

@tresoldi, @cormacanderson, I added bdproto, segbo, and saphon, which gives us 18 datasets now, and I used our workflow to correct these marginal datasets. Please have a look once you find time, as this shows how the workflow works. The important files are all those in the folder sources, called "graphemes.tsv".

tresoldi commented 3 years ago

I think we can merge and keep refining the mapping later, correct?

LinguList commented 3 years ago

Yes, just merge. You can also merge the Python code. And you could have a look at adding more datasets (there are still issues, I prepared even more datasets in the morning, which you filed), and refining, e.g., the jipa, which I just made as well.