Closed AnnikaTjuka closed 1 year ago
@AnnikaTjuka, coverage is discussed in our 2018 CLICS paper, where we introduce average mutual coverage. Alternative ways to measure coverage are of course possible. To get started, a list showing how many languages and how many families reflect each concept would probably be useful. This can be derived from the colexifications with R, or with Python as part of the CLDFBench routine.
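A minimal Python sketch of such a per-concept list, in case it helps as a starting point. It assumes the forms are already available as `(concept, glottocode, family)` tuples; the function name `concept_coverage` and the toy rows are hypothetical and not taken from the repository.

```python
from collections import defaultdict


def concept_coverage(forms):
    """Count distinct languages and families per concept.

    `forms` is an iterable of (concept, glottocode, family) tuples.
    This input layout is an assumption made for illustration.
    """
    languages = defaultdict(set)
    families = defaultdict(set)
    for concept, glottocode, family in forms:
        languages[concept].add(glottocode)
        families[concept].add(family)
    # Map each concept to (number of languages, number of families).
    return {c: (len(languages[c]), len(families[c])) for c in languages}


# Toy data with made-up glottocodes and family names.
rows = [
    ("HAND", "abcd1234", "FamA"),
    ("HAND", "efgh5678", "FamA"),
    ("HAND", "ijkl9012", "FamB"),
    ("FACE", "abcd1234", "FamA"),
    ("FACE", "abcd1234", "FamA"),  # a second form in the same language
    ("FINGERTIP", "ijkl9012", "FamB"),
]

counts = concept_coverage(rows)
# List concepts from best to worst family coverage.
for concept, (n_langs, n_fams) in sorted(
    counts.items(), key=lambda item: item[1][1], reverse=True
):
    print(concept, n_langs, n_fams)
```

Using sets means that several forms for the same concept in one language are counted only once.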
Ok, I'll try it in Python first and if I don't get anywhere, I'll use R. But should we integrate the "OR" concepts first? I assume that some lists only use "hand/arm" glosses.
Yes, I can try and look at that later.
Great!
@AnnikaTjuka, I have now added two versions of a coverage overview and also modified the language families: Dogon is a valid Glottolog family and Bangime is an isolate, so I replaced Bangime with Dogon. Some say Dogon belongs to Atlantic-Congo, but that is not clear.
The coverage is now calculated upon running the first lexibank.makecldf script.
Great, thanks! I just ran the updated version of lexibank.makecldf and looked at the distribution. In some cases it is predictable that there are not many instances (e.g. FINGERTIP), but FACE should occur in more than 14 language families. I'll check these concepts, in addition to the emotion concepts in CLICS, regarding phonetic transcriptions.
@AnnikaTjuka, I have now added a script computing coverage.
Languages | Concepts | Coverage | Coverage Ratio | Families | Valid Languages | Average Concepts |
---|---|---|---|---|---|---|
1446 | 1500 | 136.231 | 0.0908205 | 19 | 1442 | 316.757 |
1446 | 1400 | 135.963 | 0.0971164 | 19 | 1442 | 311.558 |
1446 | 1300 | 135.616 | 0.10432 | 19 | 1442 | 305.64 |
1446 | 1200 | 135.193 | 0.112661 | 19 | 1442 | 299.111 |
1446 | 1100 | 134.666 | 0.122423 | 19 | 1442 | 291.828 |
1446 | 1000 | 134.013 | 0.134013 | 19 | 1442 | 283.725 |
1446 | 900 | 133.189 | 0.147988 | 19 | 1442 | 274.636 |
1446 | 800 | 132.106 | 0.165132 | 19 | 1442 | 264.207 |
1446 | 700 | 130.695 | 0.186707 | 19 | 1442 | 252.324 |
1446 | 600 | 128.708 | 0.214514 | 19 | 1442 | 238.221 |
1446 | 500 | 125.922 | 0.251843 | 19 | 1442 | 221.534 |
1446 | 400 | 121.654 | 0.304135 | 19 | 1442 | 200.905 |
1446 | 300 | 114.782 | 0.382607 | 19 | 1442 | 174.77 |
1400 | 1500 | 142.432 | 0.0949549 | 19 | 1396 | 325.009 |
1400 | 1400 | 142.148 | 0.101534 | 19 | 1396 | 319.649 |
1400 | 1300 | 141.779 | 0.109061 | 19 | 1396 | 313.55 |
1400 | 1200 | 141.329 | 0.117774 | 19 | 1396 | 306.816 |
1400 | 1100 | 140.769 | 0.127972 | 19 | 1396 | 299.313 |
1400 | 1000 | 140.075 | 0.140075 | 19 | 1396 | 290.957 |
1400 | 900 | 139.199 | 0.154666 | 19 | 1396 | 281.586 |
1400 | 800 | 138.05 | 0.172563 | 19 | 1396 | 270.844 |
1400 | 700 | 136.552 | 0.195075 | 19 | 1396 | 258.604 |
1400 | 600 | 134.443 | 0.224072 | 19 | 1396 | 244.074 |
1400 | 500 | 131.486 | 0.262972 | 19 | 1396 | 226.884 |
1400 | 400 | 126.957 | 0.317392 | 19 | 1396 | 205.633 |
1400 | 300 | 119.675 | 0.398916 | 19 | 1396 | 178.726 |
1300 | 1500 | 152.71 | 0.101807 | 18 | 1293 | 341.913 |
1300 | 1400 | 152.38 | 0.108843 | 18 | 1293 | 336.145 |
1300 | 1300 | 151.953 | 0.116887 | 18 | 1293 | 329.582 |
1300 | 1200 | 151.432 | 0.126194 | 18 | 1293 | 322.337 |
1300 | 1100 | 150.784 | 0.137076 | 18 | 1293 | 314.261 |
1300 | 1000 | 149.981 | 0.149981 | 18 | 1293 | 305.276 |
1300 | 900 | 148.968 | 0.16552 | 18 | 1293 | 295.197 |
1300 | 800 | 147.64 | 0.18455 | 18 | 1293 | 283.648 |
1300 | 700 | 145.911 | 0.208444 | 18 | 1293 | 270.511 |
1300 | 600 | 143.478 | 0.23913 | 18 | 1293 | 254.92 |
1300 | 500 | 140.063 | 0.280126 | 18 | 1293 | 236.45 |
1300 | 400 | 134.857 | 0.337142 | 18 | 1293 | 213.677 |
1300 | 300 | 126.51 | 0.421699 | 18 | 1293 | 184.872 |
1200 | 1500 | 162.106 | 0.108071 | 17 | 1194 | 358.733 |
1200 | 1400 | 161.72 | 0.115514 | 17 | 1194 | 352.493 |
1200 | 1300 | 161.222 | 0.124017 | 17 | 1194 | 345.412 |
1200 | 1200 | 160.613 | 0.133844 | 17 | 1194 | 337.572 |
1200 | 1100 | 159.853 | 0.145321 | 17 | 1194 | 328.831 |
1200 | 1000 | 158.915 | 0.158915 | 17 | 1194 | 319.123 |
1200 | 900 | 157.736 | 0.175263 | 17 | 1194 | 308.249 |
1200 | 800 | 156.196 | 0.195245 | 17 | 1194 | 295.816 |
1200 | 700 | 154.184 | 0.220263 | 17 | 1194 | 281.646 |
1200 | 600 | 151.36 | 0.252267 | 17 | 1194 | 264.855 |
1200 | 500 | 147.381 | 0.294763 | 17 | 1194 | 244.921 |
1200 | 400 | 141.383 | 0.353456 | 17 | 1194 | 220.495 |
1200 | 300 | 131.77 | 0.439235 | 17 | 1194 | 189.591 |
1100 | 1500 | 169.75 | 0.113167 | 17 | 1094 | 375.909 |
1100 | 1400 | 169.291 | 0.120922 | 17 | 1094 | 369.108 |
1100 | 1300 | 168.7 | 0.129769 | 17 | 1094 | 361.388 |
1100 | 1200 | 167.975 | 0.139979 | 17 | 1094 | 352.842 |
1100 | 1100 | 167.072 | 0.151884 | 17 | 1094 | 343.312 |
1100 | 1000 | 165.959 | 0.165959 | 17 | 1094 | 332.733 |
1100 | 900 | 164.563 | 0.182848 | 17 | 1094 | 320.904 |
1100 | 800 | 162.742 | 0.203428 | 17 | 1094 | 307.389 |
1100 | 700 | 160.357 | 0.229082 | 17 | 1094 | 291.963 |
1100 | 600 | 157.01 | 0.261684 | 17 | 1094 | 273.686 |
1100 | 500 | 152.292 | 0.304584 | 17 | 1094 | 251.979 |
1100 | 400 | 145.221 | 0.363052 | 17 | 1094 | 225.48 |
1100 | 300 | 133.913 | 0.446376 | 17 | 1094 | 191.97 |
1000 | 1500 | 179.139 | 0.119426 | 17 | 994 | 395.187 |
1000 | 1400 | 178.588 | 0.127563 | 17 | 994 | 387.733 |
1000 | 1300 | 177.875 | 0.136827 | 17 | 994 | 379.256 |
1000 | 1200 | 177.001 | 0.147501 | 17 | 994 | 369.873 |
1000 | 1100 | 175.915 | 0.159923 | 17 | 994 | 359.424 |
1000 | 1000 | 174.578 | 0.174578 | 17 | 994 | 347.832 |
1000 | 900 | 172.907 | 0.192119 | 17 | 994 | 334.895 |
1000 | 800 | 170.725 | 0.213406 | 17 | 994 | 320.107 |
1000 | 700 | 167.865 | 0.239807 | 17 | 994 | 303.216 |
1000 | 600 | 163.84 | 0.273067 | 17 | 994 | 283.18 |
1000 | 500 | 158.188 | 0.316377 | 17 | 994 | 259.425 |
1000 | 400 | 149.817 | 0.374543 | 17 | 994 | 230.653 |
1000 | 300 | 136.463 | 0.454878 | 17 | 994 | 194.261 |
900 | 1500 | 190.197 | 0.126798 | 16 | 892 | 417.132 |
900 | 1400 | 189.519 | 0.135371 | 16 | 892 | 408.866 |
900 | 1300 | 188.642 | 0.145109 | 16 | 892 | 399.466 |
900 | 1200 | 187.567 | 0.156306 | 16 | 892 | 389.062 |
900 | 1100 | 186.232 | 0.169301 | 16 | 892 | 377.478 |
900 | 1000 | 184.589 | 0.184589 | 16 | 892 | 364.633 |
900 | 900 | 182.546 | 0.202829 | 16 | 892 | 350.336 |
900 | 800 | 179.881 | 0.224852 | 16 | 892 | 334 |
900 | 700 | 176.372 | 0.251961 | 16 | 892 | 315.298 |
900 | 600 | 171.438 | 0.285731 | 16 | 892 | 293.118 |
900 | 500 | 164.517 | 0.329034 | 16 | 892 | 266.836 |
900 | 400 | 154.357 | 0.385893 | 16 | 892 | 235.286 |
900 | 300 | 138.198 | 0.460661 | 16 | 892 | 195.321 |
800 | 1500 | 205.693 | 0.137129 | 16 | 793 | 442.289 |
800 | 1400 | 204.845 | 0.146318 | 16 | 793 | 433.044 |
800 | 1300 | 203.75 | 0.156731 | 16 | 793 | 422.548 |
800 | 1200 | 202.403 | 0.168669 | 16 | 793 | 410.915 |
800 | 1100 | 200.74 | 0.182491 | 16 | 793 | 397.998 |
800 | 1000 | 198.72 | 0.19872 | 16 | 793 | 383.779 |
800 | 900 | 196.255 | 0.218061 | 16 | 793 | 368.125 |
800 | 800 | 193.053 | 0.241316 | 16 | 793 | 350.267 |
800 | 700 | 188.797 | 0.26971 | 16 | 793 | 329.699 |
800 | 600 | 182.778 | 0.304631 | 16 | 793 | 305.231 |
800 | 500 | 174.464 | 0.348929 | 16 | 793 | 276.457 |
800 | 400 | 162.435 | 0.406089 | 16 | 793 | 242.251 |
800 | 300 | 143.643 | 0.478811 | 16 | 793 | 199.213 |
700 | 1500 | 226.668 | 0.151112 | 16 | 693 | 471.22 |
700 | 1400 | 225.57 | 0.161121 | 16 | 693 | 460.709 |
700 | 1300 | 224.153 | 0.172426 | 16 | 693 | 448.783 |
700 | 1200 | 222.405 | 0.185337 | 16 | 693 | 435.536 |
700 | 1100 | 220.291 | 0.200264 | 16 | 693 | 421.049 |
700 | 1000 | 217.744 | 0.217744 | 16 | 693 | 405.239 |
700 | 900 | 214.702 | 0.238558 | 16 | 693 | 388.17 |
700 | 800 | 210.836 | 0.263545 | 16 | 693 | 368.856 |
700 | 700 | 205.662 | 0.293803 | 16 | 693 | 346.389 |
700 | 600 | 198.266 | 0.330443 | 16 | 693 | 319.457 |
700 | 500 | 188.365 | 0.37673 | 16 | 693 | 288.344 |
700 | 400 | 174.257 | 0.435643 | 16 | 693 | 251.554 |
700 | 300 | 153.249 | 0.51083 | 16 | 693 | 206.269 |
600 | 1500 | 253.615 | 0.169076 | 16 | 593 | 506.833 |
600 | 1400 | 252.167 | 0.180119 | 16 | 593 | 494.802 |
600 | 1300 | 250.3 | 0.192538 | 16 | 593 | 481.167 |
600 | 1200 | 247.979 | 0.206649 | 16 | 593 | 465.967 |
600 | 1100 | 245.204 | 0.222913 | 16 | 593 | 449.488 |
600 | 1000 | 241.886 | 0.241886 | 16 | 593 | 431.685 |
600 | 900 | 237.932 | 0.264369 | 16 | 593 | 412.635 |
600 | 800 | 233.071 | 0.291339 | 16 | 593 | 391.437 |
600 | 700 | 226.441 | 0.323488 | 16 | 593 | 366.305 |
600 | 600 | 217.01 | 0.361683 | 16 | 593 | 336.182 |
600 | 500 | 204.639 | 0.409279 | 16 | 593 | 301.712 |
600 | 400 | 187.348 | 0.468369 | 16 | 593 | 261.313 |
600 | 300 | 162.924 | 0.543079 | 16 | 593 | 212.787 |
500 | 1500 | 283.783 | 0.189189 | 14 | 490 | 548.874 |
500 | 1400 | 281.763 | 0.201259 | 14 | 490 | 534.724 |
500 | 1300 | 279.378 | 0.214906 | 14 | 490 | 519.678 |
500 | 1200 | 276.1 | 0.230084 | 14 | 490 | 501.692 |
500 | 1100 | 272.395 | 0.247632 | 14 | 490 | 482.866 |
500 | 1000 | 267.906 | 0.267906 | 14 | 490 | 462.45 |
500 | 900 | 262.469 | 0.291632 | 14 | 490 | 440.41 |
500 | 800 | 255.999 | 0.319999 | 14 | 490 | 416.266 |
500 | 700 | 247.3 | 0.353286 | 14 | 490 | 387.816 |
500 | 600 | 234.759 | 0.391265 | 14 | 490 | 353.322 |
500 | 500 | 218.527 | 0.437054 | 14 | 490 | 313.974 |
500 | 400 | 196.696 | 0.49174 | 14 | 490 | 268.832 |
500 | 300 | 167.514 | 0.55838 | 14 | 490 | 215.912 |
400 | 1500 | 338.038 | 0.225358 | 9 | 391 | 609.42 |
400 | 1400 | 335.042 | 0.239315 | 9 | 391 | 592.533 |
400 | 1300 | 331.558 | 0.255044 | 9 | 391 | 575.4 |
400 | 1200 | 326.591 | 0.272159 | 9 | 391 | 553.577 |
400 | 1100 | 321.061 | 0.291874 | 9 | 391 | 531.168 |
400 | 1000 | 314.33 | 0.31433 | 9 | 391 | 506.815 |
400 | 900 | 306.06 | 0.340066 | 9 | 391 | 480.3 |
400 | 800 | 296.792 | 0.37099 | 9 | 391 | 452.195 |
400 | 700 | 284.168 | 0.405954 | 9 | 391 | 418.815 |
400 | 600 | 265.744 | 0.442907 | 9 | 391 | 377.707 |
400 | 500 | 242.803 | 0.485607 | 9 | 391 | 331.483 |
400 | 400 | 212.027 | 0.530067 | 9 | 391 | 278.572 |
400 | 300 | 173.896 | 0.579653 | 9 | 391 | 218.82 |
300 | 1500 | 420.935 | 0.280623 | 7 | 291 | 692.227 |
300 | 1400 | 416.024 | 0.29716 | 7 | 291 | 671.353 |
300 | 1300 | 410.424 | 0.315711 | 7 | 291 | 650.187 |
300 | 1200 | 402.308 | 0.335257 | 7 | 291 | 623.073 |
300 | 1100 | 393.177 | 0.357433 | 7 | 291 | 594.62 |
300 | 1000 | 382.422 | 0.382422 | 7 | 291 | 564.327 |
300 | 900 | 368.802 | 0.40978 | 7 | 291 | 530.6 |
300 | 800 | 355.102 | 0.443877 | 7 | 291 | 497.593 |
300 | 700 | 335.35 | 0.479072 | 7 | 291 | 456.627 |
300 | 600 | 307.18 | 0.511967 | 7 | 291 | 406.7 |
300 | 500 | 277.038 | 0.554076 | 7 | 291 | 354.883 |
300 | 400 | 237.02 | 0.59255 | 7 | 291 | 294.96 |
300 | 300 | 190.282 | 0.634272 | 7 | 291 | 229.33 |
Of these options, we should choose one. Note that languages are now represented by unique glottocodes. This is an advantage, as it allows us to filter out those languages which have duplicated glottocodes for the analysis later on.
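For reading the table: the Coverage Ratio column equals Coverage divided by Concepts (e.g. 134.013 / 1000 = 0.134013). Below is a rough sketch of how an average mutual coverage could be computed. It assumes that the mutual coverage of two languages is the number of concepts they share and that the average runs over all unordered language pairs; the actual script may differ, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def average_mutual_coverage(concepts_by_language):
    """Average number of shared concepts over all unordered language pairs.

    `concepts_by_language` maps a glottocode to the set of concepts
    attested for that language (an assumed input layout).
    """
    pairs = list(combinations(concepts_by_language.values(), 2))
    if not pairs:
        return 0.0
    return sum(len(a & b) for a, b in pairs) / len(pairs)


def coverage_ratio(coverage, n_concepts):
    """Coverage divided by the size of the concept selection."""
    return coverage / n_concepts


# Toy data with made-up glottocodes.
toy = {
    "abcd1234": {"HAND", "ARM", "FACE"},
    "efgh5678": {"HAND", "ARM"},
    "ijkl9012": {"HAND", "FACE", "FINGERTIP"},
}

amc = average_mutual_coverage(toy)  # pairs share 2, 2 and 1 concepts
ratio = coverage_ratio(amc, 4)      # 4 distinct concepts in the toy data
print(round(amc, 3), round(ratio, 3))
```

Keying the input by glottocode keeps one entry per language, so languages with duplicated glottocodes have to be merged or filtered before this step.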
I think something like 1400 languages and 1000 concepts may be useful. We should add one more language family, though.
I think this issue can be closed. I'd propose not making any more changes to the languages and concepts at this stage.
We need to check the coverage for body part terms, and I need to think about how this can best be done. Do you have any ideas on how we can make sure that we have sampled enough body part concepts from the list? Mutual coverage, a minimum share of a concept list? Maybe we should discuss this in an issue?
Originally posted by @LinguList in https://github.com/clics/clicsbp/pull/13#pullrequestreview-815973046