Closed by joeylovestrand.
Good idea! It might be tricky to decide which level of subfamily to use, but settling on something like the WALS genera might work.
WALS genera are not defined for all Glottolog families, and not all WALS genera have corresponding Glottolog subgroups. I think the only pragmatic option (in terms of UI) is to allow drilling down one more level below the top-level families.
Okay. I think that's fixable (the matching of WALS genera and Glottolog subgroups), but never mind.
Using just the first-order subgroups went a bit weird elsewhere on the Grambank website, didn't it? For example, when a language-level languoid is a direct descendant of the root. But sure, that does sound like the easiest option UI-wise.
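To see what "one more level below the top-level family" looks like in the data, including how often a language hangs directly off the root, the classification rows of the Glottolog CLDF `values.csv` can be grouped by their second path element. This is a minimal sketch, not Glottoscope's logic: it assumes the classification `Value` is a `/`-separated path of ancestor Glottocodes starting at the top-level family (as the `-m "chad1250"` filter in the recipe further down suggests), and that the columns up to `Value` contain no embedded commas or quotes, so plain `awk -F,` splitting is safe. The function name `first_level_tally` is hypothetical, and it counts every row that has a classification value, whatever the languoid's level.

```sh
# Hypothetical helper (not part of Glottolog, Glottoscope or csvkit):
# count languoids under each first-level subgroup of a top-level family,
# using the "classification" rows of a Glottolog CLDF values.csv.
# Assumption: Value is a "/"-separated path of ancestor Glottocodes that
# starts at the top-level family; columns up to Value contain no commas.
first_level_tally() {
    family="$1"   # top-level family Glottocode, e.g. afro1255
    values="$2"   # path to cldf/values.csv
    awk -F, -v fam="$family" '
        NR == 1 {
            # Map column names to positions using the header row.
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        $col["Parameter_ID"] == "classification" {
            n = split($col["Value"], path, "/")
            if (path[1] != fam) next
            # A languoid whose parent is the family itself has a one-element path.
            key = (n >= 2) ? path[2] : "(directly under " fam ")"
            count[key]++
        }
        END { for (k in count) print count[k], k }
    ' "$values" | sort -k1,1nr -k2
}
```

Usage: `first_level_tally afro1255 cldf/values.csv` prints one `COUNT SUBGROUP` line per first-level branch of Afro-Asiatic, largest first. awk is used here only to avoid a dependency; csvkit handles quoted CSV fields properly and is the safer choice if the column assumption is in doubt.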
Maybe one simple solution, since levels and subgroups are not clearly defined, would be to allow any Glottocode to be set as a parameter (i.e. as the starting point of a tree). For Chadic, one would enter chad1250 to get only this branch. This would work for any scenario, even for the dialects of a single language.
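The "start from any Glottocode" idea can already be applied offline to the Glottolog CLDF data. A minimal sketch, assuming the same `/`-separated classification paths and comma-free leading columns as above; `descendants_of` is a hypothetical name, not a Glottoscope or Glottolog feature. It matches the Glottocode as a whole path element rather than as a substring.

```sh
# Hypothetical helper: print the Language_ID of every languoid whose
# classification path contains the given Glottocode, i.e. everything below
# that node. Assumes "/"-separated paths and no commas in leading columns.
descendants_of() {
    node="$1"     # any Glottocode, e.g. chad1250 for Chadic
    values="$2"   # path to cldf/values.csv
    awk -F, -v node="$node" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        $col["Parameter_ID"] == "classification" {
            n = split($col["Value"], path, "/")
            for (i = 1; i <= n; i++)
                if (path[i] == node) { print $col["Language_ID"]; break }
        }
    ' "$values"
}
```

Usage: `descendants_of chad1250 cldf/values.csv`. Because a languoid's path lists only its ancestors, the starting node itself is not printed; for the dialects of a single language, the language row would have to be added separately.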
@joeylovestrand would a cookbook recipe work for you? Something along the lines of the code below - although I'll add a bit more description of what's going on for a recipe.
```
$ csvgrep -c Parameter_ID -r"^classification" cldf/values.csv | csvgrep -c Value -m "chad1250" | csvcut -c Language_ID > chadic.csv
$ csvgrep -c Parameter_ID -r"^med$" cldf/values.csv > meds.csv
$ csvjoin -c Language_ID chadic.csv meds.csv | csvstat
  1. "Language_ID"

        Type of data:          Text
        Contains null values:  False
        Unique values:         206
        Longest value:         8 characters
        Most common values:    suku1272 (1x)
                               mina1276 (1x)
                               mbed1242 (1x)
                               gava1241 (1x)
                               buwa1243 (1x)

  2. "ID"

        Type of data:          Text
        Contains null values:  False
        Unique values:         206
        Longest value:         12 characters
        Most common values:    suku1272-med (1x)
                               mina1276-med (1x)
                               mbed1242-med (1x)
                               gava1241-med (1x)
                               buwa1243-med (1x)

  3. "Parameter_ID"

        Type of data:          Text
        Contains null values:  False
        Unique values:         1
        Longest value:         3 characters
        Most common values:    med (206x)

  4. "Value"

        Type of data:          Number
        Contains null values:  False
        Unique values:         5
        Smallest value:        0
        Largest value:         4
        Sum:                   517
        Mean:                  2,51
        Median:                3
        StDev:                 1,457
        Most common values:    4 (77x)
                               2 (47x)
                               0 (33x)
                               3 (33x)
                               1 (16x)

  5. "Code_ID"

        Type of data:          Text
        Contains null values:  False
        Unique values:         5
        Longest value:         21 characters
        Most common values:    med-wordlist_or_less (77x)
                               med-grammar_sketch (47x)
                               med-long_grammar (33x)
                               med-phonology_or_text (33x)
                               med-grammar (16x)

  6. "Comment"

        Type of data:          Boolean
        Contains null values:  True (excluded from calculations)
        Unique values:         1
        Most common values:    None (206x)

  7. "Source"

        Type of data:          Text
        Contains null values:  False
        Unique values:         169
        Longest value:         41 characters
        Most common values:    hh:hvw:JungraithmayrIbriszimow:CLR (11x)
                               hh:hw:Kraft:Chadic:II (5x)
                               hh:hs:Schuh:Bole-Tangale (5x)
                               hh:hw:Kraft:Chadic:III (4x)
                               hh:w:Brye:Jimjimen-Gude-Tsuvan-Sharwa (3x)

  8. "codeReference"

        Type of data:          Boolean
        Contains null values:  True (excluded from calculations)
        Unique values:         1
        Most common values:    None (206x)

Row count: 206
```
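Since the question was specifically about tallies of the levels of description, the three commands above can also be condensed into a single count per MED code. This is a sketch in plain awk rather than csvkit, under the same assumptions as before (`/`-separated classification paths, no commas in the columns up to `Code_ID`); `med_tally` is a hypothetical name. Like `csvjoin`'s default inner join, it drops languages that have no `med` value.

```sh
# Hypothetical helper: tally the "med" Code_IDs of all languoids below a
# given Glottocode, in one pass over a Glottolog CLDF values.csv.
med_tally() {
    node="$1"     # any Glottocode, e.g. chad1250
    values="$2"   # path to cldf/values.csv
    awk -F, -v node="$node" '
        NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
        {
            lang = $col["Language_ID"]
            param = $col["Parameter_ID"]
            if (param == "med") {
                med[lang] = $col["Code_ID"]          # documentation level
            } else if (param == "classification") {
                n = split($col["Value"], path, "/")
                for (i = 1; i <= n; i++)
                    if (path[i] == node) { inside[lang] = 1; break }
            }
        }
        END {
            # Inner join: only languoids that are inside AND have a med value.
            for (lang in inside)
                if (lang in med) count[med[lang]]++
            for (code in count) print count[code], code
        }
    ' "$values" | sort -k1,1nr -k2
}
```

Usage: `med_tally chad1250 cldf/values.csv`. Run on the same data release, it should give the same numbers as the `Code_ID` section of the `csvstat` output, since Glottocodes are fixed-length and a substring match on one cannot hit a different code.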
@joeylovestrand here's the draft: https://github.com/glottolog/cookbook/blob/master/recipes/glottolog_cldf/documentation_status_for_subgroup.md
@xrotwang Thanks for this! I haven't used the cookbook, but it looks similar enough to R/Python that I assume I could figure it out. It was certainly easier to have Harald do it for me 😁 but it will be great to be able to update the numbers on my own!
Would it be possible to allow Glottoscope to filter for sub-families (e.g. Chadic and not just all Afro-Asiatic)?
I would particularly be interested in the tallies of the levels of description.
(Will be including a Glottoscope map in my next presentation - so thanks for this tool!)