clld / glottolog3

glottolog2 re-implemented as CLLD app
MIT License

add macroarea to csv export of languages #45

Closed: xrotwang closed this 9 years ago

HedvigS commented 9 years ago

(I support this very much; WALS genera and families would also be nice.)

xrotwang commented 9 years ago

Unfortunately, WALS genera and families are a moving target (see e.g. https://github.com/clld/wals-data/issues/35). So while we still have this info (more or less outdated) in the Glottolog database, I don't think it should be disseminated widely.

HedvigS commented 9 years ago

Check. Glottolog families though?

xrotwang commented 9 years ago

Yeah, we have those.

xrotwang commented 9 years ago

Macroareas related to languages can be retrieved using clldclient:

$ clld-download-table glottolog.org language --with-html | in2csv -f json | csvstat
  1. id
    <type 'unicode'>
    Nulls: False
    Unique values: 6090
    5 most frequent values:
        tson1249:   3
        east2304:   3
        warr1255:   3
        wals1238:   3
        tawa1286:   3
    Max length: 8
  2. name
    <type 'unicode'>
    Nulls: False
    Unique values: 6090
    Max length: 234
  3. top-level family
    <type 'unicode'>
    Nulls: True
    Unique values: 241
    Max length: 202
  4. iso
    <type 'unicode'>
    Nulls: True
    Unique values: 5553
    5 most frequent values:
        mdt:    3
        smy:    3
        ata:    3
        kfw:    3
        kft:    3
    Max length: 4
  5. macro-area
    <type 'unicode'>
    Nulls: True
    Unique values: 6
    5 most frequent values:
        Eurasia:    255
        Papunesia:  241
        Africa: 210
        North America:  122
        South America:  104
    Max length: 13
  6. child_dialect_count
    <type 'int'>
    Nulls: False
    Min: 0
    Max: 125
    Sum: 11498
    Mean: 1.44792847248
    Median: 0
    Standard Deviation: 3.18417804984
    Unique values: 31
    5 most frequent values:
        0:  5091
        2:  852
        3:  605
        4:  315
        1:  310
  7. latitude
    <type 'float'>
    Nulls: True
    Min: -54.79
    Max: 73.14
    Sum: 67760.55
    Mean: 9.18911716843
    Median: 6.575
    Standard Deviation: 19.4747925901
    Unique values: 3627
    5 most frequent values:
        6.63:   12
        23.68:  11
        -5.21:  10
        -5.58:  10
        11.13:  9
  8. longitude
    <type 'float'>
    Nulls: True
    Min: -178.78
    Max: 179.17
    Sum: 354330.52
    Mean: 48.051331706
    Median: 38.49
    Standard Deviation: 79.6854697262
    Unique values: 4647
    5 most frequent values:
        107.18: 11
        145.71: 10
        50.0:   9
        145.54: 9
        145.76: 8

Row count: 7941
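
If you would rather tally the macro-area column in Python than with csvstat, something like the following should work once the table has been saved as CSV. This is only a minimal sketch: the column names are taken from the csvstat output above, but the sample rows, the `SAMPLE_CSV` constant and the `macroarea_counts` helper are made up for illustration and are not part of clldclient.

```python
import csv
import io
from collections import Counter

# Made-up sample rows for illustration; only the column names are taken
# from the csvstat output above. Replace with the real exported CSV text.
SAMPLE_CSV = """id,name,top-level family,iso,macro-area,child_dialect_count,latitude,longitude
xxxx0001,Language A,Family X,aaa,Eurasia,0,10.5,20.25
xxxx0002,Language B,Family X,,Eurasia,2,11.0,21.0
xxxx0003,Language C,Family Y,ccc,Africa,1,-5.21,30.0
xxxx0004,Language D,,ddd,,0,,
"""


def macroarea_counts(csv_text, column="macro-area"):
    """Count rows per macroarea; empty cells are tallied under None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] or None for row in reader)


counts = macroarea_counts(SAMPLE_CSV)
for area, n in counts.most_common():
    print(area, n)
```

Empty cells are kept as a separate `None` bucket because the macro-area column is reported with "Nulls: True", so some rows have no macroarea assigned.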