glottolog / glottolog-legacy

DEPRECATED. See https://github.com/clld/glottolog

reconcile languages with same name but different hid #9

Closed xrotwang closed 9 years ago

xrotwang commented 10 years ago

An example is Central Bontoc, but there are potentially more:

glottolog3=# select l.name, count(ll.hid) as c from language as l, languoid as ll where l.pk = ll.pk group by name order by c desc;

xrotwang commented 10 years ago

Numbers for glottolog 2.3:

glottolog3=# select count(*) from (select l.name, count(ll.hid) as c from language as l, languoid as ll where l.pk = ll.pk group by name order by c desc) as s where s.c > 1;
count 
-------
24

             name               | c 
--------------------------------+---
Anguthimri                      | 2
Bugan                           | 2
Central Bontoc                  | 2
Chimakum                        | 2
Dama                            | 2
Ganai                           | 2
Garawa                          | 2
Koro                            | 2
Lautu                           | 2
Mango                           | 2
Martha's Vineyard Sign Language | 2
Mayo                            | 2
Nepali                          | 2
Ngala                           | 2
Patwin                          | 2
Surigaonon                      | 2
Taita                           | 2
Tunen                           | 2
Unggumi                         | 2
Vili                            | 2
Wirangu                         | 2
Yao                             | 2
Yendang                         | 2
Yir Yoront                      | 2

xflr6 commented 10 years ago

Here is a list restricted to active languages with the same name:

import pandas as pd

# `engine` is assumed to be an SQLAlchemy engine connected to the glottolog3 database.
print(pd.read_sql_query('''SELECT l.name, array_agg(l.id ORDER BY l.id) AS ids, array_agg(ll.hid ORDER BY l.id) AS hids
FROM language AS l JOIN languoid AS ll ON l.pk = ll.pk
WHERE l.active AND ll.status = 'established' AND ll.level = 'language'
GROUP BY l.name HAVING count(*) > 1 ORDER BY l.name''', engine))
                               name                   ids                                 hids
0                    Central Bontoc  [cent2083, cent2292]                           [bnc, lbk]
1                              Dama  [dama1262, dama1267]                   [NOCODE_Dama, dmm]
2                             Ganai  [gana1268, gana1278]                           [ihw, unn]
3                             Mango  [mang1398, mang1429]                  [mge, NOCODE_Mango]
4   Martha's Vineyard Sign Language  [mart1251, mart1258]  [mre, NOCODE_Marthas-Vineyard-Sign]
5                            Nepali  [nepa1252, nepa1254]                           [nep, npi]
6                             Ngala  [ngal1300, ngal1301]            [nud, NOCODE_Ngala-Barth]
7                             Taita  [tait1247, tait1250]                  [NOCODE_Taita, dav]
8                              Vili  [vili1238, vili1239]                   [vif, NOCODE_Vili]
9                           Wirangu  [wira1260, wira1265]                           [wiw, wgu]
10                              Yao  [yaoa1239, yaoo1241]                    [NOCODE_Yao, yao]
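For checking a list of languoids without database access, the same grouping can be sketched in plain Python. This is only an illustration: the `(name, glottocode, hid)` tuple layout and the function name `shared_names` are assumptions made for this sketch, and the sample rows are a few entries copied from the table above.

```python
from collections import defaultdict

# Sample rows copied from the table above; the (name, glottocode, hid)
# tuple layout is an assumption made for this sketch.
ROWS = [
    ("Central Bontoc", "cent2083", "bnc"),
    ("Central Bontoc", "cent2292", "lbk"),
    ("Nepali", "nepa1252", "nep"),
    ("Nepali", "nepa1254", "npi"),
    ("Yao", "yaoo1241", "yao"),
]


def shared_names(rows):
    """Return {name: [(glottocode, hid), ...]} for names used more than once."""
    by_name = defaultdict(list)
    for name, glottocode, hid in rows:
        by_name[name].append((glottocode, hid))
    # Keep only names that occur on more than one row (HAVING count(*) > 1).
    return {
        name: sorted(entries)
        for name, entries in by_name.items()
        if len(entries) > 1
    }


duplicates = shared_names(ROWS)
for name in sorted(duplicates):
    print(name, duplicates[name])
```

Filtering on active/established languages, as the SQL query does, would have to happen before the rows are passed in.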
d97hah commented 10 years ago

The [cent2083] / [bnc] is a macro-language and should be renamed Bontoc rather than Central Bontoc.

I've renamed one of the Damas, so the problem should go away in the next update.

The [ihw] should be called Birdhawal and is so named in my files; the problem should go away in the next update.

I've renamed one of the Mangos, so the problem should go away in the next update.

My mistake, the [mart1258] and [NOCODE_Marthas-Vineyard-Sign] should be retired, and have been retired in my files for the next update.

Nepali [nep] / [nepa1252] should be taken out of the tree; it's a macro-language. The [nep] macro-language could be called Eastern Pahari.

I've renamed one of each, so the problem should go away in the next update.

9 Wirangu [wira1260, wira1265] [wiw, wgu]

[wiw] / [wira1260] is retired and I've added it as such in my files and renamed it Wirangu-Nauo (because it was split into those).

2014-11-13 15:35 GMT+01:00 Sebastian Bank notifications@github.com:

  • [cent2083, cent2292] and [nepa1252, nepa1254] macrolanguoid and language: move the latter under the former, set level of the former to family
  • [dama1262, dama1267], [mart1258, mart1251], and [tait1247, tait1250] NOCODE/unclassified and code assigned: retire the former, move references/other information to the latter
  • [gana1268, gana1278] Gunai dialect cluster, maybe rename to Bidhawal and Kurnai?
  • [mang1398, mang1429], [ngal1300, ngal1301] and [yaoa1239, yaoo1241] just different languages: nothing to do (?)
  • [vili1238, vili1239] related languages, maybe rename to Vili (Civili) and Vili (Ibhili)?
  • [wira1260, wira1265] retired and active: make the former (spurious?) retired/unclassified and move references/other information over to the latter?
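The cases above can be pre-sorted mechanically by their hids before someone looks at them. A rough sketch follows, assuming that the `NOCODE_` prefix (inferred from the table, not from any documented convention) marks entries without a code; `split_by_hid_pattern` is a hypothetical helper. The hid pattern alone does not decide the action: Dama and Yao both pair a NOCODE entry with a coded one, yet the list above treats them differently, so the output is only a starting point for manual review.

```python
# Assumption: the prefix is inferred from the hids in the table above.
NOCODE_PREFIX = "NOCODE_"

# A few name -> hids pairs copied from the table above.
PAIRS = {
    "Dama": ["NOCODE_Dama", "dmm"],
    "Ganai": ["ihw", "unn"],
    "Nepali": ["nep", "npi"],
    "Yao": ["NOCODE_Yao", "yao"],
}


def split_by_hid_pattern(pairs):
    """Split names into (NOCODE next to a coded entry, all entries coded)."""
    mixed, coded = [], []
    for name, hids in sorted(pairs.items()):
        nocode_count = sum(hid.startswith(NOCODE_PREFIX) for hid in hids)
        if 0 < nocode_count < len(hids):
            mixed.append(name)
        elif nocode_count == 0:
            coded.append(name)
        # Names whose entries are all NOCODE land in neither list.
    return mixed, coded


mixed, coded = split_by_hid_pattern(PAIRS)
print("NOCODE next to a coded entry:", mixed)
print("all entries coded:", coded)
```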

— Reply to this email directly or view it on GitHub https://github.com/clld/glottolog-data/issues/9#issuecomment-62898401.

xflr6 commented 9 years ago

With Glottolog 2.4:

       name                   ids                       hids
0      Dama  [dama1262, dama1267]         [NOCODE_Dama, dmm]
1     Ganai  [gana1268, gana1278]                 [ihw, unn]
2     Mango  [mang1398, mang1429]        [mge, NOCODE_Mango]
3    Nepali  [nepa1252, nepa1254]                 [nep, npi]
4     Ngala  [ngal1300, ngal1301]  [nud, NOCODE_Ngala-Barth]
5     Taita  [tait1247, tait1250]        [NOCODE_Taita, dav]
6      Vili  [vili1238, vili1239]         [vif, NOCODE_Vili]
7  Wagawaga  [waga1262, waga1268]                 [wgw, wgb]
8       Yao  [yaoa1239, yaoo1241]          [NOCODE_Yao, yao]

Updates do not automatically follow name changes from lff/lof.txt, so:

@d97hah some (all?) of the renames you mentioned do not seem to be included in hh17.txt (and therefore in languages.yaml), is this a glitch?

d97hah commented 9 years ago

2015-04-10 16:42 GMT+02:00 Sebastian Bank notifications@github.com:

Updates do not automatically follow name changes from lff/lof.txt, so:

All good!

@d97hah (https://github.com/d97hah) some (all?) of the renames you mentioned (https://github.com/clld/glottolog-data/issues/9#issuecomment-63078713) do not seem to be included in hh17.txt (https://github.com/clld/glottolog-data/blob/master/languoids/hh17.txt) (and therefore in languages.yaml), is this a glitch?

It's a feature :-). [In my old setup] the lg-names in hh17.txt are not significant, while the names in lff.txt are. Also the HH-classification in hh17.txt is not significant (lff.txt is) and then there's the code+name issue. Otherwise the fields of hh17.txt should be significant. In our new setup, maybe the insignificant things should be taken out of languoids.yaml or out of lff.txt, to avoid mixups. Whatever you prefer!
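One way to read the precedence described here is as a merge in which lff.txt wins for the name and the classification and hh17.txt supplies everything else. The sketch below only illustrates that reading: the dict layout, the field names `name`, `classification` and `hid`, and the helper `merge_records` are all hypothetical and do not reflect the actual file formats, and the classification values are placeholders. The code+name issue mentioned above is not modelled.

```python
# Fields for which lff.txt is taken as authoritative in this sketch
# (hypothetical field names, not the real file layout).
LFF_FIELDS = ("name", "classification")


def merge_records(hh17_record, lff_record):
    """Start from the hh17 record and let lff override name/classification."""
    merged = dict(hh17_record)
    for field in LFF_FIELDS:
        if field in lff_record:
            merged[field] = lff_record[field]
    return merged


# Illustrative records for gana1268; classification values are placeholders.
hh17 = {
    "name": "Ganai",
    "classification": "<classification from hh17.txt>",
    "hid": "ihw",
}
lff = {
    "name": "Birrdhawal",
    "classification": "<classification from lff.txt>",
}

merged = merge_records(hh17, lff)
print(merged)
```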


xflr6 commented 9 years ago

What is the wanted spelling for the new name of gana1268 (see above): Birdhawal or Birrdhawal or Bidhawal (if not the second, maybe update lff.txt)?

@haspelmath any concerns about the new names (parentheses, use of 'of')? Rename both or maybe only the non-iso-code Dama?

d97hah commented 9 years ago

Birrdhawal is preferred!
