glottolog / glottolog-legacy

DEPRECATED. See https://github.com/clld/glottolog

possible relicts in source jsondata #37

Closed by xflr6 10 years ago

xflr6 commented 10 years ago

The jsondata of the references contains some data that has already been mapped to a column. It also contains some additional keys that I am not sure about:

>>> print pd.read_sql(sa.text("""SELECT key, count(*) AS n
FROM (SELECT json_object_keys(jsondata::json) AS key FROM source) AS keys
WHERE key NOT IN :known
GROUP BY key ORDER BY key""").bindparams(known=known), engine)
                  key       n
0  internetarchive_id    1502
1               lcode       1
2               lgcde       1
3              lgcode  141408
4               lgcoe       1
5               notes       4
6             numnote       4
7          university       4
8          zurichcode     608

The lgcode variants and notes can be fixed in the next update with a change here. What about the other entries?

xrotwang commented 10 years ago

Where does known come from?

internetarchive_id is populated from https://github.com/clld/clld/blob/master/clld/scripts/internetarchive.py#L86

The others, I think, should only be cleaned up upon update from a new bibfile.

d97hah commented 10 years ago

I'll fix them in the bib files on my side for the next update.

xflr6 commented 10 years ago

known is bibtexkey and gbs, plus [k for k, v in FIELD_MAP.iteritems() if v == ''] from import_refs.
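For anyone who wants to reproduce the check without a database, here is a minimal sketch of how known can be built and applied. The FIELD_MAP contents and the sample rows are invented for illustration (the real mapping lives in import_refs), and count_unknown_keys is a hypothetical helper that mirrors the SQL query in plain Python 3:

```python
import json
from collections import Counter

# Hypothetical stand-in for FIELD_MAP from import_refs. The field names are
# invented; an empty string is assumed to mark a field that stays in jsondata
# instead of being mapped to a dedicated column.
FIELD_MAP = {
    "title": "title",
    "author": "author",
    "example_field_a": "",
    "example_field_b": "",
}

# Python 3 spelling of the comprehension (items() instead of iteritems()).
known = {"bibtexkey", "gbs"} | {k for k, v in FIELD_MAP.items() if v == ""}


def count_unknown_keys(jsondata_rows, known):
    """Count top-level jsondata keys that are not in `known`."""
    counts = Counter()
    for raw in jsondata_rows:
        counts.update(k for k in json.loads(raw) if k not in known)
    return counts


# Invented sample rows standing in for source.jsondata.
rows = [
    '{"bibtexkey": "a1", "lgcode": "abc", "example_field_a": "x"}',
    '{"bibtexkey": "b2", "lgcoe": "xyz"}',
    '{"gbs": {}, "lgcode": "def", "zurichcode": "Z1"}',
]

unknown = count_unknown_keys(rows, known)
for key in sorted(unknown):
    print(key, unknown[key])
```

Like json_object_keys in the query, this only looks at top-level keys, so nested objects such as the gbs value are not descended into.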

For the lgcode variant and notes (or future moves from jsondata to dedicated columns) it's still necessary to fix the line in import_refs.py, right?
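As a rough illustration of what an import-side fix for the misspelled keys could look like (this is not the actual import_refs.py code; KEY_ALIASES and normalize_keys are hypothetical names, and the variant spellings are the ones from the key counts above):

```python
# Assumed alias table: misspelled keys seen in jsondata -> intended key.
KEY_ALIASES = {"lcode": "lgcode", "lgcde": "lgcode", "lgcoe": "lgcode"}


def normalize_keys(record, aliases=KEY_ALIASES):
    """Return a copy of `record` with misspelled keys renamed.

    Raises ValueError if two spellings of the same key carry different
    values, so that no data is silently overwritten.
    """
    out = {}
    for key, value in record.items():
        canonical = aliases.get(key, key)
        if canonical in out and out[canonical] != value:
            raise ValueError("conflicting values for %r" % canonical)
        out[canonical] = value
    return out


fixed = normalize_keys({"bibtexkey": "b2", "lgcoe": "xyz"})
print(fixed)
```

Fixing the spellings in the bib files themselves would make a table like this unnecessary; it only matters if the cleanup is done at import time.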

xflr6 commented 10 years ago

Cleared numnote, university, and zurichcode with clld/glottolog3@3ce444ebdb67bfb7745ab2821a5ac5541683fe3d