Closed xflr6 closed 6 years ago
import pandas as pd query = '''SELECT length(coalesce(gbid, '')), count(*), min(gbid), 'http://glottolog.org/resource/reference/id/' || min(id) AS url FROM source JOIN ref USING (pk) GROUP BY length(coalesce(gbid, '')) ORDER BY count DESC''' pd.read_sql_query(query, 'postgresql://postgres@/glottolog3', index_col='length') Out[1]: count min \ length 0 297525 None 45 8944 http://books.google.com/books?id=--eOmgEACAAJ 12 8601 -_lCPgAACAAJ 54 5 http://www.google.com/books/feeds/volumes/3RIM... url length 0 http://glottolog.org/resource/reference/id/10 45 http://glottolog.org/resource/reference/id/100004 12 http://glottolog.org/resource/reference/id/100 54 http://glottolog.org/resource/reference/id/104284
The full urls of length 45 are accounted for, but cases like 104284 yield broken links.
Clean this on import, so we always save the 12-character id?
The full urls of length 45 are accounted for, but cases like 104284 yield broken links.
Clean this on import, so we always save the 12-character id?