clld / glottolog3

glottolog2 re-implemented as CLLD app
MIT License

updated according to paper #100

Closed xrotwang closed 5 years ago

xrotwang commented 5 years ago

Note the re-arranged info in https://github.com/clld/glottolog3/pull/100/files#diff-285b8fe530ae7abb864f5868e6273b7e and https://github.com/clld/glottolog3/pull/100/files#diff-3421880dc399bd7f8b155016a7641ff0

d97hah commented 5 years ago

Looks great!

HedvigS commented 5 years ago

Thanks @xrotwang

HedvigS commented 5 years ago

Will https://cdstar.shh.mpg.de/bitstreams/EAEA0-E7DE-FA06-8817-0/glottolog_languoid.csv.zip also update?

xrotwang commented 5 years ago

Not unless we do a bugfix release. But maybe correcting the download for 3.4 is enough?

Hedvig Skirgård wrote on Sun, 7 Oct 2018, 12:59:

Will https://cdstar.shh.mpg.de/bitstreams/EAEA0-E7DE-FA06-8817-0/glottolog_languoid.csv.zip also update?

HedvigS commented 5 years ago

Right, okay. Well, I don't know what other users need and do.

I know that I would prefer it if that file were updated, or if I could run a command in pyglottolog that assembles the same table from the most recent repos data. But that's my problem, and I need to figure out how to combine the various tools in pyglottolog to make that work (I have not solved that yet). I honestly have a very poor sense of what other users of Glottolog and other CLLD projects do and need, and I'm getting the feeling that I'm not the intended user. Reading through the pyglottolog help, the functions, as far as I understand them, are not the kinds of things I tend to need to do. So you're probably better off ignoring me and focusing on the core user group, who probably don't need the download files to be updated.

xrotwang commented 5 years ago

@HedvigS there's no functionality in glottolog, the pyglottolog command line interface, that lists all languoids in tabular form, if that is what you are looking for. And yes, most of the functionality of this CLI is geared towards data maintenance in clld/glottolog. However, pyglottolog also offers programmatic access to Glottolog data, i.e. access from within Python programs. A minimal program listing languages and their endangerment status would look as follows:

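from pyglottolog import Glottolog
from pyglottolog.languoids import Level

# Iterate over all languoids in the Glottolog data.
for languoid in Glottolog().languoids():
    # Keep language-level entries only (no families or dialects).
    if languoid.level == Level.language:
        # Languoids without endangerment info get an empty second column.
        print('{0},{1}'.format(languoid.id, getattr(languoid.endangerment, 'description', '')))

HedvigS commented 5 years ago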

Thanks @xrotwang , I understand. It does sound then like the intended user group isn't me, and that's alright.

I should just write a script in Python and/or R that calls pyglottolog and generates the tables and trees that I need for other scripts, based on the latest synced GitHub repos. That's easiest in the long run: I wouldn't have to bother you about updating the csv file, and I could share that script with others who have similar needs. I haven't needed to do this just yet, but it seems like it would be better in the long term.

There's the unofficial R-package lingtypology that does some of this, but it's not being updated and in general doesn't work the way I would prefer it to.

xflr6 commented 5 years ago

Not sure if this helps, but treedb.py actually reads the latest info from languoids/tree into tables (a sqlite3 database).

By default it also creates one big table as a Pandas DataFrame (df) querying the database, which you can dump into a CSV file like this:

$ python -i treedb.py
>>> df.to_csv('treedb.csv', encoding='utf-8')

You can also use _backend.export() to create a ZIP-file with one CSV per database table.

If you want to do more complicated stuff, it is probably best to directly work with the treedb.sqlite3 file (e.g. from R or with database tools such as DBeaver).
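
A minimal sketch of inspecting that file from Python with the standard-library sqlite3 module. The table names inside treedb.sqlite3 are not given in this thread, so the sketch reads them from SQLite's own catalog rather than assuming any; the function name list_tables is made up for illustration.

```python
import sqlite3


def list_tables(path):
    """Return the names of all tables in the SQLite file at `path`, sorted."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        # Always release the file handle, even if the query fails.
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables('treedb.sqlite3') after treedb.py has run would show which tables are available to query.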

HedvigS commented 5 years ago

Thanks @xflr6 , but that's alright. Newick is really the format I would prefer here.

xflr6 commented 5 years ago

I assume this is for the tree: You can use the newick command, which currently requires specifying a start node glottocode:

$ glottolog newick atla1278 > atla1278.newick

With clld/glottolog#259, you can also omit it (to dump the full tree):

$ glottolog newick > glottolog.newick

HedvigS commented 5 years ago

Thanks @xflr6 . I'm sorry, this got confusing. This was actually not mainly about trees but rather about a simpler lookup table that reflects this update.

Recently, I noticed that the repos data, the csv downloads and Glottoscope all had an outdated version of the endangerment status, not the AES in the latest paper but something else. I told Harald, and then Robert made this commit with the new updated AES endangerment status (hurray!).

In connection to that, I was wondering if the download files on the website would also be updated for the endangerment status (and preferably also MED) along with this commit and Robert said no. Since I've asked this before, and Robert has made it clear that this is not a priority, I need to figure out a way to get what I want without bothering you guys. That would mean writing a script that generates the table I'd like based on the latest repos data instead of downloading it from the website.

Right now, I put together a useful languoid table (example here) from a few Glottolog download tables and a few other files (my own Oceania subgrouping, WALS genus and a MED table Harald sent over). A better version would do this by calling pyglottolog instead of using the download files.
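
The kind of join described here can be sketched with the Python standard library alone. Everything specific in the sketch is an assumption for illustration: the function names merge_on_key and read_csv, the key column name glottocode, and the idea that each extra table has at most one row per key.

```python
import csv
import io


def read_csv(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_key(base_rows, extra_rows, key="glottocode"):
    """Left-join extra_rows onto base_rows by `key`.

    Base rows without a match are kept unchanged; `key` is a
    hypothetical column name, not one confirmed by the download files.
    """
    extra_by_key = {row[key]: row for row in extra_rows}
    merged = []
    for row in base_rows:
        combined = dict(row)
        # Copy every non-key column from the matching extra row, if any.
        for column, value in extra_by_key.get(row[key], {}).items():
            if column != key:
                combined[column] = value
        merged.append(combined)
    return merged
```

Applying merge_on_key once per extra table keeps every row of the base table, so languoids missing from a side table still show up in the result.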

I hope this clarifies things :).

I have been using the newick command, hence #101 . It doesn't output what I thought it would (final semi-colon missing), but other than that it works fine ^^! No troubles there (anymore).
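
A small workaround sketch for the missing terminator, assuming the string holds a single tree; the helper name ensure_terminated is made up, and whether the command writes one tree or several per file is not stated in this thread.

```python
def ensure_terminated(newick):
    """Return the Newick string stripped of surrounding whitespace and ending in ';'."""
    text = newick.strip()
    # Leave already-terminated trees untouched; append the terminator otherwise.
    return text if text.endswith(";") else text + ";"
```

Most Newick parsers expect the trailing semicolon, so passing the command's output through a helper like this before parsing avoids the problem.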

xrotwang commented 5 years ago

@HedvigS since you mention https://cdstar.shh.mpg.de/bitstreams/EAEA0-E7DE-FA06-8817-0/glottolog_languoid.csv.zip above: The reason this will not be changed is that it's the download tied to a particular, possibly buggy but released, version of Glottolog. So you must not treat this URL as always pointing to up-to-date download data.

xrotwang commented 5 years ago

@HedvigS See http://cdstar.shh.mpg.de/landing/EAEA0-E7DE-FA06-8817-0 for the metadata of this download file.

xflr6 commented 5 years ago

@HedvigS, not sure I understand, but treedb.py (see above) does create a table that contains all (and more) columns from Glottolog in your example table (except for the counts) from the repo (i.e. the latest data), including endangerment_status. HTH.

HedvigS commented 5 years ago

@xrotwang yup, I'm not treating it as most up to date. I was just wondering if this particular change was going to filter through to there. I understand that it won't.

@xflr6 right, okay. I misunderstood then. I thought that treedb.py would just generate the genealogical information. That might be helpful then, thanks!