LinguList closed this issue 7 years ago
Yeah, it comes down to a documentation problem. E.g. this repository isn't the correct one :) So, this issue should be an issue in https://github.com/clld/glottolog and given how short the required code is, it may actually be appropriate in the FAQ.
Actually, in a Python script it would probably make more sense to use TreeMaker programmatically, i.e. construct a tree by calling TreeMaker.add and then write it to a file.
Yes, I agree, I was just too lazy to read how treemaker actually works...
What I came up with now is the following script:

    from __future__ import print_function

    from pyglottolog.api import Glottolog
    from treemaker import TreeMaker
    from newick import loads


    def tree(*taxa):
        # We create a dict to lookup Glottolog languoids by name, ISO- or Glottocode.
        langs = {}
        for lang in Glottolog().languoids():
            if lang.iso:
                langs[lang.iso] = lang
            langs[lang.name] = lang
            langs[lang.id] = lang

        t = TreeMaker()
        for taxon in taxa:
            if taxon not in langs:
                print('unknown taxon: {0}'.format(taxon))
                continue
            t.add(taxon, ', '.join(l[1] for l in langs[taxon].lineage))
        return t


    if __name__ == '__main__':
        import sys
        print(loads(tree(*sys.argv[1:]).write())[0].ascii_art())
which works as follows:

    $ python tree.py deu eng Welsh Pali scot1243
               ┌─Welsh
               │          ┌─deu
               ├──────────┤
    ───────────┤          │          ┌─eng
               │          └──────────┤
               │                     └─scot1243
               └─Pali

This is what we want, right?
On reflection, and given that TreeMaker is an easy way to create trees from hierarchies, the easiest route with the Glottolog API would be to just generate the input format TreeMaker requires. This could be done like this, with a list of glottocodes called "mycodes" for convenience:
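A minimal sketch of that step, under a few assumptions: TreeMaker's input is taken to be one line per taxon (the taxon label, some whitespace, then the comma-separated classification; check the TreeMaker README for the exact format). `LINEAGES` is a hand-written, simplified stand-in for what the Glottolog API would return, and `classification_lines` is a hypothetical helper name, not part of pyglottolog or TreeMaker.

```python
# Hypothetical stand-in for the Glottolog lookup: glottocode -> classification.
# The paths are simplified by hand; real lineages from Glottolog are longer.
LINEAGES = {
    "stan1295": ["Indo-European", "Germanic"],  # German
    "stan1293": ["Indo-European", "Germanic"],  # English
    "wels1247": ["Indo-European", "Celtic"],    # Welsh
}


def classification_lines(codes, lineages):
    """Return one 'taxon  Level1, Level2, ...' line per known code."""
    lines = []
    for code in codes:
        if code not in lineages:
            # Same behaviour as the script above: report and skip.
            print("unknown taxon: {0}".format(code))
            continue
        lines.append("{0}  {1}".format(code, ", ".join(lineages[code])))
    return lines


mycodes = ["stan1295", "stan1293", "wels1247"]
for line in classification_lines(mycodes, LINEAGES):
    print(line)
```

Writing the returned lines to a text file, one per line, would give the kind of input file the next step expects. With the real API, the classification for each code would come from the languoid's `lineage`, as in the script above.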
Then, once the file is created, this can be easily converted with @simongreenhill's TreeMaker tool (linked above):
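To make concrete what such a conversion amounts to, here is a small pure-Python stand-in that turns classification paths into a Newick string. It is only an illustration of the idea, not TreeMaker's implementation: `newick_from_lineages` is a hypothetical name, the lineages are simplified by hand, internal nodes are left unlabeled, and taxon labels are assumed not to collide with group names.

```python
def newick_from_lineages(lineages):
    """Build a Newick string from {taxon: [group, subgroup, ...]} paths."""
    # Nest the paths into a tree of dicts; leaves are marked with None.
    root = {}
    for taxon, path in lineages.items():
        node = root
        for label in path:
            node = node.setdefault(label, {})
        node[taxon] = None

    def render(children):
        parts = []
        for label, sub in children.items():
            parts.append(label if sub is None else render(sub))
        if len(parts) == 1:
            # Collapse groups with a single child so no unary nodes remain.
            return parts[0]
        return "(" + ",".join(parts) + ")"

    return render(root) + ";"


# Simplified, hand-written lineages for the taxa used in the example above.
example = {
    "deu": ["Indo-European", "Germanic", "High German"],
    "eng": ["Indo-European", "Germanic", "Anglic"],
    "Welsh": ["Indo-European", "Celtic"],
    "Pali": ["Indo-European", "Indo-Aryan"],
    "scot1243": ["Indo-European", "Germanic", "Anglic"],
}
print(newick_from_lineages(example))  # → ((deu,(eng,scot1243)),Welsh,Pali);
```

In the workflow described here, the actual conversion is left to TreeMaker.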
The output is the Newick file languages.nwk.

Given that many potential users are still not really aware of the power of the Glottolog API, it seems like a good idea to start a little cookbook in the GitHub repo, where things like this example (and others) are discussed and illustrated.