Open xrotwang opened 6 years ago
From lingpy, it is:
>>> from lingpy.convert.cldf import from_cldf
>>> from_cldf('path').output('tsv', filename='filename', prettify=False)
Yes, this would just be a "proof-of-concept" recipe, or for providing backward compatibility with earlier LingPy versions.
BTW: it's also what @thiagochacon wanted, namely that we help convert data to "edictor" format.
If you want support for non-standard CLDF column headers, it is
>>> from lingpy import Wordlist
>>> Wordlist.from_cldf('path').output('tsv', filename='filename', prettify=False)
although that keeps the non-standard column headers and does not yet change them into the standard DOCULECT, CONCEPT, IPA headers that Edictor expects.
You can easily find a workaround:
from lingpy import Wordlist
wl = Wordlist.from_cldf('path.json')
wl.add_entries('doculect', 'language_name', lambda x: x)
wl.add_entries('concept', 'concept_name', lambda x: x)
wl.add_entries('tokens', 'segments', lambda x: x)
wl.output('tsv', filename='bla', prettify=False, subset=True, cols=['doculect', 'concept', 'tokens'])
This is okay enough for the time being, I'd say.
The LingPy tutorial uses LingPy's old QLC format (see polynesian.tsv). We should have a recipe to convert a CLDF Wordlist into this format. Should be a csvkit one-liner.
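For illustration, here is a minimal sketch of what such a recipe could look like with Python's standard `csv` module rather than csvkit. It assumes the dataset's `forms.csv` uses the default CLDF column names (`Language_ID`, `Parameter_ID`, `Form`, `Segments`). The function name `forms_to_qlc`, the header mapping, and the sample rows are made up for this example. It writes language and concept IDs, not names, so a real recipe would probably also need to join against `languages.csv` and `parameters.csv`.

```python
import csv
import io

# Assumed mapping from default CLDF FormTable columns to the headers
# LingPy/EDICTOR expect; adjust it if the dataset uses other column names.
CLDF_TO_QLC = {
    "Language_ID": "DOCULECT",
    "Parameter_ID": "CONCEPT",
    "Form": "IPA",
    "Segments": "TOKENS",
}


def forms_to_qlc(forms_csv, out_tsv):
    """Copy rows from a CLDF forms.csv stream into a QLC-style TSV stream.

    `forms_to_qlc` is a hypothetical helper, not part of LingPy or pycldf.
    """
    reader = csv.DictReader(forms_csv)
    writer = csv.writer(out_tsv, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID"] + list(CLDF_TO_QLC.values()))
    # The first column is renumbered from 1, on the assumption that
    # LingPy wants an integer ID there instead of the CLDF string ID.
    for idx, row in enumerate(reader, start=1):
        writer.writerow([idx] + [row.get(col, "") for col in CLDF_TO_QLC])


# Made-up sample rows, only to show the shape of the input and output.
sample = io.StringIO(
    "ID,Language_ID,Parameter_ID,Form,Segments\n"
    "f1,hawaiian,hand,lima,l i m a\n"
    "f2,samoan,hand,lima,l i m a\n"
)
result = io.StringIO()
forms_to_qlc(sample, result)
print(result.getvalue())
```

With real files, the two streams would be opened with `open('forms.csv', newline='', encoding='utf-8')` and `open('filename.tsv', 'w', newline='', encoding='utf-8')`.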