cldf / cookbook

Recipes for cooking with CLDF data
https://cldf.clld.org
Apache License 2.0

Convert CLDF Wordlist with cognates to LingPy's old QLC format #5

Open: xrotwang opened this issue 6 years ago

xrotwang commented 6 years ago

The LingPy tutorial uses LingPy's old QLC format (see polynesian.tsv). We should have a recipe to convert a CLDF Wordlist into this format. This should be a csvkit one-liner.
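As a rough illustration of what such a recipe has to do, here is a minimal pure-Python sketch (not a csvkit one-liner, and not LingPy's own converter). It assumes the CLDF default file and column names (`forms.csv` with `ID`, `Language_ID`, `Parameter_ID`, `Form`, `Segments`; `languages.csv` and `parameters.csv` with `ID`, `Name`; `cognates.csv` with `Form_ID`, `Cognateset_ID`) and the QLC/EDICTOR-style headers `DOCULECT`, `CONCEPT`, `IPA`, `TOKENS`, `COGID`. The function names `read_table` and `cldf_to_qlc` are hypothetical.

```python
import csv


def read_table(path):
    """Read one CLDF CSV table into a list of dicts (one per row)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def cldf_to_qlc(forms, languages, parameters, cognates):
    """Return a LingPy/EDICTOR-style TSV string built from CLDF rows.

    Assumes CLDF default column names; real datasets may declare
    different names in their metadata JSON.
    """
    lang_names = {row["ID"]: row["Name"] for row in languages}
    concept_names = {row["ID"]: row["Name"] for row in parameters}

    # LingPy expects integer COGIDs, while CLDF Cognateset_IDs are
    # arbitrary strings, so map each cognate set to a running integer.
    cogids = {}
    form_cogid = {}
    for row in cognates:
        cogset = row["Cognateset_ID"]
        cogids.setdefault(cogset, len(cogids) + 1)
        form_cogid[row["Form_ID"]] = cogids[cogset]

    lines = ["\t".join(["ID", "DOCULECT", "CONCEPT", "IPA", "TOKENS", "COGID"])]
    next_id = len(cogids) + 1
    for idx, row in enumerate(forms, start=1):
        cogid = form_cogid.get(row["ID"])
        if cogid is None:
            # No cognate judgement: give the form its own singleton set.
            cogid = next_id
            next_id += 1
        lines.append("\t".join([
            str(idx),
            lang_names[row["Language_ID"]],
            concept_names[row["Parameter_ID"]],
            row["Form"],
            # CLDF stores Segments space-separated, as LingPy's TOKENS.
            row.get("Segments", ""),
            str(cogid),
        ]))
    return "\n".join(lines) + "\n"
```

Usage would be along the lines of `cldf_to_qlc(read_table("forms.csv"), read_table("languages.csv"), read_table("parameters.csv"), read_table("cognates.csv"))`, writing the returned string to a `.tsv` file. Giving unjudged forms singleton COGIDs (rather than a shared placeholder) keeps them from being read as one large cognate set.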

LinguList commented 6 years ago

From lingpy, it is:

>>> from lingpy.convert.cldf import from_cldf
>>> from_cldf('path').output('tsv', filename='filename', prettify=False)

xrotwang commented 6 years ago

Yes, this would just be a proof-of-concept recipe, or a way to provide backward compatibility with earlier LingPy versions.

LinguList commented 6 years ago

BTW: this is also what @thiagochacon asked for, namely help with converting data to the EDICTOR format.

Anaphory commented 5 years ago

If you need support for non-standard CLDF column headers, use

>>> from lingpy import Wordlist
>>> Wordlist.from_cldf('path').output('tsv', filename='filename', prettify=False)

although that keeps the non-standard column headers and does not yet rename them to the standard DOCULECT, CONCEPT and IPA headers that EDICTOR expects.
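Until that renaming is built in, one stopgap is to rewrite the header row of the exported file afterwards. The sketch below is plain Python, not part of LingPy; the function name `rename_tsv_header` and the example mapping (using `LANGUAGE_NAME`, `CONCEPT_NAME` and `SEGMENTS`, taken from the column names in the workaround later in this thread) are assumptions about what the exported headers look like.

```python
def rename_tsv_header(text, mapping):
    """Rename the header columns of a LingPy-style TSV string.

    Comment lines (starting with '#') and blank lines before the header
    are left untouched; data rows are never modified.
    """
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line.strip() and not line.startswith("#"):
            # The first non-comment line is the header; columns are
            # looked up by their upper-case form, unmapped ones are kept.
            lines[i] = "\t".join(mapping.get(col.upper(), col) for col in line.split("\t"))
            break
    return "\n".join(lines)


# Hypothetical mapping from exported headers to the ones EDICTOR expects.
EDICTOR_HEADERS = {
    "LANGUAGE_NAME": "DOCULECT",
    "CONCEPT_NAME": "CONCEPT",
    "SEGMENTS": "TOKENS",
}
```

To apply it to a file, read the exported TSV as UTF-8 text, pass it through `rename_tsv_header(text, EDICTOR_HEADERS)` and write the result back. Only the header line changes, so the data rows stay byte-for-byte identical.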

LinguList commented 5 years ago

You can easily find a workaround:

from lingpy import Wordlist

wl = Wordlist.from_cldf('path.json')
# copy the CLDF columns into the column names EDICTOR expects
wl.add_entries('doculect', 'language_name', lambda x: x)
wl.add_entries('concept', 'concept_name', lambda x: x)
wl.add_entries('tokens', 'segments', lambda x: x)
# export only these three columns as TSV
wl.output('tsv', filename='bla', prettify=False, subset=True, cols=['doculect', 'concept', 'tokens'])

This is good enough for the time being, I'd say.