Standardize the tables with help of the w3 tsv scheme of clldutils

LinguList commented 7 years ago

I think it is straightforward to standardize our csv files. It would also save us some extra documentation in human-readable form:

separator is \t
columns (for consonants, need to handle this for all types) are: GRAPHEME, PHONATION, PLACE, MANNER, ALIAS, EXTRA, NOTE
ALIAS is either nothing or +
EXTRA is double-segmented: first split by "," and then each remaining piece is a key:value construct
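
The layout above can be sketched with the Python standard library alone. This is a minimal illustration, not the clts code: the function names `parse_extra` and `read_consonants` are hypothetical, and the sample rows (including the `aspiration:aspirated` pair) are made up to show the format.

```python
import csv
import io

# Column order as listed above (consonant table).
COLUMNS = ["GRAPHEME", "PHONATION", "PLACE", "MANNER", "ALIAS", "EXTRA", "NOTE"]


def parse_extra(value):
    """Split EXTRA on ',' first, then split each piece on the first ':'."""
    result = {}
    for piece in value.split(","):
        piece = piece.strip()
        if not piece:
            continue  # an empty EXTRA cell yields an empty dict
        key, sep, val = piece.partition(":")
        if not sep:
            raise ValueError(f"EXTRA item is not key:value: {piece!r}")
        result[key.strip()] = val.strip()
    return result


def read_consonants(text):
    """Read tab-separated consonant rows and normalize ALIAS and EXTRA."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for row in reader:
        if row["ALIAS"] not in ("", "+"):
            raise ValueError(f"ALIAS must be empty or '+': {row['ALIAS']!r}")
        row["ALIAS"] = row["ALIAS"] == "+"
        row["EXTRA"] = parse_extra(row["EXTRA"])
        rows.append(row)
    return rows


# Made-up sample data, for illustration only.
SAMPLE = (
    "GRAPHEME\tPHONATION\tPLACE\tMANNER\tALIAS\tEXTRA\tNOTE\n"
    "p\tvoiceless\tbilabial\tstop\t\t\t\n"
    "ph\tvoiceless\tbilabial\tstop\t+\taspiration:aspirated\tmade-up alias\n"
)

ROWS = read_consonants(SAMPLE)
```

Turning ALIAS into a boolean and EXTRA into a dict at read time keeps the two-step split in one place, so the rest of the code never has to re-parse the raw cell.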

Thanks to the CSV-parser of clldutils, we can even directly validate, I suppose? E.g., preventing wrong categories from being written down (like spelling errors, or categories not handled [yet]).
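
To make the idea of catching wrong categories concrete, here is a rough sketch of such a check. It deliberately uses only the standard library and does not reproduce the clldutils API; the `ALLOWED` vocabularies and the function name `find_errors` are assumptions, and the real set of categories would have to come from the project.

```python
import csv
import io

# Hypothetical controlled vocabularies; the real lists would be longer.
ALLOWED = {
    "PHONATION": {"voiced", "voiceless"},
    "PLACE": {"bilabial", "alveolar", "velar"},
    "MANNER": {"stop", "fricative", "nasal"},
}


def find_errors(text):
    """Return (line number, column, value) for every value outside its vocabulary."""
    errors = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    # The header is line 1, so data rows start at line 2.
    for lineno, row in enumerate(reader, start=2):
        for column, allowed in ALLOWED.items():
            value = row.get(column)
            if value not in allowed:
                errors.append((lineno, column, value))
    return errors


# Made-up sample with one spelling error ("bilabail").
SAMPLE = (
    "GRAPHEME\tPHONATION\tPLACE\tMANNER\n"
    "p\tvoiceless\tbilabial\tstop\n"
    "b\tvoiced\tbilabail\tstop\n"
)

ERRORS = find_errors(SAMPLE)
```

Collecting all errors instead of failing on the first one makes it easier to clean up a whole table in one pass.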

Furthermore, by adding some information on what all of this means (manner, place, etc.), we could easily expand the ontology. I wonder if it would be possible to involve a colleague to help with this: @cormacanderson, for example, could write quick statements for the ontology and compare them with GOLD. For a trained phonetician, this may be much simpler than for us.

xrotwang commented 7 years ago

Ok, I'll have a stab at this. So I guess what we need here is a metadata file which describes all the tables in one of the subdirectories of clts/data, right?
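
As a starting point for discussion, such a metadata file could look roughly like what the following sketch generates. It follows the W3C "CSV on the Web" vocabulary as far as I understand it; the table file name `consonants.tsv`, the helper `build_metadata`, and the choice of constraints per column are assumptions, not something already decided here.

```python
import json

# Column names taken from the proposal for the consonant table.
COLUMNS = ["GRAPHEME", "PHONATION", "PLACE", "MANNER", "ALIAS", "EXTRA", "NOTE"]


def build_metadata(table_url="consonants.tsv"):
    """Build a CSVW-style description of one tab-separated table."""
    columns = []
    for name in COLUMNS:
        column = {"name": name, "datatype": "string"}
        if name == "GRAPHEME":
            column["required"] = True
        if name == "ALIAS":
            # Assumption: a non-empty ALIAS cell must be a literal "+".
            column["datatype"] = {"base": "string", "format": "\\+"}
        if name == "EXTRA":
            # List-valued cell: items separated by ",".
            column["separator"] = ","
        columns.append(column)
    return {
        "@context": "http://www.w3.org/ns/csvw",
        "dialect": {"delimiter": "\t", "header": True},
        "tables": [{"url": table_url, "tableSchema": {"columns": columns}}],
    }


METADATA = build_metadata()
print(json.dumps(METADATA, indent=2))
```

One metadata file per subdirectory could list several tables under `tables`, one entry per file, each with its own column list.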

xrotwang commented 7 years ago

Should we go for turning this (i.e. the specification of a transcription system) into a CLDF module right away? Or is this a lot more experimental than the other CLDF modules?

LinguList commented 7 years ago

Yes, the tables in clts/data, for the time being assuming that "bipa" is our default, and "asjp" is an example of how one could expand it to other systems, maybe even transcriptions in Cyrillic or the like, if needed at some point.

As to the second question, of turning this into a CLDF module: basically, I'd agree, but I am worried that some aspects of the code are too experimental at this stage, and I do not even know whether the way I handle transcription systems is general enough to be applicable to other transcription systems. So I'd still wait a little with this until I have given that talk at the Poznan meeting, where I hope to get some feedback while at the same time being forced to push it a little further, also with experiments.

One may also think of this more generally as a standalone project such as Concepticon, given that it might be drastically expanded with metadata in the near future.

cldf-clts / clts-legacy

Standardize the tables with help of the w3 tsv scheme of clldutils #10