Guarantee ASJP coverage is complete

tresoldi commented 4 years ago

Coverage should be in line with the transcription in Lexibank ASJP, in particular its Transcription.md and orthographic profile, as discussed in https://github.com/lexibank/asjp/pull/4

LinguList commented 3 years ago

Okay, @tresoldi, if you can go ahead and provide this as a transcription dataset, that would be nice. The idea is: use a parse.py file that iterates over all of ASJP, extracts each sound, and counts it. That way we also get a nice quantitative dataset showing the global frequency of each sound, right?
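
A minimal sketch of the counting step, assuming the ASJP forms are already available as space-separated segment strings. The `count_sounds` helper and the sample forms are hypothetical; they are not taken from the actual parse.py or from the ASJP data.

```python
from collections import Counter


def count_sounds(forms):
    """Count how often each segment occurs across all forms.

    Assumption: every form is a string of segments separated by spaces.
    """
    counts = Counter()
    for form in forms:
        counts.update(form.split())
    return counts


# Hypothetical sample forms, for illustration only.
sample_forms = ["n a k", "k a t", "t u"]
sound_counts = count_sounds(sample_forms)

# List the sounds from most to least frequent.
for sound, frequency in sound_counts.most_common():
    print(sound, frequency)
```

Keeping the counts in a `Counter` means the same pass yields both the inventory of sounds (the keys) and their global frequency (the values).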

tresoldi commented 3 years ago

Two questions here:

1. Should we follow only the published ASJP code, or extend it with the actual ASJP data? I ask because in the profile for the Lexibank ASJP repository I had to account for some combinations that were not described in the paper (perhaps because they were added later), and I had to check the original sources to confirm them. It is only a handful of cases, but it would be an extension nonetheless.
2. The current asjpcode has some changes from the paper, namely the 3, 4, and 5 classes, which were renamed to letters so as not to conflict with tones. I should keep these minor changes, correct?
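
To make the first question concrete, the gap between the published code and the data could be listed with a simple set difference. This is only a sketch: the `uncovered_sounds` helper and both symbol collections below are hypothetical placeholders, not the real ASJP inventories.

```python
def uncovered_sounds(observed, described):
    """Return the sounds seen in the data but missing from the description."""
    return sorted(set(observed) - set(described))


# Hypothetical inputs: sounds described in the paper vs. sounds found in the data.
described_in_paper = {"a", "k", "n", "t"}
observed_in_data = ["a", "k", "n", "t", "u", "kw~"]

missing = uncovered_sounds(observed_in_data, described_in_paper)
print(missing)
```

Any sound reported here would be a candidate for the extension, to be confirmed against the original sources.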
