tresoldi closed this issue 3 years ago
Okay, @tresoldi, if you can go ahead and provide this as a transcription dataset, that would be nice. The idea is: use a parse.py file in which you iterate over all of ASJP, extract each sound, and count it. That way we also get a nice quantitative dataset showing the global frequency of each sound, right?
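The counting step proposed for parse.py could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the forms are assumed to be already segmented into sounds separated by spaces (how the ASJP data is actually loaded and segmented is not shown), and the sample forms are toy strings, not real ASJP entries.

```python
from collections import Counter
from typing import Dict, Iterable


def count_sounds(forms: Iterable[str]) -> Counter:
    """Count how often each sound occurs across all forms.

    Assumption: every form is a string of sounds separated by
    whitespace, e.g. "p a t a". Real ASJP forms would first need
    to be segmented (for instance with an orthography profile).
    """
    counts: Counter = Counter()
    for form in forms:
        # str.split() with no argument ignores repeated or
        # leading/trailing whitespace.
        counts.update(form.split())
    return counts


def relative_frequencies(counts: Counter) -> Dict[str, float]:
    """Turn absolute counts into proportions of all sound tokens."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sound: n / total for sound, n in counts.items()}


if __name__ == "__main__":
    # Toy forms for illustration only, not taken from ASJP.
    sample_forms = ["p a t a", "k a n", "t i"]
    counts = count_sounds(sample_forms)
    freqs = relative_frequencies(counts)
    for sound, n in counts.most_common():
        print(f"{sound}\t{n}\t{freqs[sound]:.3f}")
```

Keeping the counting separate from the data loading means the same function can be fed from whichever source is chosen (the published code table or the full ASJP data).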
Two questions here:
Should we follow only the published ASJP code, or extend it with the actual ASJP data? I ask because in the profile for the Lexibank ASJP repository I had to account for some combinations that were not described in the paper (perhaps because they were added later), which I had to investigate in the original sources to confirm. It is a handful of cases, but it would be an extension nonetheless.
The current asjpcode has some changes from the paper, namely the 3, 4, and 5 classes, which were renamed to letters so as not to conflict with tones. I should keep these minor changes, correct?
In line with the transcription in Lexibank ASJP, particularly the Transcription.md and orthographic profile as discussed in https://github.com/lexibank/asjp/pull/4