cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

[New Transcription Data] IpaPy #29

Open LinguList opened 4 years ago

LinguList commented 4 years ago

ipapy offers feature representations for some IPA characters/letters. The problem is that they need to be extracted somehow, and it is not entirely clear how. Still, I think it would be nice to list, for each CLTS BIPA character, whether it would be accepted by ipapy and how it would be encoded in terms of features there.

tresoldi commented 4 years ago

From the code, the data seems to come entirely from here: https://github.com/pettarin/ipapy/blob/master/ipapy/data/ipa.dat

It has a CSV structure, but the sound names are internally tab-separated. It could be mapped to CLTS BIPA by using the grapheme (which needs to be normalized), the name descriptor, or both. Some manual refinement/checking would also be necessary.

There are also ARPABET and Kirshenbaum resources in the same directory.
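
The grapheme-or-name mapping suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (`normalize_grapheme`, `name_key`, `map_entries`) are hypothetical, NFD is just one possible choice of normalization, and the rows in the usage example are invented toy data, not taken from ipapy or CLTS.

```python
import unicodedata


def normalize_grapheme(grapheme):
    """Return the NFD form, so precomposed and decomposed spellings compare equal.

    Assumption: NFD is sufficient here; look-alike characters with different
    code points (e.g. U+0067 vs. U+0261) are NOT unified by this step.
    """
    return unicodedata.normalize("NFD", grapheme)


def name_key(name):
    """Order-insensitive key for a sound name.

    str.split() without arguments splits on any whitespace, so tab-separated
    and space-separated names yield the same key.
    """
    return frozenset(name.lower().split())


def map_entries(ipapy_entries, bipa_entries):
    """Map each (grapheme, name) pair of the first list onto the second list.

    Returns {grapheme: (matched_grapheme_or_None, how)}, where `how` is
    "grapheme", "name", or "unmatched". Unmatched rows are the ones that
    would need manual checking.
    """
    by_grapheme = {normalize_grapheme(g): g for g, _ in bipa_entries}
    by_name = {name_key(n): g for g, n in bipa_entries}
    result = {}
    for grapheme, name in ipapy_entries:
        norm = normalize_grapheme(grapheme)
        if norm in by_grapheme:
            result[grapheme] = (by_grapheme[norm], "grapheme")
        elif name_key(name) in by_name:
            result[grapheme] = (by_name[name_key(name)], "name")
        else:
            result[grapheme] = (None, "unmatched")
    return result


# Toy rows, invented for illustration only.
toy_ipapy = [
    ("\u00e3", "nasalized open front unrounded vowel"),  # precomposed a + tilde
    ("g", "voiced\tvelar\tplosive\tconsonant"),          # ASCII g, tabbed name
]
toy_bipa = [
    ("a\u0303", "nasalized open front unrounded vowel"),  # decomposed a + tilde
    ("\u0261", "voiced velar plosive consonant"),         # IPA script g
]
mapping = map_entries(toy_ipapy, toy_bipa)
```

In this toy run the first row matches by normalized grapheme and the second only by name, which is the kind of case where using both keys helps. Name matching is exact on the set of words, so differing terminology between the two resources would still fall through to "unmatched".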

LinguList commented 3 years ago

So how feasible is it to make a table and add it to our sources?

tresoldi commented 3 years ago

Does not look too complex; it is more a question of how reproducible it should be (i.e., should there be a nice script to download the data, parse it, etc.?), whether to include entries that are commented out, and similar decisions.

However, I cannot see any mention of a peer-reviewed publication. Didn't we decide to add only peer-reviewed work or resources that are very well established in the community?
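
To make the "commented-out entries" decision concrete, the parsing step of such a script could look like the sketch below. The file layout is an assumption, not something confirmed in this thread: one entry per line, fields separated by a comma, and `#` marking comments. The function name `parse_entries` and its parameters are hypothetical, the sample text is invented, and the download step is left out.

```python
def parse_entries(text, include_commented=False, comment="#", field_sep=","):
    """Parse entry lines from `text` into a list of dicts.

    Assumed layout (not verified against the real file): one entry per line,
    `field_sep` between the key and the name, `comment` at the start of a
    commented-out line.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        commented = line.startswith(comment)
        if commented:
            if not include_commented:
                continue
            # Drop the leading comment marker(s) and try to parse the rest.
            line = line.lstrip(comment).strip()
        if field_sep not in line:
            continue  # a plain comment or heading, not an entry
        key, name = line.split(field_sep, 1)
        entries.append(
            {"key": key.strip(), "name": name.strip(), "commented": commented}
        )
    return entries


# Invented sample text, for illustration only.
sample = "# vowels\nA,first sound\n#B,second sound\n\nC,third sound\n"
active_only = parse_entries(sample)
with_commented = parse_entries(sample, include_commented=True)
```

With `include_commented=True`, every commented line that contains the field separator is treated as an entry, so ordinary prose comments containing a comma would be picked up as well. The `commented` flag is kept on each row so that such cases can be reviewed by hand.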

LinguList commented 3 years ago

Before we discuss this for too long, let's forget it for now and concentrate on the datasets I added (see milestone 1.3). These should all be mapped using the new pyclts approach and then manually corrected.