LoanDB / ronataswestoldturkic

CLDF dataset derived from 'West Old Turkic' by András Róna-Tas and Árpád Berta from 2011
https://www.harrassowitz-verlag.de/title_4002.ahtml
Creative Commons Attribution 4.0 International

how to find BIPA errors #2

Closed by martino-vic 2 years ago

martino-vic commented 2 years ago

There seems to be a small BIPA error hidden somewhere, but I can't work out which character causes it. If I build a set of all the IPA segments used, I get: ['', 'a', 'aː', 'b', 'c', 'd', 'd͡z', 'd͡ʒ', 'e', 'eː', 'f', 'h', 'i', 'iː', 'j', 'k', 'l', 'm', 'n', 'o', 'oː', 'p', 'r', 's', 't', 't͡s', 't͡ʃ', 'u', 'uː', 'v', 'w', 'y', 'yː', 'z', 'ø', 'øː', 'ŋ', 'ɐ', 'ɒ', 'ɛ', 'ɟ', 'ɡ', 'ɣ', 'ɥ', 'ɯ', 'ɲ', 'ʃ', 'ʎ', 'δ', 'χ']. I checked every element against the master list, and each one is reported as present. What other strategies are there for pinning down bad characters?

martino-vic commented 2 years ago

The bad character was 'δ'. I had checked against the wrong master list; I should have checked here
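
For anyone hitting the same problem, a lookalike such as 'δ' can be exposed by printing the Unicode code point and name of every character in the segments that fail a lookup. The following is only a minimal sketch using the Python standard library. The names describe and find_unknown are made up for illustration, and the small inventory set is a hypothetical stand-in for the real master list, which would have to be loaded from the correct source.

```python
import unicodedata


def describe(segment):
    """Return (character, code point, Unicode name) for each character in a segment."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in segment
    ]


def find_unknown(segments, inventory):
    """Return the segments that are missing from the reference inventory, sorted."""
    return sorted(seg for seg in set(segments) if seg not in inventory)


# Hypothetical stand-in for the real master list; an actual check would
# load the full inventory instead of this handful of symbols.
inventory = {"a", "d", "t", "\u00f0", "\u03c7"}  # includes eth and chi

# A few segments from the set above, including the Greek delta (U+03B4).
segments = ["a", "d", "\u03b4", "\u03c7"]

# Print every segment that the inventory does not know, character by character.
for seg in find_unknown(segments, inventory):
    print(repr(seg), describe(seg))
```

With this toy inventory, the only segment reported is 'δ', and its Unicode name (GREEK SMALL LETTER DELTA) makes it clear that it is a Greek letter, presumably standing in for the IPA symbol 'ð' (LATIN SMALL LETTER ETH). Looking at names rather than glyphs helps because visually similar characters are hard to tell apart in a printed list.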

LinguList commented 2 years ago

cldfbench lexibank.check_profile lexibank_FILE.py (FILE = your dataset ID). This is my preferred command for tracing errors. There is also lexibank.check_phonotactics, which I recommend for finding displaced morpheme boundaries.