LinguList closed this issue 6 years ago.
sound_class | feature | value | diacritic |
---|---|---|---|
consonant | articulation | strong | ◌͈ |
consonant | aspiration | aspirated | ◌ʰ |
consonant | aspiration | aspirated.sibilancy:sibilant | |
consonant | breathiness | breathy | ◌ʱ |
consonant | creakiness | creaky | ◌̰ |
consonant | duration | long | ◌ː |
consonant | ejection | ejective | ◌’ |
consonant | glottalization | glottalized | ◌ˀ |
consonant | labialization | labialized | ◌ʷ |
consonant | laminality | apical | ◌̺ |
consonant | laminality | laminal | ◌̻ |
consonant | laterality | lateral | |
consonant | manner | affricate | |
consonant | manner | approximant | |
consonant | manner | click | |
consonant | manner | fricative | |
consonant | manner | implosive | |
consonant | manner | nasal | |
consonant | manner | stop | |
consonant | manner | tap | |
consonant | manner | trill | |
consonant | nasalization | nasalized | ◌̃ |
consonant | palatalization | labio-palatalized | ◌ᶣ |
consonant | palatalization | palatalized | ◌ʲ |
consonant | pharyngealization | pharyngealized | ◌ˤ |
consonant | phonation | voiced | |
consonant | phonation | voiceless | |
consonant | place | alveolar | |
consonant | place | alveolo-palatal | |
consonant | place | bilabial | |
consonant | place | dental | |
consonant | place | epiglottal | |
consonant | place | glottal | |
consonant | place | labial | |
consonant | place | labialized-palatal | |
consonant | place | labialized-velar | |
consonant | place | labio-dental | |
consonant | place | palatal | |
consonant | place | palatal-velar | |
consonant | place | pharyngeal | |
consonant | place | post-alveolar | |
consonant | place | retroflex | |
consonant | place | uvular | |
consonant | place | velar | |
consonant | preceding | postoralized | |
consonant | preceding | pre-aspirated | ʰ◌ |
consonant | preceding | pre-glottalized | ˀ◌ |
consonant | preceding | pre-labialized | ʷ◌ |
consonant | preceding | pre-nasalized | ⁿ◌ |
consonant | preceding | pre-palatalized | ʲ◌ |
consonant | release | unreleased | ◌̚ |
consonant | release | with-lateral-release | ◌ˡ |
consonant | release | with-mid-central-vowel-release | ◌ᵊ |
consonant | release | with-nasal-release | ◌ⁿ |
consonant | sibilancy | sibilant | |
consonant | stress | primary-stress | ˈ◌ |
consonant | stress | secondary-stress | ˌ◌ |
consonant | syllabicity | syllabic | ◌̩ |
consonant | velarization | velarized | ◌ˠ |
consonant | voicing | devoiced | |
consonant | voicing | revoiced | ◌̬ |
vowel | advancement | advanced | ◌̟ |
vowel | articulation | strong | ◌͈ |
vowel | breathiness | breathy | ◌̤ |
vowel | centrality | back | |
vowel | centrality | central | |
vowel | centrality | centralized | ◌̈ |
vowel | centrality | front | |
vowel | centrality | mid-centralized | ◌̽ |
vowel | centrality | near-back | |
vowel | centrality | near-front | |
vowel | creakiness | creaky | ◌̰ |
vowel | duration | long | ◌ː |
vowel | duration | mid-long | ◌ˑ |
vowel | duration | ultra-long | |
vowel | duration | ultra-short | ◌̆ |
vowel | frication | with-frication | |
vowel | glottalization | glottalized | ◌ˀ |
vowel | height | close | |
vowel | height | close-mid | |
vowel | height | mid | |
vowel | height | near-close | |
vowel | height | near-open | |
vowel | height | nearly-open | |
vowel | height | open | |
vowel | height | open-mid | |
vowel | nasalization | nasalized | ◌̃ |
vowel | pharyngealization | pharyngealized | ◌ˤ |
vowel | raising | lowered | ◌̞ |
vowel | raising | raised | ◌̝ |
vowel | retraction | retracted | ◌̠ |
vowel | rhotacization | rhotacized | ◌˞ |
vowel | roundedness | rounded | ◌̹ |
vowel | roundedness | unrounded | ◌̜ |
vowel | rounding | less-rounded | ◌̜ |
vowel | rounding | more-rounded | ◌̹ |
vowel | stress | primary-stress | ˈ◌ |
vowel | stress | secondary-stress | ˌ◌ |
vowel | syllabicity | non-syllabic | ◌̯ |
vowel | tone | with_downstep | ◌↓ |
vowel | tone | with_extra-high_tone | ◌̋ |
vowel | tone | with_extra_low_tone | ◌̏ |
vowel | tone | with_falling_tone | ◌̂ |
vowel | tone | with_global_fall | ◌↘ |
vowel | tone | with_global_rise | ◌↗ |
vowel | tone | with_high_tone | ◌́ |
vowel | tone | with_low_tone | ◌̀ |
vowel | tone | with_mid_tone | ◌̄ |
vowel | tone | with_rising_tone | ◌̌ |
vowel | tone | with_upstep | ◌↑ |
vowel | tongue_root | advanced-tongue-root | ◌̘ |
vowel | tongue_root | retracted-tongue-root | ◌̙ |
vowel | velarization | velarized | ◌ˠ |
vowel | voicing | devoiced | ◌̥ |
Here's the same thing in a spreadsheet:
I think that with this we can have a fruitful discussion about what should be changed, etc., and I am also considering using it afterwards to automatically check whether our data is correct.
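One way such an automatic check could look is sketched below. This is not part of pyclts; the helper names and the miniature feature table are made up for illustration, assuming a `features.tsv` with the columns shown above:

```python
import csv
import io

# Hypothetical miniature excerpt of the features.tsv table above.
FEATURES_TSV = """sound_class\tfeature\tvalue\tdiacritic
consonant\tmanner\tstop\t
consonant\tphonation\tvoiced\t
vowel\theight\topen\t
"""

def load_valid_values(tsv_text):
    """Map (sound_class, feature) -> set of permitted values."""
    valid = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        key = (row["sound_class"], row["feature"])
        valid.setdefault(key, set()).add(row["value"])
    return valid

def check(sound_class, feature, value, valid):
    """Return True if the value is licensed by the feature table."""
    return value in valid.get((sound_class, feature), set())

valid = load_valid_values(FEATURES_TSV)
assert check("consonant", "manner", "stop", valid)
assert not check("consonant", "manner", "plosive", valid)
```

In practice one would load the real `features.tsv` and run every annotated sound of a dataset through such a check.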
Here's the code to extract the features:
```python
In [39]: from pyclts import *

In [40]: bipa = TranscriptionSystem()

In [41]: table = []

In [42]: for k, v in bipa._features['consonant'].items():
    ...:     row = ['consonant', bipa._feature_values[k], k, v]
    ...:     table += [row]
    ...:

In [43]: for k, v in bipa._features['vowel'].items():
    ...:     row = ['vowel', bipa._feature_values[k], k, v]
    ...:     table += [row]
    ...:

In [44]: for s in bipa._sounds:
    ...:     if not bipa[s].type == 'marker':
    ...:         for f in bipa[s]._features():
    ...:             if bipa[s].type not in ['tone', 'marker']:
    ...:                 table += [[bipa[s].type, bipa._feature_values[f],
    ...:                            getattr(bipa[s], bipa._feature_values[f]),
    ...:                            bipa._features[bipa[s].type].get(f, '')]]
    ...:

In [45]: table = sorted(set([tuple(x) for x in table if None not in x]))

In [46]: table = [['sound_class', 'feature', 'value', 'diacritic']] + table

In [47]: with open('features.tsv', 'w') as f:
    ...:     for line in table:
    ...:         f.write('\t'.join(line) + '\n')
    ...:
```
Sorry, the file is not good; use this Excel file here if you want to have a closer look at the features:
It seems OK. I might have worked some things out in a different way, but that is clearly a matter of preference (in fact, your system looks more neutral than what pops into my mind).
One thing I'm not sure I follow is the treatment of non-pulmonic consonants. Nasal clicks, for example, would be defined as "nasalized clicks"? If so, that seems inconsistent with pulmonic consonants, where you have "stop" and "nasal" as different manners.
Good point. I naively assumed that I could label them as nasalized, but when looking back at this chart, which @afehn recommended:
nakagawa-2013-khoisan-phonotactics.pdf
(by Nakagawa 2013), I see that this is in fact a nasal cluster. This is easy to handle: we can just discard the extra symbols and allow clusters consisting of "nasal + click" (that's the beauty of the generative system). I'll make an issue.
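To illustrate the cluster idea, here is a minimal sketch. This is not the pyclts implementation; both the segmentation logic and the tiny sound inventory are made up for illustration:

```python
# Instead of storing every nasal click as its own symbol, a cluster is
# parsed into its component sounds. Toy inventory: ŋ and ǁ.
KNOWN = {
    "\u014b": "velar nasal",            # ŋ
    "\u01c1": "alveolar lateral click",  # ǁ
}

def parse_cluster(grapheme):
    """Greedy left-to-right split into known component sounds."""
    parts, i = [], 0
    while i < len(grapheme):
        # Try the longest possible match first.
        for j in range(len(grapheme), i, -1):
            if grapheme[i:j] in KNOWN:
                parts.append(KNOWN[grapheme[i:j]])
                i = j
                break
        else:
            return None  # unparsable grapheme
    return parts

# A nasal click parsed as nasal + click:
assert parse_cluster("\u014b\u01c1") == ["velar nasal",
                                         "alveolar lateral click"]
```

The point is only that the generative system needs no extra symbols: a nasal click is simply the composition of two sounds it already knows.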
Two more observations:

Why `breathiness` and `creakiness`, when you have a `phonation` feature with the values "voiced" and "voiceless"? I get the `voicing` feature, as it describes a relation to an "archiphoneme", but maybe they could be part of `phonation`.

This is easy to answer: because we want to model the grapheme system of IPA, and be able to parse the sounds.
Since linguists make a mess with their freedom and create a lot of inconsistent data, this at least allows us to describe what they annotate.
Consider this:
```python
In [1]: from pyclts import *

In [2]: bipa = TranscriptionSystem()

In [5]: bipa['breathy voiceless bilabial stop consonant'].s
Out[5]: 'pʱ'
```
In fact, there are cases in Hmong-Mien languages where grammars insist that there is an unvoiced sound with a breathy release. If we insisted that phonation has the values "voiced", "unvoiced", "breathy-voiced", and "creaky-voiced", we would not be able to capture these differences, and we would have to spell out explicitly, for each combination of base sound plus creaky/breathy, that this sound exists and is legitimate (since the algorithm cannot overwrite features, which is a design principle).
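The "no overwriting" design principle can be sketched in a few lines. The function name and feature dictionaries below are hypothetical, not the pyclts API:

```python
# A diacritic may only ADD a feature to a base sound, never replace one
# that is already set.

def apply_diacritic(features, feature, value):
    """Add feature=value; refuse to overwrite an existing feature."""
    if feature in features:
        raise ValueError(f"cannot overwrite {feature}={features[feature]}")
    new = dict(features)
    new[feature] = value
    return new

p = {"phonation": "voiceless", "place": "bilabial", "manner": "stop"}

# 'pʱ': because breathiness is a SEPARATE feature, it composes freely
# with "voiceless" instead of clashing with phonation.
ph = apply_diacritic(p, "breathiness", "breathy")
assert ph["phonation"] == "voiceless" and ph["breathiness"] == "breathy"

# If "breathy-voiced" were a phonation VALUE, the breathy voiceless stop
# would be unreachable without overwriting:
try:
    apply_diacritic(p, "phonation", "breathy-voiced")
except ValueError:
    pass  # composition correctly refused
```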
Furthermore, consider the name:
```python
In [7]: bipa['breathy voiced bilabial stop consonant'].name
Out[7]: 'breathy voiced bilabial stop consonant'
```
This comes close to the traditional notion of a "breathy-voiced bilabial stop", without the dash. We can easily find all cases of breathiness, etc., by making a set comparison. In fact, in order to break those things down to ASJP sound classes, where breathiness is switched off, we can parse so far unknown sounds by their base features and reduce them to the correct symbol:
```python
In [8]: asjp = TranscriptionData('asjp')

In [9]: asjp['breathy voiceless bilabial stop consonant']
Out[9]: 'p'
```
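The reduction step can be sketched like this. It is a toy model, not the actual pyclts/ASJP code; the lookup table and the set of ignored features are illustrative:

```python
# Map a sound's features onto a coarser sound-class system by dropping
# the features that system does not encode (e.g. breathiness).
ASJP_CLASSES = {
    ("voiceless", "bilabial", "stop"): "p",
    ("voiced", "bilabial", "stop"): "b",
}
IGNORED = {"breathy", "creaky", "aspirated"}

def reduce_to_class(features):
    """Drop ignored features, then look up the remaining base features."""
    key = tuple(f for f in features if f not in IGNORED)
    return ASJP_CLASSES.get(key)

# The breathy voiceless bilabial stop reduces to plain 'p':
assert reduce_to_class(("breathy", "voiceless", "bilabial", "stop")) == "p"
```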
So recall that this is a practical approach, which does not really care how economical, pleasant to the eye, or reasonable a feature system is. Instead, it attempts to infer, from the graphemes fed to the algorithm, what people write. This is the first step towards comparability. If we wanted to teach people how to do phonetic transcriptions, or to impose a feature system that we think is better than all the rest, we would use something else; but "bipa" starts from the symbols and tries to translate them literally into the features invoked by the system. All additional labor can be done later. "bipa" corrects errors via normalization, by resolving lookalikes and through the alias system (breathiness can also be expressed by the diacritic dots below, so we choose one version here to normalise), but it does not care whether the sounds people propose are possible or meaningful. Based on my practice with language data, I consider this the only way to proceed: it is merely the first step of rendering things comparable, but a considerable step. We first need to be able to handle the data; once we are at that stage, we can think of bringing it to some use.
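As a hedged illustration of the lookalike normalization mentioned above, the mappings below are made up for the example and are not the actual CLTS alias table:

```python
# Several visually similar Unicode characters are mapped onto one
# canonical codepoint before parsing.
LOOKALIKES = {
    ":": "\u02d0",   # ASCII colon      -> IPA length mark ː
    "g": "\u0261",   # Latin small g    -> IPA script ɡ
    "'": "\u02bc",   # apostrophe       -> modifier letter apostrophe
}

def normalize(grapheme):
    """Replace lookalike characters by their canonical IPA codepoints."""
    return "".join(LOOKALIKES.get(ch, ch) for ch in grapheme)

# "ga:" normalizes to ɡaː before any feature parsing happens:
assert normalize("ga:") == "\u0261a\u02d0"
```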
Thank you for the explanation, it makes total sense now! I'd even suggest you incorporate parts of it in a general "What is the idea behind CLTS" documentation.
The funny thing is that my own system is now clearer to me: as a "laboratory experiment", it is much more "essentialist" in its approach to phonology (if you really must use common names, it is closer to acoustics), and maybe it could be used in tandem in certain experiments (if it proves usable, that is).
Thanks for understanding. Given our discussion here, I think it is important to keep this in mind: before we can make our own feature systems, or encode data in systems based on other people's feature systems, we must be able to handle as much diverse data as possible. We can't do this by superimposing our favorite feature system without looking into the compositionality of graphemes, as this would mean that we would have to code a huge amount of sounds manually, adding features, even before we start to compare whether these are reflected in our data or not. CLTS-BIPA, on the other hand, is able to give names to sounds and to correct obvious Unicode errors by providing normalization, and despite its currently rather small base inventory of pre-defined sounds and features, it can already generate a huge amount of data.
So everybody is invited to provide their feature system, like Ladefoged's or the system by @tresoldi, as a list, as indicated, with many different concrete sound segments and their feature specifications; we can then see how to automatically link it via CLTS, and if this is successful, we can add it as transcription data to the database.
To allow for easy and constant re-generation of the feature system, I'll add the script features.py to the cookbook collection, and the feature system will be presented as JSON in transcriptionssystems/.