cldf-clts / clts-legacy

Cross-Linguistic Transcription Systems
Apache License 2.0

current feature system #66

Closed LinguList closed 6 years ago

LinguList commented 6 years ago
sound_class feature value diacritic
consonant articulation strong ◌͈
consonant aspiration aspirated ◌ʰ
consonant aspiration aspirated.sibilancy:sibilant
consonant breathiness breathy ◌ʱ
consonant creakiness creaky ◌̰
consonant duration long ◌ː
consonant ejection ejective ◌’
consonant glottalization glottalized ◌ˀ
consonant labialization labialized ◌ʷ
consonant laminality apical ◌̺
consonant laminality laminal ◌̻
consonant laterality lateral
consonant manner affricate
consonant manner approximant
consonant manner click
consonant manner fricative
consonant manner implosive
consonant manner nasal
consonant manner stop
consonant manner tap
consonant manner trill
consonant nasalization nasalized ◌̃
consonant palatalization labio-palatalized ◌ᶣ
consonant palatalization palatalized ◌ʲ
consonant pharyngealization pharyngealized ◌ˤ
consonant phonation voiced
consonant phonation voiceless
consonant place alveolar
consonant place alveolo-palatal
consonant place bilabial
consonant place dental
consonant place epiglottal
consonant place glottal
consonant place labial
consonant place labialized-palatal
consonant place labialized-velar
consonant place labio-dental
consonant place palatal
consonant place palatal-velar
consonant place pharyngeal
consonant place post-alveolar
consonant place retroflex
consonant place uvular
consonant place velar
consonant preceding postoralized
consonant preceding pre-aspirated ʰ◌
consonant preceding pre-glottalized ˀ◌
consonant preceding pre-labialized ʷ◌
consonant preceding pre-nasalized ⁿ◌
consonant preceding pre-palatalized ʲ◌
consonant release unreleased ◌̚
consonant release with-lateral-release ◌ˡ
consonant release with-mid-central-vowel-release ◌ᵊ
consonant release with-nasal-release ◌ⁿ
consonant sibilancy sibilant
consonant stress primary-stress ˈ◌
consonant stress secondary-stress ˌ◌
consonant syllabicity syllabic ◌̩
consonant velarization velarized ◌ˠ
consonant voicing devoiced
consonant voicing revoiced ◌̬
vowel advancement advanced ◌̟
vowel articulation strong ◌͈
vowel breathiness breathy ◌̤
vowel centrality back
vowel centrality central
vowel centrality centralized ◌̈
vowel centrality front
vowel centrality mid-centralized ◌̽
vowel centrality near-back
vowel centrality near-front
vowel creakiness creaky ◌̰
vowel duration long ◌ː
vowel duration mid-long ◌ˑ
vowel duration ultra-long
vowel duration ultra-short ◌̆
vowel frication with-frication
vowel glottalization glottalized ◌ˀ
vowel height close
vowel height close-mid
vowel height mid
vowel height near-close
vowel height near-open
vowel height nearly-open
vowel height open
vowel height open-mid
vowel nasalization nasalized ◌̃
vowel pharyngealization pharyngealized ◌ˤ
vowel raising lowered ◌̞
vowel raising raised ◌̝
vowel retraction retracted ◌̠
vowel rhotacization rhotacized ◌˞
vowel roundedness rounded ◌̹
vowel roundedness unrounded ◌̜
vowel rounding less-rounded ◌̜
vowel rounding more-rounded ◌̹
vowel stress primary-stress ˈ◌
vowel stress secondary-stress ˌ◌
vowel syllabicity non-syllabic ◌̯
vowel tone with_downstep ◌↓
vowel tone with_extra-high_tone ◌̋
vowel tone with_extra_low_tone ◌̏
vowel tone with_falling_tone ◌̂
vowel tone with_global_fall ◌↘
vowel tone with_global_rise ◌↗
vowel tone with_high_tone ◌́
vowel tone with_low_tone ◌̀
vowel tone with_mid_tone ◌̄
vowel tone with_rising_tone ◌̌
vowel tone with_upstep ◌↑
vowel tongue_root advanced-tongue-root ◌̘
vowel tongue_root retracted-tongue-root ◌̙
vowel velarization velarized ◌ˠ
vowel voicing devoiced ◌̥
LinguList commented 6 years ago

Here's the same thing in spreadsheet:

features.tsv.txt
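
As a quick illustration of how such a four-column TSV can be consumed, here is a minimal sketch that groups the values by (sound_class, feature). The sample string and the grouping function are illustrative, not part of pyclts:

```python
# Sketch: load a feature table in the four-column TSV layout shown above
# and group values by (sound_class, feature). SAMPLE and group_features
# are hypothetical helpers, not pyclts API.
import csv
from collections import defaultdict
from io import StringIO

SAMPLE = """sound_class\tfeature\tvalue\tdiacritic
consonant\tmanner\tstop\t
consonant\tmanner\tnasal\t
vowel\theight\topen\t
"""

def group_features(fileobj):
    reader = csv.DictReader(fileobj, delimiter="\t")
    grouped = defaultdict(list)
    for row in reader:
        grouped[(row["sound_class"], row["feature"])].append(row["value"])
    return dict(grouped)

grouped = group_features(StringIO(SAMPLE))
print(grouped[("consonant", "manner")])  # ['stop', 'nasal']
```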

LinguList commented 6 years ago

I think this gives us a good basis for a fruitful discussion about what should be changed, and I am also considering using it to automatically check whether our data is correct.

LinguList commented 6 years ago

Here's the code to extract the features:

In [39]: from pyclts import *

In [40]: bipa = TranscriptionSystem()

In [41]: table = []

In [42]: for k, v in bipa._features['consonant'].items():
    ...:     row = ['consonant', bipa._feature_values[k], k, v]
    ...:     table += [row]

In [43]: for k, v in bipa._features['vowel'].items():
    ...:     row = ['vowel', bipa._feature_values[k], k, v]
    ...:     table += [row]

In [44]: for s in bipa._sounds:
    ...:     sound = bipa[s]
    ...:     if sound.type in ('tone', 'marker'):
    ...:         continue
    ...:     for f in sound._features():
    ...:         # sound class, feature name, feature value, diacritic (if any)
    ...:         table += [[sound.type, bipa._feature_values[f],
    ...:                    getattr(sound, bipa._feature_values[f]),
    ...:                    bipa._features[sound.type].get(f, '')]]

In [45]: table = sorted(set([tuple(x) for x in table if not None in x]))

In [46]: table = [['sound_class', 'feature', 'value', 'diacritic']] + table

In [47]: with open('features.tsv', 'w') as f:
    ...:     for line in table:
    ...:         f.write('\t'.join(line)+'\n')

LinguList commented 6 years ago

Sorry, the file is not good; use this Excel file instead if you want to have a closer look at the features:

features.xlsx

tresoldi commented 6 years ago

It seems OK. I might have worked some things out differently, but that is clearly a matter of preference (in fact, your system looks more neutral than what comes to my mind).

One thing I'm not sure I follow is the treatment of non-pulmonic consonants. Nasal clicks, for example, would be defined as "nasalized clicks"? If so, this seems inconsistent with the pulmonic consonants, where "stop" and "nasal" are different manners.

LinguList commented 6 years ago

Good point. I naively assumed that I could label them as nasalized, but looking back at this chart, which @afehn recommended:

nakagawa-2013-khoisan-phonotactics.pdf

(by Nakagawa 2013), I see that this is a nasal cluster. This is easy to handle: we can just discard the extra symbols and allow clusters consisting of "nasal + click" (that's the beauty of the generative system). I'll open an issue.
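
The cluster idea can be sketched with a toy segmenter: given a small inventory of known base sounds, split a grapheme cluster like "nasal + click" by greedy longest match. The inventory and the function are hypothetical, not the actual CLTS implementation:

```python
# Sketch: treat a nasal click as a cluster "nasal + click" rather than a
# nasalized click. The inventory and the greedy longest-match segmentation
# are illustrative assumptions, not pyclts internals.
INVENTORY = {
    "ŋ": "voiced velar nasal consonant",
    "ǃ": "voiceless alveolar click consonant",
    "ʘ": "voiceless bilabial click consonant",
}

def segment(cluster):
    """Greedily split a grapheme cluster into known base sounds."""
    sounds, i = [], 0
    while i < len(cluster):
        # try the longest remaining match first
        for j in range(len(cluster), i, -1):
            if cluster[i:j] in INVENTORY:
                sounds.append(INVENTORY[cluster[i:j]])
                i = j
                break
        else:
            raise ValueError(f"unknown grapheme at {cluster[i:]!r}")
    return sounds

print(segment("ŋǃ"))
# ['voiced velar nasal consonant', 'voiceless alveolar click consonant']
```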

tresoldi commented 6 years ago

Two more observations:

LinguList commented 6 years ago

This is easy to answer: because we want to model the grapheme system of IPA and be able to parse the sounds.

Since linguists use their notational freedom rather liberally and produce a lot of inconsistent data, this at least allows us to describe what they annotate.

Consider this:

In [1]: from pyclts import *

In [2]: bipa = TranscriptionSystem()

In [5]: bipa['breathy voiceless bilabial stop consonant'].s
Out[5]: 'pʱ'

In fact, there are cases in Hmong-Mien languages where grammars insist that there is an unvoiced sound with a breathy release. If we insisted that phonation has the values "voiced", "unvoiced", "breathy-voiced", and "creaky-voiced", we would not be able to capture these differences, and we would have to spell out, for each base sound plus creaky/breathy combination, that this sound exists and is legitimate (since the algorithm cannot overwrite features, which is a design principle).
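
The compositional point can be sketched in a few lines: if breathiness is an independent dimension rather than a phonation value, "breathy" combines freely with "voiceless" and no combination needs to be pre-listed. The dataclass and the name order below are illustrative assumptions, not the pyclts internals:

```python
# Sketch: independent feature dimensions compose into a sound name without
# enumerating every combination. NAME_ORDER and Consonant are hypothetical.
from dataclasses import dataclass
from typing import Optional

# order in which feature values appear in a sound's name
NAME_ORDER = ("breathiness", "phonation", "place", "manner")

@dataclass
class Consonant:
    phonation: str
    place: str
    manner: str
    breathiness: Optional[str] = None

    @property
    def name(self):
        parts = [getattr(self, f) for f in NAME_ORDER]
        return " ".join(p for p in parts if p) + " consonant"

sound = Consonant("voiceless", "bilabial", "stop", breathiness="breathy")
print(sound.name)  # breathy voiceless bilabial stop consonant
```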

Furthermore, consider the name:

In [7]: bipa['breathy voiced bilabial stop consonant'].name
Out[7]: 'breathy voiced bilabial stop consonant'

This comes close to the traditional notion of "breathy-voiced bilabial stop", without the dash. We can easily find all cases of breathiness, etc., by set comparison. In fact, in order to break those things down to ASJP sound classes, where breathiness is switched off, we can parse so-far-unknown sounds by their base features and reduce them to the correct symbol:

In [8]: asjp = TranscriptionData('asjp')

In [9]: asjp['breathy voiceless bilabial stop consonant']
Out[9]: 'p'
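
The reduction step behind this can be sketched as a lookup on base features only, with secondary features such as breathiness ignored. The mapping table and helper below are a toy stand-in, not the real ASJP transcription data:

```python
# Sketch: reduce an unknown sound to an ASJP symbol by its base features
# (phonation, place, manner), discarding secondary features. ASJP and
# to_asjp are hypothetical illustrations.
ASJP = {
    ("voiceless", "bilabial", "stop"): "p",
    ("voiced", "bilabial", "stop"): "b",
}

def to_asjp(features):
    base = (features["phonation"], features["place"], features["manner"])
    return ASJP.get(base)

breathy_p = {"phonation": "voiceless", "place": "bilabial",
             "manner": "stop", "breathiness": "breathy"}
print(to_asjp(breathy_p))  # p
```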

So recall that this is a practical approach, which does not really care how economical, pleasant to the eye, or reasonable a feature system is. Instead, it tries to infer, from the graphemes fed to the algorithm, a rendering of what people write. This is the first step towards comparability. If we wanted to teach people how to do phonetic transcription, or to impose a feature system that we think is better than all the rest, we would use something else; but "bipa" starts from the symbols and tries to translate them literally into the features invoked by the system. All additional labor can be done later.

"bipa" corrects errors via normalization, by resolving lookalikes and through the alias system (breathiness, for example, can also be expressed by the two-dots-below diacritic, so we choose one version here to normalise), but it does not care whether the sounds people propose are possible or meaningful. Based on my experience with language data, I consider this the only way to proceed: the first step is rendering things comparable, and it is a considerable step. We first need to be able to handle the data; once we are at that stage, we can think about putting it to some use.
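
Lookalike normalization of this kind can be sketched as a simple character mapping from visually similar but distinct Unicode code points onto the canonical IPA characters. The table below is illustrative; the real CLTS alias data is much larger:

```python
# Sketch: normalize lookalike characters onto canonical IPA code points.
# LOOKALIKES is a tiny illustrative table, not the actual CLTS data.
LOOKALIKES = {
    "g": "ɡ",   # U+0067 LATIN SMALL LETTER G  -> U+0261 LATIN SMALL LETTER SCRIPT G
    ":": "ː",   # U+003A COLON                 -> U+02D0 MODIFIER LETTER TRIANGULAR COLON
    "!": "ǃ",   # U+0021 EXCLAMATION MARK      -> U+01C3 LATIN LETTER RETROFLEX CLICK
}

def normalize(segment):
    return "".join(LOOKALIKES.get(ch, ch) for ch in segment)

print(normalize("ga:"))  # ɡaː
```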

tresoldi commented 6 years ago

Thank you for the explanation, it makes total sense now! I'd even suggest you incorporate parts of it in a general "What is the idea behind CLTS" documentation.

The funny thing is that my own system is now clearer to me: as a "laboratory experiment", it is much more "essentialist" in its approach to phonology (if you really must use common labels, it is closer to acoustics), and maybe it could be used in tandem in certain experiments (if it proves usable, that is).

LinguList commented 6 years ago

Thanks for understanding. For our discussion here, I think it is important to keep this in mind: before we can make our own feature systems, or encode data in systems based on other people's feature systems, we must be able to handle as much diverse data as possible. We can't do this by superimposing our favorite feature system without looking into the compositionality of graphemes, as that would mean coding a huge number of sounds manually, adding features, even before we start to check whether they are reflected in our data. CLTS-BIPA, on the other hand, is able to give names to sounds and to correct obvious Unicode errors through normalization, and despite its currently rather small base inventory of pre-defined sounds and features, it can already generate a huge number of sounds.

So everybody is invited to contribute their feature system, like Ladefoged's or the system by @tresoldi, by providing a list, as indicated, with many different concrete sound segments and their feature specifications. We can then see how to automatically link it via CLTS, and if this is successful, we can add it as transcription data to the database.

LinguList commented 6 years ago

To allow for easy and continual re-generation of the feature system, I'll add a script features.py to the cookbook collection, and the feature system will be presented as JSON in transcriptionssystems/.
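
As a sketch of what such a regeneration script might emit, the flat table above could be nested as {sound_class: {feature: {value: diacritic}}}. The rows, structure, and file layout below are assumptions, not the final format:

```python
# Sketch: nest flat (sound_class, feature, value, diacritic) rows as JSON.
# ROWS and the output structure are illustrative assumptions.
import json
from collections import defaultdict

ROWS = [
    ("consonant", "manner", "stop", ""),
    ("consonant", "aspiration", "aspirated", "◌ʰ"),
    ("vowel", "height", "open", ""),
]

def to_json(rows):
    nested = defaultdict(lambda: defaultdict(dict))
    for sound_class, feature, value, diacritic in rows:
        nested[sound_class][feature][value] = diacritic
    return json.dumps(nested, ensure_ascii=False, indent=2)

doc = json.loads(to_json(ROWS))
print(doc["consonant"]["aspiration"])  # {'aspirated': '◌ʰ'}
```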