Quoting here what @cormacanderson said:
The following, I would consider unnecessary and would like to see data that warrant them:

1) "release": { "consonant": [ "with-mid-central-vowel-release", "with-voiceless-velar-fricative", "with-dental-fricative-release" ] }

The second two are likely to be affricated stops, i.e. [tᶿ] and [kˣ]. We could reword this as "with-fricative-release" or "with-affricate-release", or even put them under "aspiration" instead (we could also think of adding aspiration here, i.e. "with-aspiration" – after all it is a release feature, and we could lose that feature altogether there). The "with-mid-central-vowel-release" could be renamed "with-schwa-release" or "with-vocalic-release". It's not entirely clear to me whether that is even necessary, but we can see.

2) "palatalization": { "consonant": [ "labio-palatalized", "palatalized" ] }

We might ask why we can't just deal with this as "palatalization:palatalized" and "labialization:labialized" together. I presume each segment cannot be specified more than once for the same feature; otherwise we could just do "sec_localization" (or "sec_place", but I disprefer this terminology, also "place"), and have "palatalized", "labialized", etc.

3) "place": { "consonant": [ "labialized-velar", "lateral", "palatal-velar", "labialized-palatal", "labiovelar" ] }

I disprefer "place" over "localization", but anyway. I presume "labialized-velar" should be just "velar" plus "labialization:labialized". As for "labialized-palatal", I don't see any reason for not just having that as "palatal" plus "labialization:labialized". There is probably a case for retaining "labiovelar", given the cross-linguistic frequency of [w], [kp] etc. (note: "labiovelar" ≠ "labialized-velar"). The feature "palatal-velar" is questionable – is it "velar" plus "palatalization:palatalized" as in Irish or Russian, or what? I don't think "lateral" belongs here at all. It's not a place feature and was actually causing trouble with the consonants (see below).
4) "height": { "vowel": [ "mid", "close-mid", "near-open", "open-mid", "open", "near-close", "close", "nearly-open" ] }

For a start, "near-open" and "nearly-open" look like aliases. I would prefer using high–low here, personally: "high, near-high, mid-high, mid, mid-low, near-low, low".

5) "manner": { "consonant": [ "flap", "lateral-approximant", "stop-segment", "plosive", "nasal", "stop-cluster", "lateral-flap", "sibilant-fricative", "lateral-affricate", "sibilant-affricate", "lateral-fricative", "tap" ] }

This is a bit of a mess, and I don't think we are dealing with it in the best way. Numerous points: a) "tap" and "flap" are likely aliases – how do we distinguish? b) for "plosive" and "nasal" it might be more principled to call them "oral-stop" and "nasal-stop", c) can you clarify what "stop-segment" and "stop-cluster" actually mean – these look like structural features which should be taken out of here – they aren't manner, d) I would propose two new features: "laterality:lateral" and "sibilancy:sibilant" – that will mean we can make all of the above unary and make the description much more straightforward, e.g. currently we have [tθ] as "dental-affricate" and [ts] as "dental-sibilant-affricate", while this change would have the former as "dental-affricate" and the latter as "dental-affricate-sibilancy:sibilant".

6) "frication": { "vowel": [ "with-frication" ] }

Is it really necessary? Fang Lu and Feng Ling ("Fricative vowels as an intermediate state of vowel apicalization") say: "plain high vowels, fricative high vowels, and apical vowels distinguish in place of articulation, namely being anterodorsal, laminal, and apical respectively; and frication becomes a concomitant and redundant feature in the production of fricative or apical vowels".
I'll try and answer this later in detail.
This is a bit of a mess, and I don't think we are dealing with it in the best way. Numerous points: a) "tap" and "flap" are likely aliases – how do we distinguish? b) for "plosive" and "nasal" it might be more principled to call them "oral-stop" and "nasal-stop", c) can you clarify what "stop-segment" and "stop-cluster" actually mean – these look like structural features which should be taken out of here – they aren't manner, d) I would propose two new features: "laterality:lateral" and "sibilancy:sibilant" – that will mean we can make all of the above unary and make the description much more straightforward, e.g. currently we have [tθ] as "dental-affricate" and [ts] as "dental-sibilant-affricate", while this change would have the former as "dental-affricate" and the latter as "dental-affricate-sibilancy:sibilant".
First, tap and flap are defined by what people call them in the IPA chart, and I'd propose to simply follow that protocol here.
Second, yes, we could do that, but I'd propose to follow the Wikipedia convention here and call them "stop" vs. "nasal", as this will allow us to automatically link to Wikipedia pages at times, and they usually closely reflect the tradition in linguistics.
Third, stop-segment etc. are for clicks, not for consonants, and @afehn should review these parts (as we play by slightly different rules when dealing with clicks).
Fourth, while the suggested new features would make the system SEEM less complex, we would lose the connection to the graphemes: graphemes minimally distinguish three features, and I would prefer to minimize the number of additional features for sounds that are represented by just one grapheme, as it would cause problems for the code base. I'll reflect a little and then make a proposal on how to proceed with this.
Addendum: I'll leave it open whether we switch to "sibilant" as a feature, but I have added "breathiness:breathy" and "creakiness:creaky", as people often insist that some sound has a breathy release even if it's voiceless. This makes handling a lot easier.
Okay on point 2, and I suppose on 3 as well, although as these are for clicks, should this not be a separate feature, i.e. stop-segment vs. stop-cluster for clicks, in the same way that there are some features for vowels and some for consonants? After all, these manner features define "consonant", and I thought clicks were being dealt with separately. On tap vs. flap, the latest IPA chart lists "tap or flap", so I think we are just duplicating here. As for 4, maybe we can discuss this back in Jena to come up with a principled way to proceed.
With the latest changes, clicks are now just another consonant with click as manner. This has proven extremely useful, and was also agreed to by @afehn. The more complex stuff is treated as a cluster. This makes the system much more economical and strict, and there are no stop-clusters anymore.
please add points, @cormacanderson, if I forgot some.
An important question to @cormacanderson, following up on this point you made:
I would propose two new features: "laterality:lateral" and "sibilancy:sibilant" – that will mean we can make all of the above unary and make the description much more straightforward, e.g. currently we have [tθ] as "dental-affricate" and [ts] as "dental-sibilant-affricate", while this change would have the former as "dental-affricate" and the latter as "dental-affricate-sibilancy:sibilant".
Do sibilancy and laterality exclude each other, or do we have a lateral sibilant fricative? If they exclude each other, I'd be inclined to add a fourth REQUIRED column of features to our consonants.tsv, but I don't know what to call it. This would make computation much, much more straightforward:
Grapheme | Phonation | Place | Manner-Addon | Manner |
---|---|---|---|---|
t | voiceless | alveolar | | stop |
s | voiceless | alveolar | sibilant | fricative |
ɮ | voiced | alveolar | lateral | fricative |
What do you think? Computationally, this would make a huge difference, even compared to adding two columns (like laterality and sibilancy, which I think we don't need, right?), so we'd just need a good name for this additional manner aspect.
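For illustration, here is a minimal Python sketch of how such a row could be read off, left to right, into the conventional sound name. The column names and row data are hypothetical, merely following the table above, not the actual consonants.tsv layout:

```python
def sound_name(row):
    """Join the non-empty feature columns, left to right, into an
    IPA-style name (hypothetical column order, mirroring the table)."""
    columns = ["phonation", "place", "manner_addon", "manner"]
    parts = [row[col] for col in columns if row.get(col)]
    return " ".join(parts + ["consonant"])

rows = {
    "t": {"phonation": "voiceless", "place": "alveolar", "manner": "stop"},
    "s": {"phonation": "voiceless", "place": "alveolar",
          "manner_addon": "sibilant", "manner": "fricative"},
    "ɮ": {"phonation": "voiced", "place": "alveolar",
          "manner_addon": "lateral", "manner": "fricative"},
}
print(sound_name(rows["s"]))  # voiceless alveolar sibilant fricative consonant
```

The empty cells simply drop out, so the same code handles three- and four-feature sounds alike.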
My two cents, as I faced this in my own system. I think the features are by definition exclusive, as the hissing sound characteristic of sibilancy is produced by the open tube of the tongue groove directing the air to the upper teeth/gums. This seems impossible in any lateral configuration, as laterality requires precisely that the tongue apex divert the airflow coming from the back (no matter where the apex actually touches, i.e. teeth, gum, ridge, etc.). I can try to produce a hissing sound similar to sibilant fricatives with a "hint" of laterality if I produce an /s/ and move my apex to a cheek, but it is still essentially sibilant, and we're far into the realm of speech disorders.
As for a feature name, I think something related to "tongue-shape" would be good, as it could also be used for things like sulcalized vowels (sorry, on mobile, can't check whether it is already in CLTS). It could be something like "grooved/non-grooved", something like "concave/convex" (I don't know if this is used in the literature, and I wouldn't really want to innovate), or something more "scientific" related to the intrinsic muscles used in each case (superior/inferior longitudinal, if a quick googling indeed confirms my memory).
Actually, I first had "sulcalized" instead of "sibilant", although we don't particularly need to care about vowels now, as we strictly divide the features for vowels and consonants, so tongue-shape may be a good name. Given that I just remembered that ejectives cause the same problem for us, I am now actually inclined to take five columns as the "base" ones, in order to include ejectives and to avoid being criticized for choosing some feature name that people with a different phonetic theory would object to:
Grapheme | Phonation | Place | Laterality | Sibilancy | Ejection | Manner |
---|---|---|---|---|---|---|
t | voiceless | alveolar | | | | stop |
s | voiceless | alveolar | | sibilant | | fricative |
ɮ | voiced | alveolar | lateral | | | fricative |
ɬ’ | voiceless | alveolar | lateral | | ejective | fricative |
This will make the table look bulkier, but it is still MUCH more convenient than writing Chomsky & Halle features into a table, and reading from left to right immediately invokes the common names in the order used by the IPA.
"tongue-shape" would also allow adding any other mutually exclusive tongue shapes, if necessary. It has my vote.
And the table does not seem that bulky (speaking from experience with my own system), and it is indeed more convenient: you can query it like the database it is (I mean, without things like substring searching), and it can be mapped with just a couple of lines of code to a one-hot encoding (such as what I suggested when we were discussing your tiers idea).
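As a sketch of that last point, a couple of lines of Python (with a tiny, hypothetical feature table) suffice to map the column-per-feature layout to a one-hot encoding:

```python
def one_hot(table):
    """Map a column-per-feature table to a one-hot encoding:
    one binary dimension per attested (column, value) pair."""
    dims = sorted({(col, val) for row in table.values()
                   for col, val in row.items()})
    vectors = {g: [int(row.get(c) == v) for c, v in dims]
               for g, row in table.items()}
    return vectors, dims

# Invented mini-table for illustration.
table = {
    "t": {"phonation": "voiceless", "place": "alveolar", "manner": "stop"},
    "s": {"phonation": "voiceless", "place": "alveolar", "manner": "fricative"},
}
vectors, dims = one_hot(table)
```

Each grapheme ends up as a fixed-length binary vector, which is exactly what distance measures and clustering expect.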
It is more usable than my "one-feature-to-rule-them-all" proposal, which I'll probably keep only as a thought experiment rather than something to be used (I still think it is a good idea for dealing with consonant<->vowel rules, but that is not the point here).
My original idea was to have three base features for each sound class and then add the rest in the EXTRA table, but it turns out that allowing a few more, like the five now, will still make it easy enough to add and read new data. Before, in the CLPA project, we had each feature in its own column, which turned out to be completely impractical and annoying, so I'm very happy with this system for now; and adding three more columns won't hurt us, luckily...
Three or maybe four features are probably enough to distinguish the sounds in each inventory (that is actually a good question to investigate, once CLPA is stabilized and we can check against inventories: how many features are needed, in general), but I have serious doubts in terms of a cross-linguistic system. And I am fully aware of how impractical it is -- remember that my proposal had a feature for each possible articulator (once more, it is an experiment). We can check this too, once CLPA is stabilized, as I am waiting for that to go on with my system, testing it against your work. ;)
Well, be aware that these are just the features which we list in a column each; we also have the EXTRA column, where you can insert more features in key:value form, separated by commas. These considerations are simply practical: it is easier to type along different columns etc., but if there are too many, errors will increase. All we want with this system is to provide the dummy features that are encoded by the symbols; we do not have any ambition to reflect reality, sparseness, etc., and for this reason most features are simply binary, such as "breathiness:breathy", etc. BUT, once you have defined the features to create the symbols, you can link to other data and explicit feature systems, and we can also link to your system and see how much overlap we have.
Hi, sorry to jump in just now (and apologies if what I will say has already been thought through):
Maybe the rationale behind the columns and the kinds of phonetic information they carry could be thought of in terms of Feature Geometry and traditional articulatory phonetics. Basically, one would need the following columns:

1- Airflow direction: plosive or implosive
2- Airstream mechanism: pulmonic, ejective, click
3- Laryngeal setting (phonation): voiced, voiceless, aspirated, creaky, etc.
4- Place 1 (primary place of articulation): bilabial, dental, etc. (including sounds with complex articulations, e.g. w (labio-velar) or k͡p (velar-labial))
5- Place 2 (secondary place of articulation): palatalization, labialization, velarization
6- Manner 1 (primary oral constriction manner): stop/plosive, approximant, fricative, affricate
7- Manner 2 (secondary oral constriction manner): sibilant, nasal, lateral, flap, tap, trill
I guess this would avoid the often "hyphenated" phonetic descriptions that are currently found in the database, plus would make it more well structured.
One observation: perhaps we could include in "5- Place 2", or in an 8th column, anything that is left in the current EXTRA column and was not distributed to the new columns.
Somehow, my last email did not get through to GitHub, so here it is again: It is of crucial importance to understand that we do NOT seek to come up with our own new feature system (for that, compare Phoible, #59, and many others, also PBase, etc.). What we want, and this was discussed closely with @cormacanderson, is a clear representation of the graphemes that are used in IPA in the form of the typical textual descriptions of sounds as we find them in the IPA frameworks. We want a system that allows us to render a sound as "voiced labial stop consonant"; we want to preserve the features that linguists use all over the world when describing sounds, as they usually try to do in their descriptions. This is our comparative concept. In order to do so, we have to go beyond the IPA handbook from 1999, since it does not list all sounds, and we have to come up with some new solutions (diphthongs, clusters), as they are also rarely discussed, but we don't superimpose a new feature system.
The basic procedure is to encode features in a TSV file with required columns and the EXTRA column. The required columns have the feature in the column header and the value in the cell. The EXTRA column has feature-value pairs, separated by commas, in the form feature1:value1,feature2:value2. This is extremely convenient to write if you add only a couple of new sounds; in previous approaches, like CLPA, we had a column for each feature, which was incredibly difficult to search and required adding a new column each time we found we had forgotten something. And we still have not covered all features, such as pre-labialization, for example, as I have seen from the Northeurasian phonology database (#59).
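For illustration, parsing the EXTRA column format just described takes only a few lines. This is a hypothetical sketch, not the actual CLTS code:

```python
def parse_extra(cell):
    """Parse an EXTRA cell of the form 'feature1:value1,feature2:value2'
    into a dict; an empty cell yields an empty dict."""
    if not cell.strip():
        return {}
    return dict(pair.split(":", 1) for pair in cell.split(","))

extras = parse_extra("breathiness:breathy,creakiness:creaky")
# extras == {"breathiness": "breathy", "creakiness": "creaky"}
```

The resulting dict can simply be merged with the values read from the required columns.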
A third crucial point is that we define TranscriptionSystems as generative entities: we also define diacritics to the left and to the right of the base symbol, and they are interpreted differently depending on where they stand: kʷ and ʷk are different sounds. In the diacritics.tsv, we write ◌ʷ for diacritics following the base sound and ʷ◌ for those preceding it, and we usually assign only one new feature to the base sound, based on the diacritic. In this way, we can generate sounds which are not explicitly defined in our system. E.g., we can cover about 1200 sounds of Phoible, although we only have 400 consonants and some 200 vowels, which do not even completely overlap. And they are all unique according to our feature system.
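A toy sketch of this generative idea (the data structures and function names are invented for illustration; they are not the actual CLTS implementation):

```python
# Hypothetical diacritic table: the side of the "◌" placeholder marks
# whether the diacritic precedes or follows the base grapheme.
DIACRITICS = {
    "◌ʷ": ("labialization", "labialized"),          # follows: kʷ
    "ʷ◌": ("pre-labialization", "pre-labialized"),  # precedes: ʷk
}
BASE = {"k": {"phonation": "voiceless", "place": "velar", "manner": "stop"}}

def parse(grapheme):
    """Derive the feature bundle of an undefined sound from a defined
    base sound plus one diacritic, with position mattering."""
    if grapheme[:-1] in BASE:            # trailing diacritic
        base, key = grapheme[:-1], "◌" + grapheme[-1]
    elif grapheme[1:] in BASE:           # leading diacritic
        base, key = grapheme[1:], grapheme[0] + "◌"
    else:
        raise ValueError("cannot parse %r" % grapheme)
    feature, value = DIACRITICS[key]
    return dict(BASE[base], **{feature: value})
```

So `parse("kʷ")` and `parse("ʷk")` yield different feature bundles from the same base sound, which is the sense in which the system is generative.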
This is the main idea of a TranscriptionSystem. If you want to encode a new feature system that is somehow closer to certain phonetic or acoustic aspects, this will be encoded as TranscriptionData: a fixed set of sounds which are not generated, but linked to via our transcription system. You could also theoretically derive them in some way, but I won't sacrifice the flexibility of TranscriptionSystems to a feature system in which I don't know how to classify all the sounds. So I stick to IPA, and we discuss some basic aspects, like we have been doing in this thread, e.g. what to call certain things, etc., but my major approach is: use what's there, especially on Wikipedia and in the numerous excellent descriptions for Unicode (#54).
But systems like the one outlined by @thiagochacon or the system of @tresoldi are valuable, and they are cordially invited to be encoded for a certain set of characters and presented in the form of TranscriptionData (see the phoible.tsv for comparison on how this could look, and imagine you add extra columns for each feature, etc.).
In discussions with @cormacanderson, we have further tried to encode more features as binary (ejection:ejective, breathiness:breathy). This has proven successful in terms of the generative power of the TranscriptionSystem. It may yield strange results, as we may have a sound like:
>>> bipa['aspirated voiced bilabial stop consonant']
bʰ
But I personally don't care whether this sound is possible or not, as we find it way too often in the comparative data, so we don't want to immediately superimpose an interpretation as "breathy voiced bilabial stop consonant", unless the authors are explicit.
We will, however, offer strict-ipa (#60), where we can make these decisions, and the bʰ would be listed as an alias of bʱ.
To repeat the current plans from above: add the proposed columns to consonants.tsv in order to define these as the potential "base block" of a given sound (code-specific, will enhance parsing and writing of sounds).

I guess this would avoid the often "hyphenated" phonetic descriptions that are currently found in the database, plus would make it more well structured.
We'll get rid of the hyphenated stuff by just separating the things, and we're already working on this: "voiced alveolar sibilant-fricative" becomes "voiced alveolar sibilant fricative" by setting a new feature, following @cormacanderson, called "sibilancy:sibilant". So I don't worry about this; we'll have no hyphens left, except for tones, where I have now added "from-high via-middle to-low", thus using "from", "via", and "to" to handle contours properly.
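For illustration, these "from/via/to" contour names are trivially machine-readable; a hypothetical parser, not the actual CLTS code:

```python
def parse_contour(name):
    """Split a contour tone name such as 'from-high via-middle to-low'
    into its transition roles and pitch levels."""
    contour = {}
    for token in name.split():
        role, level = token.split("-", 1)
        contour[role] = level
    return contour

print(parse_contour("from-high via-middle to-low"))
# {'from': 'high', 'via': 'middle', 'to': 'low'}
```

Simple rising or falling tones work the same way, just without the "via" part.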
A solution of features/columns with binary values – "sibilancy", "laterality", "ejection" – might work as a proxy, and perhaps for CLTS it is enough. However, wouldn't that make things more similar to CLPA and a cause of the "incredible difficulty" mentioned by @LinguList?
If we are to think ahead about having a coherent and realistic structure of the database for all kinds of sound comparison, then we should think more critically about our feature system, not only the labels but its structure. What I proposed is not really "my" feature system, but a rather standard one, following a quite physical hierarchical structure known for some time in articulatory phonetics and more recent feature geometry systems, as in the attached extract from "The Sounds of the World's Languages".
ladefoged and maddieson feature systems.pdf
In light of the referred kind of system, PHOIBLE is more theoretically biased, and I haven't seen @tresoldi's system yet (could you please share it with me, @tresoldi?).
I am glad @LinguList mentioned that other feature systems can further be implemented in CLTS.
The tone and cluster solutions used by CLTS are really nice, since they have some structure behind them (re the "from/via/to" transitions). That could be used to deal with sounds with complex articulations, including those with nasal onset and/or release (e.g. [ᵐbᵐ]), and pre-aspiration or post-aspiration, for instance.
I wonder about tones, but I haven't seen the system in full yet to give an opinion; I assume you can also make things such as "from-mid via-low to-high", and also allow 5 levels.
A solution of features/columns with binary values – "sibilancy", "laterality", "ejection" – might work as a proxy, and perhaps for CLTS it is enough. However, wouldn't that make things more similar to CLPA and a cause of the "incredible difficulty" mentioned by @LinguList?
By no means: it makes things easier to compute (as ejectivity is one diacritic, we can create many sounds that we don't even know).
If we are to think ahead about having a coherent and realistic structure of the database for all kinds of sound comparison, then we should think more critically about our feature system, not only the labels but its structure. What I proposed is not really "my" feature system, but a rather standard one, following a quite physical hierarchical structure known for some time in articulatory phonetics and more recent feature geometry systems, as in the attached extract from "The Sounds of the World's Languages".
Yes, you can add that feature system to CLTS as TranscriptionData, similar to what we provide for Phoible, LingPy, and PBase, but you cannot use it to replace the inherent "identifiers" we use to link sounds with each other, as those need to be based on the symbols. Please see here an example of how we handle transcription data in PBase, and note that a lot of those feature bundles (which are ordered there and used as identifiers across datasets) are derived from the graphemes without us linking them manually. This is the major advantage of making features more analytic.
To clarify, the feature system is nothing else but an expansion and a more analytic version of IPA, where you generally have:
Our feature system is analytic in the sense that where other linguists would like to have something which they deem to be meaningful with respect to language, or slim, with respect to Occam's razor, etc., this is not the major concern here, but the concern is:
Your sound [ᵐbᵐ] is an interesting example, as we do not yet have the "bᵐ", since we don't allow for "with nasal release". But if we added this feature, it would be a
pre-nasalized with-nasal-release voiced bilabial stop consonant
And it would be written:
ⁿbⁿ
since we define that superscript n is enough to denote pre-nasalization (see here). It would not be a cluster but just a normal consonant, as a cluster is defined as consisting of two base sounds (this is the only way to detect them computationally).
In general, please trust me that the feature bundles do not need to be realistic in any sense, but practical, as they serve as identifiers which can be unambiguously derived from the grapheme representation of sounds (taking IPA as the standard, but also allowing this for other systems).
So, if you want to have another feature system, all you have to do is provide a table with the sound and its feature values:
CLTS_NAME | BIPA_GRAPHEME | FEATURE_SYSTEM |
---|---|---|
voiceless alveolar stop consonant | t | your features here |
or alternatively, just use n columns for your feature system, like in this example:
CLTS_NAME | BIPA_GRAPHEME | F1 | F2 | F3 | ... |
---|---|---|---|---|---|
voiceless alveolar stop consonant | t | 1 | 0 | 1 | ... |
For this enterprise, I suggest starting with the 700 sounds we have in the LingPy transcription data. This covers most of the sounds we have by now.
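To illustrate how such a TranscriptionData table could be consumed, here is a minimal sketch; the TSV content and feature values below are invented for illustration only:

```python
import csv
import io

# Hypothetical TranscriptionData TSV in the n-column shape suggested above.
TSV = """CLTS_NAME\tBIPA_GRAPHEME\tF1\tF2\tF3
voiceless alveolar stop consonant\tt\t1\t0\t1
voiced alveolar fricative consonant\tz\t1\t1\t0
"""

def load_features(text):
    """Map each BIPA grapheme to its feature vector."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["BIPA_GRAPHEME"]: [int(row[f]) for f in ("F1", "F2", "F3")]
            for row in reader}

features = load_features(TSV)
# features["t"] == [1, 0, 1]
```

Since the linking key is the grapheme (or the CLTS name), any number of such tables can coexist without touching the TranscriptionSystem itself.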
closing, discussion superseded by #66
@thiagochacon I've been in the process of reorganizing my experiment for some time; there is an older version at https://github.com/tresoldi/pyphono/tree/master/data . Please remember that it is a way of testing ideas; I'd never propose it for production. In short, a "Segment" is composed of one or more "Sounds", so that, for example, diphthongs are segments just as "plain" vowels are. Sounds are described by features that assume a value between 0.0 and 1.0 -- while in most cases you would use only 0.0 or 1.0, in theory you can set everything to a non-extreme value (the most obvious example is levels of phonation, but the wild idea would be an automatic transcription from sound waves to values -- as I said, it is a bold idea bound to fail). Users can use single characters that map to floating-point values ("a" is something like 0.03, "m" is 0.5, "z" is 0.97, while "-" is 0.0 and "+" is 1.0). Among the differences from systems derived from Chomsky & Halle (though it is not an innovation of mine) is that features are not binary, as said, and that I try to be as descriptive of the physics as possible (so there is no "dental" feature, for example, but features for "tongue apex" and "upper teeth" as articulators, both set to true); another difference is that some feature values can be read (they are computed "on the fly") but cannot be directly set (for example, "coronal" would return True if either laminal, apical, or sub-apical were True, but you cannot set a sound as coronal).
There are lots of problems with that... Did I say it is just an experiment? ;)
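A toy rendition of the ideas just described, with invented names and values, merely to make the "computed feature" point concrete:

```python
# Characters map to floating-point feature values, as in the proposal.
CHAR_VALUES = {"-": 0.0, "a": 0.03, "m": 0.5, "z": 0.97, "+": 1.0}

class Sound:
    """A sound described by gradient features in [0.0, 1.0]."""

    def __init__(self, **features):
        # e.g. Sound(apical="+", laminal="-")
        self.features = {k: CHAR_VALUES[v] for k, v in features.items()}

    @property
    def coronal(self):
        """Read-only derived feature: computed on the fly from the
        coronal articulators, and never stored or set directly."""
        return any(self.features.get(f, 0.0) > 0.5
                   for f in ("laminal", "apical", "subapical"))

s = Sound(apical="+", laminal="-")
print(s.coronal)  # True
```

Using a Python property is one simple way to enforce the "readable but not settable" behaviour of derived features.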
This follows up on #35 and lists the major discussion points for the changes mentioned by @cormacanderson.