cldf-clts / soundvectors

Feature definitions #4

Closed arubehn closed 4 months ago

arubehn commented 8 months ago

@LinguList I have pushed my recent changes, so I would like you to have a look at the current state of the infrastructure. There are several feature definitions that I would like to discuss with you:

Consonant clusters: I currently define them as the union of the positive features of their individual components. If a cluster contains a labial and a velar part, all binary features corresponding to labial and velar will be positive ([+lab, +dorsal, +hi, ...]). I am not sure whether this is the most elegant solution, especially since in some cases individual feature values will conflict. Consider the cluster [kg]: which value for voicing would we assign? The current system would assign [+voi], since one part of it is voiced.

Close vowels with friction: Currently, I do not have a solution for vowels with friction, and I know too little about how they work, so any suggestion is welcome.

Implosives: Same as above. The most obvious solution would probably be [+cg], the same feature that is also assigned to ejectives. That would theoretically render those two sound classes equivalent, but since there are no voiced ejectives, and voiceless implosives are quite infrequent, that might be justifiable.

Relative articulation: There are multiple minor modifiers, such as "advanced" or several types of release, and I am not sure whether to account for them at all (and if so, how). In the current version of eval.py, those CLTS features that have a "null mapping" (they are present but do not change the feature vector) are printed at the end.
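To make the consonant cluster question concrete, here is a minimal sketch of the "union of positive features" rule described above. The feature names, the toy values for [k], [g] and [p], and the helpers `cluster_vector` and `conflicting_features` are all assumptions made for illustration; they are not the actual soundvectors tables or API.

```python
# Hypothetical mini-inventory; values are illustrative, not the real tables.
FEATURES = ("voi", "lab", "dorsal", "hi", "cg")

K = {"voi": -1, "lab": -1, "dorsal": 1, "hi": 1, "cg": -1}
G = {"voi": 1, "lab": -1, "dorsal": 1, "hi": 1, "cg": -1}
P = {"voi": -1, "lab": 1, "dorsal": -1, "hi": -1, "cg": -1}


def cluster_vector(*parts):
    """Union of positive features: +1 if any component is +1, else -1."""
    return {f: 1 if any(p[f] == 1 for p in parts) else -1 for f in FEATURES}


def conflicting_features(*parts):
    """List the features on which the components disagree."""
    return [f for f in FEATURES if len({p[f] for p in parts}) > 1]


kp = cluster_vector(K, P)  # labial-velar cluster: both places come out positive
kg = cluster_vector(K, G)  # voicing conflict is resolved in favour of [+voi]

print(kp)
print(kg, conflicting_features(K, G))
```

A helper like `conflicting_features` could be used to flag clusters such as [kg] for manual review rather than silently assigning the positive value.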

Furthermore, I would like to draw your attention to the inventory of distinctive features that I have set up for diphthongs and complex tones. They are described in detail in Section 3.2.1 of my MA thesis. To the best of my knowledge, feature definitions for complex sounds like these are novel (phonologists would usually treat them as multiple segments).

Besides tone, which we have to support since it is often represented as a single segment, I am reluctant to encode suprasegmental features when they are tied to another segment (usually a vowel). We could of course add a [+stress] feature and apply the present tonal features to vowels, but I don't know whether that would be useful or desirable.

Please have a look and let me know what you think. I will, of course, expand this thread if further discussion items come to mind (they surely will).