Closed xrotwang closed 3 years ago
Here's an example:
>>> clts.transcriptiondata('phoible')
<pyclts.transcriptiondata.TranscriptionData object at 0x7fc9c5823940>
>>> p = clts.transcriptiondata('phoible')
>>> p['voiced retracted-tongue-root velar stop consonant']
'ɡ̙'
>>> s = p.resolve_grapheme(p['voiced retracted-tongue-root velar stop consonant'])
>>> s.featuredict
{'raising': None, 'relative_articulation': None, 'friction': None, 'articulation': None, 'preceding': None, 'syllabicity': None, 'nasalization': None, 'palatalization': None, 'labialization': None, 'velarization': None, 'pharyngealization': None, 'glottalization': None, 'aspiration': None, 'duration': None, 'release': None, 'voicing': None, 'creakiness': None, 'breathiness': None, 'phonation': 'voiced', 'laminality': None,
'tongue_root': 'retracted-tongue-root', 'place': 'velar', 'ejection': None, 'airstream': None, 'manner': 'stop'}
tongue_root: retracted-tongue-root is not illegal, we defined it as
"consonant": {
"relative_articulation": [
"centralized",
"mid-centralized",
"advanced",
"retracted",
"retracted-tongue-root"
],
in the features.json
or what is illegal about it?
But that definition says "relative_articulation: retracted-tongue-root" is ok - not "tongue_root: retracted-tongue-root". The latter is only ok for vowels.
Ah, mierda. I think what may happen is that the lookup fails! Feature lookup starts from feature value names that are mapped to the feature types directly. So the "resolve_grapheme" probably looks up the wrong feature. But this would also mean that we will have similar problems with long
, which are also defined for vowels and consonants? I remember, in any case, that our feature collection may NOT distinguish types for features.
Ah, definitely: length is a common feature of vowel and consonant. Retracted is assigned to TWO different feature types for vowels vs. consonants.
So we should add another feature type to consonants, or we should change the name of retracted_tongue_root.
@cormacanderson should probably help here.
Adding a new feature for consonants would force us to modfy pyclts code.
I don't think lookup is the problem. featuredict
is determined by the input data, as far as I can tell. Will investigate a bit more.
TranscriptionSystem._feature_values
(a dict), in transcriptionsystems.py
, I think it will map to the first type that is served, assuming that one feature value resolves to only one feature type, which would also be logical.
I think the problem is here:
$ grep "tongue_root:" pkg/transcriptionsystems/bipa/consonants.tsv
r̠̙ voiced alveolar trill tongue_root:retracted-tongue-root,relative_articulation:retracted
k̙ voiceless velar stop tongue_root:retracted-tongue-root
g̙ voiced velar stop tongue_root:retracted-tongue-root
d̙ voiced alveolar stop tongue_root:retracted-tongue-root
d̪̙ voiced dental stop tongue_root:retracted-tongue-root
t̪̙ voiceless dental stop tongue_root:retracted-tongue-root
ʕ̙ voiced pharyngeal fricative tongue_root:retracted-tongue-root
s̙ voiceless alveolar fricative tongue_root:retracted-tongue-root,airstream:sibilant
ð̙ voiceless dental fricative tongue_root:retracted-tongue-root
s̙ˤ voiceless alveolar fricative tongue_root:retracted-tongue-root,airstream:sibilant,glottalization:glottalized
ð̙ˤ voiceless dental fricative tongue_root:retracted-tongue-root,glottalization:glottalized
tongue_root
is a feature not defined for consonants - but it is used here.
Autsch, that is even worse.
Because it means we have to add the feature to the consonants class in models.py, and we have to check the write order, when tongue_root should be applied, and this is always a bit tedious.
But couldn't we just replace tongue_root
with relative_articulation
here? Assuming the value is correct - just not the feature?
It's actually nice how I determined this problem. I added primary keys to features.csv
and turned featuredict
into a list of foreign keys into features.csv
. Boom, 6 cases where the foreign key constraint failed :)
:-)
The question is what this will do with TS._feature_dict
.
In [1]: from pyclts import CLTS
In [2]: bipa = CLTS().bipa
In [3]: bipa._feature_values['retracted-tongue-root']
Out[3]: 'tongue_root'
since TS._feature_values
can only give one type for a value, we'd have a type clash here, since we have different types for vowels and consonants in the feature value retracted-tongue-root
. Even if this does not create clashes, it could any time, couldn't it? The feature_values are important for parsing from feature names (like voiced bilabial XXX consonant
).
Hm, I don't quite understand. Using (type, feature, value) as primary key for features, we don't have any clashes yet. But I also don't understand why retracted-tongue-root
was used as value for relative_articulation
for consonants. It seems to be bolted on, somehow - even has a tab as whitespace in the file - not spaces like all other values.
the reason was that we were in a rush to add new values, and @cormacanderson suggested to just put it there, so I did not have to add yet another stupic consonant feature there.
But it does obviously NOT make sense in the relative_articulation group, so it should be handled by adding a new feature to the consonant class in models.py anyway, just for consistency. Tests should show if my intuition that the _feature_values
lookup could suffer and yield errors, holds, but even if this is not the case, it would not make sense to go on with this as is, right?
Ok. So the proper fix would be moving it into a tongue_root
feature for consonants, right? It only affects 97 graphemes overall, so i guess this would still work as v2.1.0 :)
Yes, that is a relief ;)
Ok. Will try to fix this tomorrow.
I feel bad if you have to fix this, but I'd only have time to do so from Tuesday, I am afraid.
A new tongue root feature for consonants with two values seems to me to be principled. I remember not liking it being in relative articulation
at the time, but we were under time pressure.
Let me know what if anything I can do @xrotwang
Yes, that would be analogous to vowels, too. Will see whether I can add this in a backwards compatible way.
Cormac Anderson @.***> schrieb am Sa., 17. Apr. 2021, 23:10:
A new tongue root feature for consonants with two values seems to me to be principled. I remember not liking it being in relative articulation at the time, but we were under time pressure. Let me know what if anything I can do @xrotwang https://github.com/xrotwang
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/cldf-clts/clts/issues/104#issuecomment-821888103, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGUOKEAHJJZSK3VIVMQ44TTJH2LNANCNFSM43DIELIQ .
It seems that this solution has been intended all along: Just fixing features.json
made it work. pyclts
already had tongue_root
as feature for consonants. So we might even want to release 2.0.1 with just that fix, and roll the other changes into 2.1.0 ?
Yes! So the problem then was: we were rushing, and the person copy-pasting that feature into features-json (I bet it was me), did not paste it into the right place :(
So yes to v2.0.1 - or is v2.1.0 ok right away? I guess, v.2.0.0 has only been used in two lexibank datasets so far - tuled and one more, I think. And if we'd update these to v2.0.1, we might as well update to v.2.1.0 right away?
Agree!
I'll release pyclts today, and then put together a final PR for clts v2.1.0.
I haven't pinpointed exactly how and where this happens, but there a couple of sounds with
type=='consonant'
and a featuretongue_root
:retracted-tongue-root
in theirfeaturedict
. According tofeatures.json
this is illegal.