cldf-clts / clts

Cross-Linguistic Transcription Systems
https://clts.clld.org

Dealing with U+0347 #51

Closed cormacanderson closed 3 years ago

cormacanderson commented 3 years ago

PHOIBLE uses U+0347 (combining equals sign below) for non-sibilant fricatives. In the extended IPA this is the alveolar diacritic, so the PHOIBLE usage is liable to cause confusion. Part of the problem is that the IPA itself has no good way to mark non-sibilant fricatives.

In contrast, this one from PHOIBLE looks more like the alveolar diacritic, as it's not on a sibilant.

cormacanderson commented 3 years ago

@bambooforest, @drammock

drammock commented 3 years ago

I don't recall the exact rationale behind choosing 0347; the decision was made probably 8 years ago. Probably something like this:

its extIPA meaning is coronal-related, it's supported by the common IPA fonts, and if we end up needing to represent "labioalveolar" articulations (what it is used for in extIPA) we can do something like fs or vz

Looking at it with fresh eyes, today I would prefer a combining downwards arrow below (which doesn't exist as far as I can tell), by analogy to the combining upwards arrow below (U+034E) which extIPA uses for "whistled articulation" --- non-sibilant seems like approximately the opposite of "whistled".

A possible alternative is U+1AB3 (combining downwards arrow above) but I don't think that's better than sticking with what we already use --- it's likely to be less widely included in fonts, and the change might cause confusion among users who read the documentation and learned what we say we mean by U+0347. Personally I am not too concerned about continued use of U+0347 being a source of confusion for new users, since in my experience most linguists have only a passing familiarity with extIPA (if they're aware of it at all).
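For anyone who wants to check the codepoints named above against their own Unicode data, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for illustration. U+1AB3 was added to Unicode later than the other two, so an older Python build may not know it; the fallback string covers that case.

```python
import unicodedata

# Codepoints mentioned in this comment.
CANDIDATES = ["\u0347", "\u034E", "\u1AB3"]

def describe(ch):
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    # The second argument is returned when this Python's Unicode
    # database has no name for the character.
    name = unicodedata.name(ch, "<not in this Python's Unicode database>")
    return f"U+{ord(ch):04X} {name}"

for ch in CANDIDATES:
    # A non-zero canonical combining class confirms the character is a combining mark.
    print(describe(ch), "| combining class:", unicodedata.combining(ch))
```

This only reports what the Unicode database says about each character; it says nothing about font coverage, which is the concern raised above.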

tresoldi commented 3 years ago

I have been playing around with the idea of U+032C and U+032D -- they are non-standard and not used, as far as I can tell, but render "nicely": s̬ s̭

tresoldi commented 3 years ago

(But of course it interferes with the standard diacritic for "voiced"...)

bambooforest commented 3 years ago

In an early version of the feature set, we noted that U+0347 would indicate [-strident] and that it should only be applied to [+coronal] sounds.

The affected segments occur only in SPA and UPSID, in a handful of overlapping inventories (13 languages in total) and appear in these segments:

ʂ͇ ʃ͇ ts͇ ts͇ʰ ʈʂ͇ z͇ z̪͇|z͇ ʐ͇ ʒ͇

The rest are from two languages in EA, including ð͇, which are a bug that got through the PHOIBLE parser:

2533 ersu1241 Ersu d͇z͇
2533 ersu1241 Ersu nd͇z͇
2533 ersu1241 Ersu nt͇s͇ʰ
2533 ersu1241 Ersu s͇
2533 ersu1241 Ersu t͇s͇
2533 ersu1241 Ersu t͇s͇ʰ
2533 ersu1241 Ersu z͇
2592 tosk1239 Tosk Albanian ð͇ˠ

Where they indicate alveolar:

http://eurasianphonology.info/listview?lang=Ersu%23179

And what looks like a duplicated dental / alveolar fricative (ð͇):

http://eurasianphonology.info/listview?lang=Tosk+Albanian+%28Kor%C3%A7%C3%AB%29%23387

We can't use the same symbol for different meanings, so perhaps we should change the former to something else or change the latter in the EA mapping file.

JIPA actually uses dots below:

https://www.cambridge.org/core/services/aop-cambridge-core/content/view/C91C8AD692A11052A92B6C9FB4267F72/S0025100314000437a.pdf/ersu.pdf

So that's not much help.
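A rough sketch of how one might list the segments carrying U+0347 and see which base letters it sits on. This is not the PHOIBLE parser; the function names and the sample list are invented for illustration, and the segments are written with escapes so the combining marks are unambiguous.

```python
import unicodedata

EQUALS_BELOW = "\u0347"  # combining equals sign below

def segments_with_mark(segments, mark=EQUALS_BELOW):
    """Return the segments that contain the given combining mark."""
    return [s for s in segments if mark in s]

def base_letters(segment):
    """Drop nonspacing marks (category Mn) to show the letters underneath."""
    return "".join(c for c in segment if unicodedata.category(c) != "Mn")

# Illustrative sample only, not a real inventory.
sample = [
    "s\u0347",                # s with equals sign below
    "z",                      # plain z
    "\u00f0\u0347\u02e0",     # eth + equals sign below + velarization
    "t\u0347s\u0347\u02b0",   # aspirated affricate, mark on both letters
    "\u0283",                 # plain esh
]

for seg in segments_with_mark(sample):
    print(seg, "->", base_letters(seg))
```

Modifier letters such as the aspiration and velarization signs are category Lm, not Mn, so `base_letters` keeps them.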

cormacanderson commented 3 years ago

Maddieson uses the retracted diacritic on ð θ to indicate alveolar and the dot below to indicate retroflex. This would be an option here perhaps.

I don't understand why the Ersu examples need the alveolar diacritic at all, as surely that is the default articulation for s, z etc. The fact that the same language has contrasting dentals is immaterial, because these can be marked with the diacritic.

drammock commented 3 years ago

OK, so summarizing the issue:

U+0347 is used in an inconsistent way in PHOIBLE; in UPSID and SPA inventories it means "non-sibilant" and in EA inventories it means "alveolar". The latter is a mistake on PHOIBLE's part; we should have corrected / re-mapped those phonemes when ingesting EA. Given that we did not catch it, it raises the question of whether to stick with the original usage, the EA usage, or switch to something else altogether.

I agree with @cormacanderson that the Ersu phonemes don't actually need the diacritic: the default interpretation of s, z, d, etc. is that they are alveolar sounds. The "duplicated dental/alveolar fricatives" that @bambooforest links to are clearly a mistake in the source (both represented as ð͇ˠ), and should probably be ðˠ and (or possibly z̪ˠ and ?) respectively.

So if we don't need U+0347 to mark "alveolar", the question simplifies to: do we keep using it for "non-sibilant" or do we retire it and use some other symbol for "non-sibilant"? So far I don't see any proposed codepoints that seem clearly better to me, and I am generally against changing it without a compelling reason, as it likely creates confusion among users to change the symbols we use. So, is there some compelling reason? Some other problem with continuing to use U+0347 to mark non-sibilants that I'm not grasping?
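To make the summary concrete, here is a minimal sketch of a source-dependent re-mapping, assuming we drop U+0347 where it only means "alveolar" and keep it where it means "non-sibilant". The source labels (`"ea"`, `"spa"`, `"upsid"`) and the function name are hypothetical; this is not existing PHOIBLE or CLTS code.

```python
EQUALS_BELOW = "\u0347"  # combining equals sign below

# Sources where U+0347 was used for "alveolar" and is therefore redundant.
# Hypothetical labels, chosen for this sketch only.
REDUNDANT_IN = {"ea"}

def normalize_segment(segment, source):
    """Strip U+0347 for sources that used it to mean 'alveolar'.

    For any other source the segment is returned unchanged, so the
    'non-sibilant' reading in SPA and UPSID inventories is preserved.
    """
    if source.lower() in REDUNDANT_IN:
        return segment.replace(EQUALS_BELOW, "")
    return segment

print(normalize_segment("t\u0347s\u0347\u02b0", "ea"))   # mark removed from both letters
print(normalize_segment("z\u0347", "upsid"))             # left as is
```

A mechanical strip like this would not sort out the duplicated Tosk Albanian fricative; that pair still needs a manual decision in the mapping.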

cormacanderson commented 3 years ago

That is fair enough @drammock and consistent from a PHOIBLE perspective. However, with CLTS we are dealing with various sources, which differ in their practice, and while adopting the PHOIBLE practice is one option, I'd be inclined to follow an IPA solution if one exists.

Ladefoged and Maddieson (1996) only recognise alveolar /θ̠ ð̠/ and retroflex /ɻ̝/, while Wikipedia also has /ɹ̠̝/ and /ɹ̠̝̊/ for the postalveolars. Further to these we have affricates (including /tθ̣/ in LAPSyD).

Suspicious to me is the fact that these sounds seemingly occur only in SPA and LAPSyD inventories, possibly also EA. Why do we not find them from the other contributors? I went digging in the original sources to get a better handle on what might be going on. I also checked further sources about the same languages.

Basically, I consider most of the SPA and UPSID segments spurious or at best dubious when compared against the original sources. Ones that I'd say are solid are:
Armenian: "a voiced retroflex fricative", where I'd suggest /ɻ̝/
Hopi: "an apico-alveolar retroflexed fricative", so maybe also /ɻ̝/
Azerbaijani: basically a flap but "grooved tongue shape, some friction", so maybe /ɾ̝/

All of these are discussed under the rubric of rhotics. To them should be added:
Tacana: "flap articulation simultaneous with friction made by the tongue blade", in UPSID 583 as r͓, but for which /ɾ̝/ would also work maybe
Yurakurè: "voiced post-alveolar fricative", in source and PH 1026 as /ɹ̝/, while Wikipedia might prefer /ɹ̠̝/. (SAPHON uses the same source but instead interprets this as /ʐ/, as well as original /dʲ/ as /d/....)

The fricativised diacritic on the various rhotic symbols might be the easiest solution?

I couldn't track down any of the following; I'm listing them on the off-chance any of you can help me with them:
Jones Jr, Robert B. 1961. Karen Linguistic Studies: Description, Comparison, and Texts. (University of California Publications in Linguistics, 25.) Berkeley: University of California Press.
Mazaudon, Martine. 1973. Phonologie tamang: étude phonologique du dialecte tamang de Risiangku (langue tibéto-birmane du Népal). Paris: SELAF.
Morgenstierne, Georg. 1945. Notes on Burushaski Phonology. Norsk Tidsskrift for Sprogvidenskap 13. 61–95.

cormacanderson commented 3 years ago

I want to emphasise that I am not trying to deny the existence of non-sibilant coronal fricatives. In fact, I frequently contrast five of them in my English:

However, none of these are the most common allophone and I wonder if this is relatively common, that we get coronal stops without closure, fricative rhotics, etc. relatively frequently, but not so often that they are considered the "basic" allophone that is written into a description.

drammock commented 3 years ago

> That is fair enough @drammock and consistent from a PHOIBLE perspective. However, with CLTS we are dealing with various sources, which differ in their practice, and while adopting the PHOIBLE practice is one option, I'd be inclined to follow an IPA solution if one exists.

Of course. For now I think it's fair to assume that PHOIBLE will continue using 0347 to mean "non-sibilant" in at least the SPA and UPSID cases. How you choose to translate this into CLTS is up to your team to decide.