CUNY-CL / wikipron

Massively multilingual pronunciation mining
Apache License 2.0

[fre] Aspirated <h> not included in transcriptions #212

Closed ajmalanoski closed 3 years ago

ajmalanoski commented 3 years ago

Wiktionary does not mark aspirated <h> in its transcriptions for French, instead adding a note before the transcription indicating whether a word-initial <h> is aspirated or mute, as on this page. This distinction is phonologically relevant – although <h> is silent, words that start with an aspirated <h> behave as though they start with a consonant – so it probably should be included in our transcriptions, right?

kylebgorman commented 3 years ago

You're asking the right kind of question... h-aspire is the kind of thing that ought to be in a true phonemic transcription of a word but... I am not surprised that it isn't present here. We don't catch specialty tags like that superscript "aspirated h" that you see on the page for haricot, and if we did, say just for French, what would we add to the phonemic transcription? Would you instead give it as /ha.ʁi.ko/? I don't know...there be dragons.

ajmalanoski commented 3 years ago

Most bilingual dictionaries I've seen use an apostrophe or something similar (e.g. /'aʁiko/), and the Wikipedia article on it says that French dictionaries mark it with an asterisk (e.g. /*aʁiko/). Of course, neither choice is standard IPA, but I suppose we could add a note in the phones file to explain.

kylebgorman commented 3 years ago

I am vaguely opposed to adding non-IPA signs, since that would likely break other tools and require language-specific interpretation notes. Is there an IPA alternative?

I think in this case we may have to acknowledge that this tool is really only generating citation pronunciations; it doesn’t handle sandhi phenomena (which this is: we don’t have any other liaison information here either) nor is it obvious how to make it do so without substantially enriching the spec.

On Thu, Oct 1, 2020 at 11:00 PM ajmalanoski wrote:

> Most bilingual dictionaries I've seen use an apostrophe or something similar (e.g. /'aʁiko/), and the Wikipedia article on it says that French dictionaries mark it with an asterisk (e.g. /*aʁiko/). Of course, neither choice is standard IPA, but I suppose we could add a note in the phones file to explain.

kylebgorman commented 3 years ago

I'm going to close this for now. We can reopen if we ever figure out what the right thing to do is. I think this might be a simple enhancement to WikiPron (cf. Lucas's scripts to grab frequency data) but not part of WikiPron itself.
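
For illustration only, an enhancement living outside WikiPron might look like the sketch below. It assumes the scraped data is a two-column TSV (word, then pronunciation) and that a list of aspirated-<h> words comes from some other source; the `ASPIRATED_H` set and the `annotate` function are hypothetical names, not part of WikiPron. The flag goes in a separate column, so no non-IPA symbol is added to the transcription.

```python
# Hypothetical, hand-curated set of French words with aspirated <h>.
# WikiPron does not supply this information; it would have to come
# from another source.
ASPIRATED_H = {"haricot", "héros", "hibou"}


def annotate(lines, aspirated=ASPIRATED_H):
    """Yields (word, pron, is_aspirated) for each TSV line.

    Assumes each line is "word<TAB>pronunciation". The pronunciation
    is passed through unchanged.
    """
    for line in lines:
        word, pron = line.rstrip("\n").split("\t", 1)
        yield word, pron, word.lower() in aspirated


# Small in-memory sample standing in for a scraped TSV file.
sample = ["haricot\ta ʁ i k o\n", "homme\tɔ m\n"]
for word, pron, flag in annotate(sample):
    print(word, pron, "aspirated" if flag else "-", sep="\t")
```

Keeping the flag as an extra column leaves the existing two columns readable by tools that expect plain IPA.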