Open asfimport opened 4 years ago
Selene Broers (migrated from JIRA)
Linguist/software engineer/native Dutch speaker here.
Most Dutch words that use the grave accent are indeed borrowed from French. Some examples are "scène" (scene), "caissière" (female cashier), "barrière" (barrier), and "crème" (cream).
As "ie" is another sound in Dutch ("i" in IPA, so like "ee" in "deep"), it's necessary to add an accent to the e if the pronunciation needs to be different. This is the case in "caissière" and "barrière".
For syllables with just the è ("scè-ne" and "crè-me"), the accent is needed to indicate the correct pronunciation (IPA ɛ:). Otherwise, a Dutch reader would pronounce it like IPA e. In Dutch, an 'e' at the end of a syllable is pronounced as IPA e, while an 'e' at the beginning or in the middle of a syllable is pronounced as IPA ɛ. The character 'è' is pronounced as IPA ɛ: (a lengthened ɛ), no matter its place in the syllable.
There are a few native Dutch words that use the grave accent, like the exclamation "hè" (meaning "what?") and the verb "blèren" (to squall, to bawl). The accent on "hè" is necessary, because "hé" means "hey" and "he" is a nonsense word in Dutch. The verb "blèren" is divided into syllables as "blè-ren", so the 'è' is at the end of a syllable. The accent is there because, contrary to the normal rule for a syllable-final e, it should NOT be pronounced like IPA e here, but as IPA ɛ:.
This verb, "blèren", should keep the accent in its conjugations. The è in "blèren" is a lengthened ɛ (ɛ: in IPA). If you remove the accent, the e in conjugated forms where it sits in the middle of a syllable will read like a normal ɛ, making it a nonsense word. Example: "ik blèr" (IPA blɛ:r, correct first-person singular) versus "ik bler" (IPA blɛr, incorrect first-person singular, a nonsense word).
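To make the consequence concrete, here is a minimal Python sketch of what blanket grave-accent removal would do to these forms. The helper `strip_grave` is hypothetical and written only for illustration; it is not Snowball or Lucene code.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"

def strip_grave(text: str) -> str:
    """Remove every grave accent, leaving other diacritics untouched.

    Hypothetical helper for illustration only, not part of Snowball.
    """
    # Decompose so that "è" becomes "e" + combining grave accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining grave, then recompose what is left.
    return unicodedata.normalize("NFC", decomposed.replace(COMBINING_GRAVE, ""))

for word in ["blèr", "hè", "scène", "hé"]:
    print(word, "->", strip_grave(word))
```

With this folding, "blèr" and the nonsense word "bler" become the same token, as do "hè" and "he", while "hé" is left alone because its accent is acute rather than grave.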
The only cases I can think of where Dutch uses the grave accent on other vowels are "à la carte" (a loanword from French) and cases where it just adds emphasis to a word or syllable (without changing the meaning or pronunciation). I suppose that is why the Snowball algorithm keeps the accent on the e, but not on the other vowels.
I have a concern about how the Dutch Snowball algorithm handles the grave accent on è.
It removes the grave accent from à, ò, ù, and ì, but not from è. I wonder if there is something special about è that makes the stemmer leave it alone. Also, from http://www.dutchgrammar.com/en/?n=SpellingAndPronunciation.25, I found out that the grave accent is no longer commonly used in Dutch, except in some words borrowed from French.
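The behavior described above can be sketched in a few lines of Python. This is only an illustration of what I observe, not the actual Snowball implementation; the names `GRAVE_FOLD` and `fold_grave_except_e` are made up for this example.

```python
import unicodedata

# Hypothetical mapping that mirrors the observed behavior:
# the grave accent is dropped from a, i, o, u, but "è" is not listed.
GRAVE_FOLD = str.maketrans("àìòùÀÌÒÙ", "aiouAIOU")

def fold_grave_except_e(text: str) -> str:
    """Fold à, ì, ò, ù to their base letters and leave è as it is."""
    # Normalize to NFC first so each accented letter is a single code point.
    return unicodedata.normalize("NFC", text).translate(GRAVE_FOLD)

for word in ["à la carte", "scène", "crème"]:
    print(word, "->", fold_grave_except_e(word))
```

Under this sketch, "à la carte" loses its accent while "scène" and "crème" keep theirs, so a query for "scene" would not match "scène".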
If è is not that common in Dutch, removing the grave accent from it sounds reasonable to me and definitely benefits search recall in general.
I would like to know if anyone has a strong opinion on this topic. It would also be nice to hear the point of view of a Dutch speaker.
Thanks!
Migrated from LUCENE-9295 by Nguyen Minh Gia Huy, updated Jul 23 2020