CAMeL-Lab / camel-guidelines

https://camel-guidelines.readthedocs.io/

The use of Haraka in undiacritized Arabic text #10

Closed csisc closed 1 year ago

csisc commented 2 years ago

Sometimes a Shaddah is needed to disambiguate between lexemes:

سلّم: to say hello to someone (Tunisian)
سلم: to be safe (Tunisian)

I think the Shaddah should be written in such cases.

A Haraka can likewise help to distinguish "Al-" from an Alif Madda followed by an l within a word:

بالْغة: Pubescent
باليمين: On the right

Adding a haraka (here, a Sukun) to the l in this situation resolves the ambiguity.

Another case where a Sukun is useful in undiacritized text is distinguishing two types of noun phrases:

كلمة باهية, قول باهي: Good word (noun followed by an adjective)
كلمةْ حق, قولْ حقيقة: True word (additional phrase, i.e. an idafa construct)

I think this is worth mentioning.
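To make the ambiguity concrete, here is a minimal Python sketch showing that the two forms in the first example become the same string once diacritics are removed. The helper name `strip_diacritics` and the choice to treat the code points U+064B to U+0652 as "diacritics" are assumptions made for illustration; they are not taken from the guidelines.

```python
# Assumption: "diacritics" means the Unicode range U+064B..U+0652
# (tanween, short vowels, Shadda U+0651, Sukun U+0652).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}

def strip_diacritics(text):
    """Hypothetical helper: drop every character in DIACRITICS."""
    return "".join(ch for ch in text if ch not in DIACRITICS)

with_shadda = "\u0633\u0644\u0651\u0645"   # seen, lam, Shadda, meem
without_shadda = "\u0633\u0644\u0645"      # seen, lam, meem

# The two spellings differ only by the Shadda ...
print(with_shadda == without_shadda)
# ... and collapse to one string when fully undiacritized.
print(strip_diacritics(with_shadda) == strip_diacritics(without_shadda))
```

The first comparison is false and the second is true, which is the loss of information described above.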

csisc commented 2 years ago

The second proposal solves the problem of clitic identification.

csisc commented 2 years ago

It seems that you already discuss the Shaddah in the paper: "As such, using the Shadda interacts with the number of letters in a word. The Shadda general rule states that it is used within the baseword, but not across word-clitic boundaries. Any exceptions must be specified in the specification rules". However, what I meant here is the particular case where a Shaddah is directly followed by a Sukun. In that case, users tend not to write the Shaddah.

Examples:

كلّ: Every
مكلّمين: Talkers

In such cases, it would be worth mentioning that the Shaddah is kept when it appears in other forms of the word (e.g., inflections) or when the word takes a suffix (e.g., -u).
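As a rough illustration of what "keeping the Shaddah" could look like in a normalization step, the sketch below removes every other diacritic but preserves the Shaddah. The function name `strip_except_shadda`, the diacritic range, and the fully marked input string are all hypothetical; this is not the procedure defined in the guidelines.

```python
SHADDA = "\u0651"
# Assumption: "diacritics" means the Unicode range U+064B..U+0652.
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)}

def strip_except_shadda(text):
    """Hypothetical helper: drop all diacritics except the Shadda."""
    return "".join(ch for ch in text if ch == SHADDA or ch not in DIACRITICS)

# Hypothetical fully marked input: kaf, damma, lam, Shadda, Sukun
fully_marked = "\u0643\u064F\u0644\u0651\u0652"
kept = strip_except_shadda(fully_marked)

# Only kaf, lam, and the Shadda remain.
print(kept == "\u0643\u0644\u0651")
```

A rule based on other inflected forms of the word would need a lexicon or morphological analysis, which this sketch does not attempt.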

nizarhabash1 commented 1 year ago

Note added: https://camel-guidelines.readthedocs.io/en/latest/orthography/#4421-to-diacritize-or-not-to-diacritize