CAMeL-Lab / camel-guidelines

https://camel-guidelines.readthedocs.io/

Alif Maqsura #1

Closed csisc closed 3 years ago

csisc commented 3 years ago

We are a group of researchers who tested the CODA guidelines, among other Arabic-script conventions, on real users from Tunisia with the contribution of the Derja Association. We held three demo sessions in late 2019. Given that developing a large-scale writing convention for Arabic dialects is more important than developing a convention for Tunisian Arabic alone, we decided to share our findings with you so that they can be taken into consideration in enriching the CODA* guidelines.

In "Unified guidelines and resources for Arabic dialect orthography", you specified this: "Alif Maqsura: The MSA rules for spelling the Alif Maqsura (ى ý), which are sometimes based on roots and sometimes on patterns, apply in CODA*."

This is not explicit as a rule. We propose to decide the spelling of Alif Maqsura for verbs according to their present (imperfective) form. For example, جاء (to come) becomes جا in Tunisian Arabic. We propose to write it as جى because its present is يجي.
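A minimal sketch of the proposal above, for illustration only. The helper names `past_final_letter` and `spell_past` are hypothetical, and the fallback to a plain Alif when the present does not end in ي is an assumption; the proposal itself only gives the جا / يجي example.

```python
ALIF = "\u0627"          # ا
ALIF_MAQSURA = "\u0649"  # ى
YA = "\u064A"            # ي


def past_final_letter(present: str) -> str:
    """Pick the final letter of a past-tense verb ending in [a].

    Assumed reading of the proposal: if the present form ends in ي,
    write the past with ى; otherwise fall back to ا (our assumption).
    """
    return ALIF_MAQSURA if present.endswith(YA) else ALIF


def spell_past(stem: str, present: str) -> str:
    """Attach the chosen final letter to the past-tense stem."""
    return stem + past_final_letter(present)


# Tunisian "to come": stem ج, present يجي
print(spell_past("\u062C", "\u064A\u062C\u064A"))  # جى
```

Under this reading, the spelling of the past form depends only on the last letter of the present form, not on the MSA root.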

nizarhabash1 commented 3 years ago

The Alif Maqsura statement assumes the reader knows MSA spelling rules. We thought the statement was clear given knowledge of Arabic morphology. Patterns like فعلى have an Alif Maqsura in the form, and roots ending with ي that are realized as [a] are spelled with ى. For MSA, the idea of using the imperfective ('present') form does not work -- take the verb رأى يرى: the vowel is [a] in both. The rule applies because the root is رءي, as supported by رؤية، رأي.
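The root-and-pattern logic described here can be sketched roughly as follows. This is a simplification that covers only triliteral roots and words ending in [a]. The function name `final_a_letter` is hypothetical, and the "otherwise write ا" branch reflects the usual MSA treatment of roots ending in و (for example رسو giving رسا), which is not spelled out in the comment itself.

```python
from typing import Optional

ALIF = "\u0627"          # ا
ALIF_MAQSURA = "\u0649"  # ى
YA = "\u064A"            # ي


def final_a_letter(root: str, pattern: Optional[str] = None) -> str:
    """Choose the spelling of a word-final [a] (simplified sketch).

    1. If the pattern itself ends in ى (e.g. فعلى), the pattern decides.
    2. Otherwise, a root whose last radical is ي gives ى.
    3. Otherwise, write a plain ا.
    """
    if pattern is not None and pattern.endswith(ALIF_MAQSURA):
        return ALIF_MAQSURA
    return ALIF_MAQSURA if root.endswith(YA) else ALIF


# Root رءي (as in رأى): last radical is ي
print(final_a_letter("\u0631\u0621\u064A"))  # ى
# Root رسو (as in رسا): last radical is و
print(final_a_letter("\u0631\u0633\u0648"))  # ا
```

Note that this decision never consults the imperfective form, which is why رأى keeps its ى even though يرى also ends in [a].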

The case of جا/يجي is interesting because this common verb is irregular in many dialects (e.g., Egyptian Arabic allows جم for "they came"). We did not think this was a case that justifies an Alif Maqsura; we preferred to keep the connection to MSA جاء. Hamzas often disappear in dialects, and we think maintaining the form is desirable to maximize connection across variants of Arabic -- especially when the pronunciation is not affected.

The rule you suggested is interesting, but it is not CODA and it is inconsistent with MSA.

csisc commented 3 years ago

This is morphology-motivated: we have looked at it from a different point of view. We analyzed the endings of verbs in the present and compared them to the endings of verbs in the past. We found several inconsistencies in aligning present endings with past endings. That is why we propose this solution, to make learning Arabic morphology easier for beginners. In fact, this rule is the one used in MSA. We have chosen to be more consistent with the "morphological etymology" of dialects rather than the "lexical etymology" of words.

nizarhabash1 commented 3 years ago

Thanks for the clarification. But this is not a CODA* issue.

csisc commented 3 years ago

Another example of this is رسا (to anchor). The present of رسا is يرسو in MSA and يرسي in Tunisian. The question here is why not all present verbs ending in long [i] have a past form ending in Alif Maqsura. Another question is how computers can learn to generate rules for the morphology of Arabic dialects when there are exceptions.

I know this is not a CODA* issue. However, it is important to be aware of.

csisc commented 3 years ago

Thank you. I have finished raising all the issues that emerged from the use of CODA by native users. I think these eight issues were important to raise, particularly as we tested the CODA guidelines on Maghrebi Arabic users.