bragefuglseth / keypunch

Practice your typing skills
GNU General Public License v3.0

[Language Request]: Arabic #3

Closed ibrahim-mu closed 1 month ago

ibrahim-mu commented 1 month ago

English Name

Arabic

Native Name

العربية

Syntax

Arabic orthography is very different from that of languages that use the Latin alphabet. It is written from right to left, and the shape of a letter changes depending on its position in the word (standalone, or connected to the following letter, the preceding letter, or both). There is no capitalization. Words are separated by a space. The punctuation marks ، ؛ : connect clauses and sentences, and . ! ؟ end sentences. As in English, punctuation marks are not preceded by a space but are followed by one. Expressions can be wrapped in parentheses ( ) or typewriter quotation marks " ".
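As a rough illustration of the spacing rule described above (no space before a punctuation mark, one space after it), here is a minimal Python sketch. The function name `join_with_punctuation` and the `marks` mapping are hypothetical and only serve the example; they are not taken from Keypunch.

```python
# Sketch only: names and structure are illustrative assumptions.
CLAUSE_MARKS = "\u060C\u061B:"   # Arabic comma, Arabic semicolon, colon
END_MARKS = ".!\u061F"           # full stop, exclamation mark, Arabic question mark
ALLOWED_MARKS = set(CLAUSE_MARKS + END_MARKS)


def join_with_punctuation(words, marks):
    """Join words with single spaces, attaching marks[i] directly after words[i].

    `marks` maps a word index to one punctuation mark. Words without an
    entry get no punctuation.
    """
    pieces = []
    for i, word in enumerate(words):
        mark = marks.get(i, "")
        if mark and mark not in ALLOWED_MARKS:
            raise ValueError(f"unsupported punctuation mark: {mark!r}")
        # The mark is glued to the word, so no space ever precedes it.
        pieces.append(word + mark)
    # Joining with a space puts exactly one space after each mark.
    return " ".join(pieces)


sentence = join_with_punctuation(["كتاب", "قلم", "بيت"], {0: "\u060C", 2: "."})
print(sentence)
```

The string is stored in logical (typing) order; rendering it right to left is left to the text layout engine.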

Implementation Assistance

Additional Information

There are many varieties of Arabic (e.g. Egyptian Arabic, Lebanese Arabic, Syrian Arabic, Moroccan Arabic), but the most common is Traditional Arabic (or classic Arabic), the formal version used by speakers of all varieties for news, paperwork, emails, and formal essay writing. The orthography of the other varieties is highly debatable, since they have undergone very little academic study and are used less in formal writing than Traditional Arabic. Therefore I believe that including Traditional Arabic is enough.

bragefuglseth commented 1 month ago

Hi, there's an initial implementation on the main branch now! There are instructions in CONTRIBUTING.md on how to build and run the app from its source code. You can open the language dialog from the hamburger menu and choose Arabic from there:

Screenshot from 2024-05-24 11-59-14

The text isn't going to make much sense since it's all jumbled words anyway, but please do point out if something is wrong with the word list used. Since Arabic is so different from Latin scripts, there might also be some technical quirks. I'd be glad if you could point those out.

Oh, and the WPM reported should be accurate for Arabic as well, but if it seems way off the charts, we can try to fix that.
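The thread does not say how Keypunch calculates WPM. For reference, a common convention counts every five typed characters as one "word", which keeps the figure independent of how long a language's actual words are. The sketch below assumes that convention; the function name and the `chars_per_word` parameter are hypothetical.

```python
def words_per_minute(typed_chars: int, elapsed_seconds: float,
                     chars_per_word: int = 5) -> float:
    """Estimate WPM, treating `chars_per_word` characters as one word.

    Assumption: five characters per word is a widespread convention,
    not a confirmed detail of Keypunch.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    words = typed_chars / chars_per_word
    minutes = elapsed_seconds / 60
    return words / minutes


print(words_per_minute(250, 60))  # 250 characters in one minute
```

Under this convention the result depends on what is counted as a "character", which is where a script like Arabic could make a number look off.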

bragefuglseth commented 1 month ago

I've reworked parts of the text widgetry now to accommodate Arabic better (the text used to jump around slightly at times because of changes in how the space characters interacted with the Arabic letters).

The only remaining "quirk" I can find is when typing out the lam + alef ligature (ﻻ). When lam is typed out, the ligature "breaks apart" until alef is typed. This is caused by the GTK text machinery, and would be very hard to change (it's not possible to color half a character, so it has to be sliced up). The alternative is to disable the ligature and have them "separated" by default, but I don't think the current state is too big of an issue, and I also assume that lam first appears in its standalone form when typed out normally as well.
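To make the lam + alef situation concrete: in Unicode the typed sequence is two separate code points, and the single-glyph ligature is something the font produces at render time. The Python sketch below only inspects the characters with the standard `unicodedata` module. It says nothing about GTK internals.

```python
import unicodedata

LAM = "\u0644"       # ARABIC LETTER LAM
ALEF = "\u0627"      # ARABIC LETTER ALEF
LIGATURE = "\uFEFB"  # precomposed presentation form of lam + alef

typed = LAM + ALEF

# The user types two code points, one key press each.
print(len(typed))  # 2

# The presentation-form ligature is a single code point...
print(unicodedata.name(LIGATURE))

# ...whose compatibility decomposition is exactly lam followed by alef.
print(unicodedata.normalize("NFKC", LIGATURE) == typed)  # True
```

Since the typed text contains two characters that are drawn as one glyph, coloring only the lam means splitting that glyph, which matches the "breaking apart" described above.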

The current "advanced" implementation adds Western Arabic numerals, do you think that's OK? Or should Eastern Arabic numerals be used instead?
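For context on the numeral question: Western Arabic numerals are the ASCII digits 0-9, while Eastern Arabic numerals are the code points U+0660 to U+0669. Converting between them is a one-to-one character mapping, as in this Python sketch (the helper names are hypothetical):

```python
WESTERN_DIGITS = "0123456789"
# Eastern Arabic (Arabic-Indic) digits, U+0660..U+0669.
EASTERN_DIGITS = "\u0660\u0661\u0662\u0663\u0664\u0665\u0666\u0667\u0668\u0669"

_TO_EASTERN = str.maketrans(WESTERN_DIGITS, EASTERN_DIGITS)
_TO_WESTERN = str.maketrans(EASTERN_DIGITS, WESTERN_DIGITS)


def to_eastern(text: str) -> str:
    """Replace Western Arabic digits with Eastern Arabic digits."""
    return text.translate(_TO_EASTERN)


def to_western(text: str) -> str:
    """Replace Eastern Arabic digits with Western Arabic digits."""
    return text.translate(_TO_WESTERN)


print(to_eastern("2024"))
print(to_western(to_eastern("2024")))  # 2024
```

Because the mapping is mechanical, a word list written with one digit set could be converted to the other without editing entries by hand.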

ibrahim-mu commented 1 month ago

I tried it. It works perfectly, and the way Lam Alef is handled is not bad at all. The only problem I noticed with the word list is that it avoids Hamza (ء), and as a result avoids 4 or 5 letters (أ إ ؤ ئ), replacing them with ا ا و ى ي, which is both wrong and kind of weird. I haven't found a word with the letter آ (Alef Mamduda), but I assume it also gets replaced with ا.
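One way to check a word list for this problem is to count how often each Hamza-carrying letter occurs; a list where they never appear has probably been flattened to the bare forms. The following Python sketch assumes the word list is already loaded as a list of strings. The function names are hypothetical.

```python
from collections import Counter

# Hamza and the letters that carry it, keyed by code point.
HAMZA_LETTERS = {
    "\u0621": "hamza",
    "\u0622": "alef with madda above",
    "\u0623": "alef with hamza above",
    "\u0624": "waw with hamza above",
    "\u0625": "alef with hamza below",
    "\u0626": "yeh with hamza above",
}


def hamza_counts(words):
    """Count occurrences of each Hamza-related letter across all words."""
    counts = Counter({letter: 0 for letter in HAMZA_LETTERS})
    for word in words:
        for ch in word:
            if ch in HAMZA_LETTERS:
                counts[ch] += 1
    return counts


def missing_hamza_letters(words):
    """Return the Hamza-related letters that never occur in the list."""
    counts = hamza_counts(words)
    return [letter for letter in HAMZA_LETTERS if counts[letter] == 0]


sample = ["\u0623\u0643\u0644", "\u0633\u0623\u0644"]  # two words containing U+0623
for letter in missing_hamza_letters(sample):
    print(f"U+{ord(letter):04X} ({HAMZA_LETTERS[letter]}) never appears")
```

A letter that never appears is only a hint, not proof of an error, so the flagged list still needs review by someone who reads Arabic.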

bragefuglseth commented 1 month ago

Great to hear that it works! If you can pinpoint the weird words in the list, just list them here along with what they should be replaced with. I can switch them out.

ibrahim-mu commented 1 month ago

ar_SA.txt Here is the edited word list.

ibrahim-mu commented 1 month ago

Also do you think we can add words that only show in "advanced"? In Arabic (and Hebrew for example), you can add additional marks to letters to differentiate between two or more possible pronunciations of a word. In the example عَلِم، عَلَم، عُلِم، عِلْم، عَلَّم، عُلِّم, all six words have the same letters Ayn, Lam, and Mim, but each one has a different pronunciation and meaning. This is not very common and people rely on the context of the word instead, but it can be nice to add only in the advanced level.
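To illustrate the point about the six words sharing the same letters: the extra marks are separate combining characters (Unicode category `Mn`), so removing them leaves the same three-letter skeleton every time. A small Python sketch, with the words built from explicit code points so the marks are visible in the source:

```python
import unicodedata

AYN, LAM, MIM = "\u0639", "\u0644", "\u0645"
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"

# The six vocalized forms from the example, in the same order.
VOCALIZED = [
    AYN + FATHA + LAM + KASRA + MIM,
    AYN + FATHA + LAM + FATHA + MIM,
    AYN + DAMMA + LAM + KASRA + MIM,
    AYN + KASRA + LAM + SUKUN + MIM,
    AYN + FATHA + LAM + SHADDA + FATHA + MIM,
    AYN + DAMMA + LAM + SHADDA + KASRA + MIM,
]


def strip_marks(word: str) -> str:
    """Drop nonspacing combining marks, keeping only the base letters."""
    return "".join(ch for ch in word if unicodedata.category(ch) != "Mn")


skeletons = {strip_marks(word) for word in VOCALIZED}
print(len(set(VOCALIZED)), "distinct vocalized words")
print(len(skeletons), "distinct skeleton after stripping marks")
```

The same function could derive a "simple" form from a vocalized word list, so only one list would need maintaining if an advanced mode with marks were added later.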

bragefuglseth commented 1 month ago

I'm not opposed to doing that, but the text view currently handles those additional marks rather poorly (you can try with a sample text and see for yourself: the coloring happens incorrectly and the app crashes after a few lines of text). I can look into the issue at some point in the future, but for now I'm inclined to keep the "simple" form in both modes 🙂
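The cause of the incorrect coloring isn't stated here. One general pitfall with such marks is that a letter plus its marks spans several code points but is read as a single character, so per-code-point logic can get out of step with what is on screen. The Python sketch below groups each base letter with the combining marks that follow it. It is a simplified approximation of proper grapheme cluster segmentation, and the function name is hypothetical.

```python
import unicodedata


def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplification: a character joins the previous group when its canonical
    combining class is non-zero. Full segmentation has more rules than this.
    """
    groups: list[str] = []
    for ch in text:
        if groups and unicodedata.combining(ch) != 0:
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


# Ayn + fatha, lam + shadda + fatha, mim.
word = "\u0639\u064E\u0644\u0651\u064E\u0645"
print(len(word))            # 6 code points
print(len(clusters(word)))  # 3 user-perceived letters
```

Here six code points make up three visible letters, so any index computed in one unit and applied in the other would drift by the number of marks typed so far.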

bragefuglseth commented 1 month ago

The new word list is on the main branch now. Thanks for the help!