Helium314 / HeliBoard

Customizable and privacy-conscious open-source keyboard
Apache License 2.0
2.41k stars, 95 forks

[Feature Request] Hebrew Diacritics #735

Closed: moriel5 closed this 5 months ago

moriel5 commented 6 months ago

In Hebrew there is a diacritical system called "Nikkud" (literally "dotting") which determines how a letter is pronounced and can be used with all Hebrew letters (for example, "אַ" is "a" as in "father", while "אֶ" is "e" as in "better"); it thus takes the place of vowels.

Currently this system, which on keyboards is generally typed as the letter immediately followed by the diacritical sign (as with pen and paper), is missing from HeliBoard (I had to type the examples using GBoard).
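In Unicode terms, each nikkud sign is a combining mark stored after its letter, which matches the letter-then-sign typing order described above. The sketch below is purely illustrative: `type_with_nikkud` is a hypothetical helper, not part of HeliBoard. It builds the two examples from the first comment and lists their code points.

```python
import unicodedata

ALEF = "\u05D0"   # HEBREW LETTER ALEF
PATAH = "\u05B7"  # HEBREW POINT PATAH: "a" as in "father"
SEGOL = "\u05B6"  # HEBREW POINT SEGOL: "e" as in "better"


def type_with_nikkud(letter: str, point: str) -> str:
    """Hypothetical helper: commit the letter, then the point.

    The point is a combining mark (category Mn), so the renderer draws it
    on the preceding letter; no separate precomposed character is needed.
    """
    return letter + point


for point in (PATAH, SEGOL):
    text = type_with_nikkud(ALEF, point)
    # Two code points per result: the base letter and the combining point.
    print(text, [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text])
```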

Also, Yiddish is based on a combination of Hebrew and German and uses the Hebrew script, so this will apply there as well once support is added, though its diacritics are slightly different. I am not fluent in Yiddish, so someone else will have to add information.

russianspy1234 commented 6 months ago

I did a test, and it looks like if the symbol is added in the layout system, it behaves as expected, so it should just be a matter of adding a layout similar to the Arabic-specific popup one. We also need popup keys for the dagesh variants like בּ. Those are easier, but I am not sure where all the nikudim would go.
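The dagesh variants can likewise be produced by appending U+05BC to a letter. Here is a minimal sketch, assuming a popup list is just an ordered list of strings. `popup_variants` is a hypothetical helper, and the decision of which letters should actually get these popups is left to the layout.

```python
DAGESH = "\u05BC"    # HEBREW POINT DAGESH OR MAPIQ
SHIN_DOT = "\u05C1"  # HEBREW POINT SHIN DOT
SIN_DOT = "\u05C2"   # HEBREW POINT SIN DOT
SHIN = "\u05E9"      # HEBREW LETTER SHIN


def popup_variants(letter: str) -> list[str]:
    """Hypothetical: popup entries for a letter key.

    Every letter gets its dagesh form; shin additionally gets the
    shin-dot and sin-dot forms.
    """
    variants = [letter + DAGESH]
    if letter == SHIN:
        variants += [letter + SHIN_DOT, letter + SIN_DOT]
    return variants


print(popup_variants("\u05D1"))  # bet
print(popup_variants(SHIN))
```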

russianspy1234 commented 6 months ago

I grabbed the JSON from FlorisBoard; I'm just trying to figure out how I can test it without a pull request.

codokie commented 6 months ago

@russianspy1234 I am not sure why they chose to display the vowel diacritics applied to the letter 'ס' even for letters other than it. GBoard, for example, just displays each vowel diacritic without any letter. I also can't think of a good reason why those vowels appear on long press only for specific letters, because all Hebrew letters can have vowels applied to them.

There is the letter 'ף', which has a very convenient position just above the backspace key, and it has no popup keys on long press. I believe all 12 vowel diacritics could fit nicely there by adding the following line to iw.txt:

ף ְ ֱ ֲ ֳ ִ ֵ ֶ ַ ָ ֹ ֻ ּ

I think that should be enough for adding niqqud support. @moriel5, what do you think? Most people don't write with niqqud and may not like the addition of vowels to the popup keys of some letters.
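The proposed line can be checked mechanically. This sketch assumes, based only on the description in this comment, that a line in iw.txt is a key label followed by its space-separated popup keys. `parse_popup_line` is hypothetical and is not HeliBoard's actual parser. The line is rebuilt from escapes so that the combining marks are readable in source.

```python
import unicodedata

FINAL_PE = "\u05E3"  # HEBREW LETTER FINAL PE
# Sheva (U+05B0) through holam (U+05B9), then qubuts (U+05BB) and dagesh (U+05BC).
POINTS = [chr(cp) for cp in range(0x05B0, 0x05BA)] + ["\u05BB", "\u05BC"]
PROPOSED_LINE = FINAL_PE + " " + " ".join(POINTS)


def parse_popup_line(line: str) -> tuple[str, list[str]]:
    """Hypothetical format: the first token is the key, the rest are popups."""
    key, *popups = line.split()
    return key, popups


def is_lone_mark(token: str) -> bool:
    """A popup that inserts exactly one combining mark (no base letter)."""
    return len(token) == 1 and unicodedata.category(token) == "Mn"


key, popups = parse_popup_line(PROPOSED_LINE)
print(key, len(popups), all(is_lone_mark(p) for p in popups))
```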

russianspy1234 commented 6 months ago

The nikkudim appear over the first letter of the name of the nikkud (e.g. the dagesh pops up on dalet). It is confusing for non-fluent speakers but is apparently the most accepted standard for fluent ones.

If I were designing it from scratch, I'd probably have, like, bet appear over v and shin over sin, etc., and then probably, yeah, all of the vowels over a single letter like you put.

Note that with Hebrew being right to left, your code for iw.txt would need to be reversed.

In both my added code and your example code, they can be disabled by people who don't want them, though, which is nice.

codokie commented 6 months ago

I've seen keyboards that have a separate key for the niqqud. Being a fluent speaker myself who doesn't quite remember the name of each diacritic vowel, I think one key for all of them is superior. Still, I don't understand why the FlorisBoard layout uses the letter 'ס' or 'ש' for showcasing each vowel.

Quoting @russianspy1234: "Note that with Hebrew being right to left, your code for iw.txt would need to be reversed."

When you copy it into a text editor, it will be reversed, won't it? I did test it earlier.
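Whether the line looks reversed depends on the editor, not on the stored text. Hebrew letters have the bidirectional class R, so a bidi-aware editor draws them right to left, while the file keeps them in logical (typing) order. Here is a quick check with Python's `unicodedata`, assuming the layout file is read in logical order:

```python
import unicodedata

# First three tokens of the proposed iw.txt line, in logical (typing) order.
line = "\u05E3 \u05B0 \u05B1"

for ch in line:
    # R = right-to-left letter, NSM = non-spacing mark, WS = whitespace
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))

# Regardless of how an editor displays it, the first stored token is final pe.
print(line.split()[0] == "\u05E3")
```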

russianspy1234 commented 6 months ago

Gboard does the same thing as Floris, so I think it's better to stick to that as the default and maybe offer the alternative for people who want it as a separate symbol layout.

I think that they just chose a random letter to display to make it easier to see stuff like the difference between a ּ and ִ and such. The code doesn't allow that part to be dynamic.

moriel5 commented 6 months ago

I wasn't available earlier because I am religious, and in my timezone we were celebrating Pesach (Passover); according to our laws, we cannot directly use electrical appliances on Shabbat (Sabbath/Saturday) and most holidays (we distinguish between different types of holidays). However, I am now available.

@russianspy1234's explanation is accurate.

Personally I think that in this case the default makes more sense, as it indirectly helps people understand why the diacritics exist, however I understand why some would prefer a dedicated key, so I have no qualms with adding an option to set that.

Regarding FlorisBoard, I too think that they chose a random letter, and given that they show it next to the diacritical symbols rather than affected by them, I personally think that their approach to showcasing the diacritics is a waste of time, as it is redundant and does not necessarily help the uninitiated understand what they are looking at.

russianspy1234 commented 6 months ago

Floris has a weird way of showing them. In my implementation, it shows the samech affected by the nikkud, even if it can't actually be affected by it (like ּ), though it uses ש to display ׂ.
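The display choices discussed in this thread (samech as a carrier, a circle, or the bare mark as in GBoard) differ only in the label, not in what the key inserts. Here is a minimal sketch, assuming the key always commits the mark alone. `popup_label` is hypothetical, and U+25CC DOTTED CIRCLE is an assumption about which "blank circle" character is meant.

```python
SAMECH = "\u05E1"         # HEBREW LETTER SAMEKH
SHIN = "\u05E9"           # HEBREW LETTER SHIN
DOTTED_CIRCLE = "\u25CC"  # DOTTED CIRCLE, a conventional carrier for lone marks
SHIN_DOT = "\u05C1"       # HEBREW POINT SHIN DOT
SIN_DOT = "\u05C2"        # HEBREW POINT SIN DOT


def popup_label(mark: str, base: str = SAMECH) -> str:
    """Hypothetical label for a popup key that commits `mark` alone.

    With the samech carrier, shin/sin dots are shown on shin instead,
    mirroring the behaviour described above. Pass base="" for a bare
    mark, or DOTTED_CIRCLE for the circle style.
    """
    if base == SAMECH and mark in (SHIN_DOT, SIN_DOT):
        base = SHIN
    return base + mark


patah = "\u05B7"  # HEBREW POINT PATAH
for base in (SAMECH, DOTTED_CIRCLE, ""):
    print(repr(popup_label(patah, base)))
```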

You can try out my debug apk here https://1drv.ms/u/s!AlWQ8RKzV_SL6FC0QKAxTN8xH6z4

הג פסח סמח ("Happy Passover")

moriel5 commented 6 months ago

So far, it is looking good.

However, I personally think that GBoard's way of previewing them is easier to understand, as it shows the diacritics without any letters.

If that is more work than is worth, I think that using the "blank circle" Unicode character that FlorisBoard is using (instead of the Samech, rather than together) would also be fine.

And thank you very much. Please don't take it the wrong way; I'm just a perfectionist, so I tend to correct everyone, including myself. However, the word "Chag" (or, to be more accurate, "Khag", as the differences between Alef, He (also mistakenly called "Hei"), and Chet/Khet depend more on the way the air is pushed through the throat) is spelled "חַגּ". I'm not going to get too deep into which diacritics need to be used, such as whether Patach/Patakh or Kamatz (with its "small" and "large" variants) is appropriate, as I am still learning that myself; many mistakes exist in Eliezer Ben Yehuda's system, which was well intentioned, but many of the sources it relied upon were inaccurate.

russianspy1234 commented 6 months ago

As you can tell, I don't type Hebrew often. I can't go much further than what I did by copying the file over. It's hard to predict how nikkudim will behave with non-Hebrew Unicode characters, and it can actually vary by device, which is probably why Floris used the samech.

moriel5 commented 6 months ago

To be fair, I don't take people's level into account when correcting, as I only try to correct small mistakes that are (usually) supposedly well known but not often thought about, since, as I mentioned, I am still learning myself (beyond the Eliezer Ben Yehuda system that is usually taught). So I did not differentiate between you and any other person typing in Hebrew.

Hmm... Regarding behaviour with non-Hebrew Unicode characters, that is a fair point, which reminds me of the Android 2.0 days, when I had to replace the fonts in /system on our old PocketBook IQ 701s, just to get proper support for viewing Hebrew characters.

However, I think that with an option to enable/disable additional characters (like what is mentioned in issue #303), this should be a non-issue, as at least since Android 5, if not earlier, Unicode support has been far better; good enough that HeliBoard should not negatively affect the behaviour.

Also, I believe the Unicode character I mentioned is simply a fallback character, which many systems use automatically when a diacritic has no base character (e.g. a letter); that is why it appears when the system does not properly support the relevant diacritics, in this case Nikkud. Even FlorisBoard shows, at least for me, the same Unicode character in addition to the Samech, so I am guessing it should be fine on other devices even without a dedicated letter (but possibly with the fallback Unicode character).
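The "no base character" situation described here can be detected by checking whether a label begins with a combining mark, which is the case where many text renderers substitute a dotted circle. Here is a minimal sketch. `lacks_base` is a hypothetical helper, and the code cannot tell whether a particular Android version actually draws the circle.

```python
import unicodedata


def lacks_base(label: str) -> bool:
    """True if `label` starts with a combining mark (category M*),
    i.e. the mark has no base character to attach to."""
    return bool(label) and unicodedata.category(label[0]).startswith("M")


for label in ("\u05B7", "\u05D0\u05B7", "\u25CC\u05B7"):
    print(repr(label), lacks_base(label))
```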

Can I take a look at the file? I'm no developer (I only know a little HTML), however I can generally "see" the structure/pattern, so I may be able to help out on that front, and I can also help with testing over several different UIs (I use AOSP-based ROMs, however I have family and friends who use a wide variety, including TouchWiz, OneUI, LG's old UIs, MIUI, ColorOS and HydrogenOS, and I have contacts that I believe can help me get in contact with those that use more exotic UIs, such as Fujitsū's devices that are provided in Japan by NTT Docomo).

codokie commented 6 months ago

I tried to see whether Android Lollipop, which is pretty old, can correctly render Hebrew diacritics that are not applied to letters, and it seems like it can. I don't think there is a justification for showcasing the letter samech like FlorisBoard does. It also does not prevent applying vowels to non-Hebrew letters.
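If a keyboard did want to restrict points to Hebrew letters, a check could look back past any marks already typed to find the base character. This is a hypothetical policy that the thread does not propose adopting. `may_apply_point` is illustrative only. The Yiddish ligatures U+05F0 to U+05F2 are included on the assumption that Yiddish support would want them.

```python
import unicodedata


def is_hebrew_letter(ch: str) -> bool:
    # Alef (U+05D0) to tav (U+05EA), including final forms,
    # plus the Yiddish ligatures U+05F0 to U+05F2.
    return "\u05D0" <= ch <= "\u05EA" or "\u05F0" <= ch <= "\u05F2"


def may_apply_point(text_before_cursor: str) -> bool:
    """Hypothetical: allow a point only if the nearest base character
    before the cursor (skipping marks already applied) is a Hebrew letter."""
    i = len(text_before_cursor) - 1
    while i >= 0 and unicodedata.category(text_before_cursor[i]).startswith("M"):
        i -= 1
    return i >= 0 and is_hebrew_letter(text_before_cursor[i])


print(may_apply_point("\u05D1\u05BC"))  # bet with dagesh
print(may_apply_point("a"))
```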

moriel5 commented 6 months ago

@codokie That would make sense, as Hebrew diacritics were added to the Unicode standard in their own right quite some years ago.

I do find it interesting that Hebrew cantillation was also added to the Unicode standard, however most people will not need those (though people like me will need them), as they are only needed for religious purposes (including academic purposes within religious texts), so if those are also added (it would be the same concept as diacritics from a typing perspective), they should definitely be hidden behind an option to enable additional symbols.
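Keeping cantillation behind an option is straightforward because the cantillation (and related masoretic) marks occupy their own block of code points, U+0591 to U+05AF, separate from the vowel points that start at U+05B0. Here is a minimal sketch. The `show_cantillation` flag is hypothetical and stands in for whatever setting the option from issue #303 would provide.

```python
def is_cantillation(mark: str) -> bool:
    # Hebrew accents (cantillation) and the masora circle: U+0591 to U+05AF.
    return "\u0591" <= mark <= "\u05AF"


def visible_popups(marks: list[str], show_cantillation: bool) -> list[str]:
    """Hypothetical filter: hide cantillation unless the option is enabled."""
    return [m for m in marks if show_cantillation or not is_cantillation(m)]


ETNAHTA = "\u0591"  # HEBREW ACCENT ETNAHTA
PATAH = "\u05B7"    # HEBREW POINT PATAH
print(len(visible_popups([ETNAHTA, PATAH], show_cantillation=False)))
```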