Julow / Unexpected-Keyboard

A lightweight virtual keyboard for developers.
GNU General Public License v3.0

korean keyboard does not combine the typed characters #558

Open WeepingClown13 opened 4 months ago

WeepingClown13 commented 4 months ago

Hi, I only just found out (!) that Unexpected Keyboard, contrary to what I had thought all this time, also has keyboards for other languages. This is very good news, but typing in Korean does not seem to work the way it should. For example, "ㅇㅛㅈㅡㅁ" is output exactly like that, whereas to the best of my knowledge it should appear as "요즘". This makes me wonder whether I am missing some prerequisite or whether the logic for combining the characters is not yet present. Which is the case here, or is this an entirely different matter? (I searched the {open,closed} issues for the words {korea,korean} but could not find anything related.)

Julow commented 4 months ago

This layout was added in https://github.com/Julow/Unexpected-Keyboard/pull/115 and I don't think it has many users, as indicated by the state of the Korean translation, which is currently the most outdated.

Currently, the keyboard doesn't rewrite any character that is already typed. The only combination system implemented so far uses dead keys (first activate the dead key, then type the character to be modified), but that wouldn't work well for Korean.

I would be happy to have this in the app but I need help from new contributors.

WeepingClown13 commented 4 months ago

@Julow I would love it if we could get this to work, and I am willing to contribute as well. I am a complete beginner with all of this though, so not only will it be slow if I work on it, I'll need a fair amount of guidance as well. If all that works, great. Finding more people who'd be interested in improving this would also be nice.

Julow commented 4 months ago

This feature has no equivalent at the moment so it might require some experimentation. Keys are handled here: https://github.com/Julow/Unexpected-Keyboard/blob/master/srcs/juloo.keyboard2/KeyEventHandler.java#L83, character keys are handled here: https://github.com/Julow/Unexpected-Keyboard/blob/master/srcs/juloo.keyboard2/KeyEventHandler.java#L190

Perhaps the best approach would be to use setComposingText instead of commitText. This class might implement Unicode composition (I think we'd want NFC), though we need a way to differentiate characters that can still be composed (and should be written with setComposingText) from characters that cannot and should be committed (with commitText).

WeepingClown13 commented 4 months ago

Thanks for being so detailed. Never played with Java before so this definitely helps. If I get some time, I'll try to make some sense out of the things and experiment on it. Meanwhile if someone could pick it up and fix it, that'd be incredible.

Julow commented 3 months ago

I implemented my suggestion from above in this PR: https://github.com/Julow/Unexpected-Keyboard/pull/594 Could you give it a try?

cbjx commented 3 months ago

Hi, I just tried that PR and it still has some issues. (I'm not good at English, so please ask me if you don't understand something I said; a translator was used.)

Before explaining the issues, I will first explain how Korean is typed. A Hangul letter has three parts: a starting sound, a vowel (middle sound), and an ending sound. The starting and ending sounds are consonants. Generally, the starting and ending sounds share the same set of consonants, but there are also consonants that are used only as ending sounds. To input Hangul, you must input the starting sound, the vowel, and the ending sound in that order (in some cases, the ending sound is not necessary). The three parts are then synthesized into one letter by software.

Now let's take a look at the Korean keyboard layout, Dubeolsik. Dubeolsik is divided into two parts: consonants on the left and vowels on the right. Not all consonants and vowels are in the layout; the ones that are missing can be entered using the shift key, by combining two different consonants, or by combining two vowels. (A consonant obtained by combining different consonants is used only as an ending sound and is called a double consonant. A vowel obtained by combining two vowels is called a diphthong.)

Now for the problems. Before this update, not a single letter was synthesized; now the starting sound and the vowel are synthesized, but the ending sound is not. Therefore, if you enter the word "요즘", you get "요즈ㅁ". (Here, ㅁ is the ending sound needed to make 즘.) In my opinion, consonants are not differentiated into starting and ending sounds when they are entered; all consonants seem to be recognized as starting sounds.

As mentioned above, a diphthong is created by combining a vowel with a vowel, but currently a diphthong can only be created when no starting sound has been input. For example, to enter the letter "위", you enter ㅇ, ㅜ, ㅣ in that order. ㅇ is the starting sound, and the two vowels ㅜ and ㅣ must be combined to form the vowel ㅟ. If you actually type this, "우ㅣ" is entered. I'm not sure how the current system recognizes vowels, because if I type just ㅜ and ㅣ without any starting sound, "ㅟ" comes out normally.

Also, characters that should be available through the shift key cannot be entered that way. The base letters are ㅂ, ㅈ, ㄷ, ㄱ, ㅅ (consonants) and ㅐ, ㅔ (vowels); when entered with the shift key, they should produce ㅃ, ㅉ, ㄸ, ㄲ, ㅆ, ㅒ, and ㅖ respectively. In addition, for consonants, there is currently a feature where entering the same consonant twice in a row has the same effect as entering it with the shift key. An option to toggle this feature seems to be needed as well.

Julow commented 3 months ago

Thanks for the explanations and feedback!

I hoped that Unicode NFKC normalisation would solve this, but it has limitations. In Unicode, the consonants are defined several times for different purposes ("letter", choseong, jongseong and other variations), but the layout only contains the "letter" variant.
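To illustrate the limitation, here is a minimal sketch using the JDK's `java.text.Normalizer`. The class and method names are hypothetical and this is not the code from the PR; it only shows how the standard normalisation forms treat the "letter" jamo that the layout sends, compared with explicit choseong and jongseong code points.

```java
import java.text.Normalizer;

// Illustrative sketch only; the class and method names are hypothetical.
final class JamoNormalizationDemo {
    /** NFC: canonical decomposition followed by canonical composition. */
    static String nfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    /** NFKC: compatibility decomposition followed by canonical composition. */
    static String nfkc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        // What the layout sends: "letter" (compatibility) jamo ㅇㅛㅈㅡㅁ.
        String typed = "\u3147\u315B\u3148\u3161\u3141";

        // NFC has no canonical mapping for these code points: nothing combines.
        System.out.println(nfc(typed).equals(typed)); // true

        // NFKC maps every consonant letter to its choseong (initial) form, so
        // ㅇ+ㅛ and ㅈ+ㅡ compose but the trailing ㅁ (now U+1106) stays separate.
        System.out.println(nfkc(typed).equals("\uC694\uC988\u1106")); // true

        // With an explicit jongseong ㅁ (U+11B7), NFC alone yields 요즘.
        String conjoining = "\u110B\u116D\u110C\u1173\u11B7";
        System.out.println(nfc(conjoining).equals("\uC694\uC998")); // true
    }
}
```

The second case matches the "요즈ㅁ" result reported above: the normaliser cannot know that the last consonant was meant as an ending sound.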

Based on your explanations, I'm now experimenting with another idea that uses the modifier mechanism in a way that is similar to the recent compose key. I also found out that there exists a formula for composing Hangul syllables, which will make this easier to implement without relying on the complex normalisation functions.
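For reference, the composition formula can be sketched as below. This is an illustration with hypothetical names, not the code from the PR; it assumes the arithmetic given in the Unicode Standard, where a precomposed syllable is computed from the index of its initial consonant (0 to 18), vowel (0 to 20) and final consonant (0 to 27, with 0 meaning no final).

```java
// Illustrative sketch; names are hypothetical, the arithmetic follows the
// Hangul syllable composition rule of the Unicode Standard.
final class HangulFormula {
    static final int S_BASE = 0xAC00; // code point of 가, the first syllable
    static final int L_COUNT = 19;    // initial consonants (choseong)
    static final int V_COUNT = 21;    // vowels (jungseong)
    static final int T_COUNT = 28;    // finals (jongseong), index 0 = none

    /** Builds one precomposed syllable from three jamo indices. */
    static char compose(int initial, int vowel, int fin) {
        if (initial < 0 || initial >= L_COUNT || vowel < 0 || vowel >= V_COUNT
                || fin < 0 || fin >= T_COUNT) {
            throw new IllegalArgumentException("jamo index out of range");
        }
        return (char) (S_BASE + (initial * V_COUNT + vowel) * T_COUNT + fin);
    }

    public static void main(String[] args) {
        System.out.println(compose(11, 12, 0));  // ㅇ + ㅛ, no final
        System.out.println(compose(12, 18, 16)); // ㅈ + ㅡ + ㅁ
    }
}
```

Because the result depends only on three small integers, the keyboard needs nothing more than a lookup from each key to its index in the initial, vowel or final table.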

Julow commented 3 months ago

Thanks to your help, I've made a new experiment in https://github.com/Julow/Unexpected-Keyboard/pull/595 Can you try it? Debug build here: https://github.com/Julow/Unexpected-Keyboard/actions/runs/8335193980?pr=595

cbjx commented 3 months ago

Hi, I tried that PR. It can type most Korean characters, but some characters still cannot be typed, and the modifier mechanism may not be suitable for Korean typing.

First, the shifted vowels (ㅒ, ㅖ) cannot be combined. I think this happens because typing a starting sound activates a layer that shows the vowels combined with that starting sound; if you then press the shift key to type a shifted vowel, it doesn't show the correct letter.

Second, double consonants and diphthongs cannot be typed. Both require typing a consonant after a consonant or a vowel after a vowel, but in the current system, once you have typed a vowel, the keyboard accepts only ending sounds. If you try to type a diphthong anyway, the letter being composed disappears and only the last typed letter, the vowel that should have been combined, is entered. (Example: trying to type "왜" by typing "ㅇㅗㅐ" results in only "ㅐ".) Double consonants are only available as ending sounds, but as soon as you type a single ending sound, composition ends immediately and the letter is sent, so the consonants cannot be combined. (Example: trying to type "흙" by typing "ㅎㅡㄹㄱ" results in "흘", with ㄱ waiting for a vowel to be composed with.)

Lastly, after trying this PR, I think the modifier system might not be suitable for Korean typing. The main reason is that starting and ending sounds are strictly distinguished. As a result, users have to end composition with the space bar whenever they type a letter without an ending sound. (This wouldn't be a problem for a Sebeolsik layout, which strictly distinguishes starting and ending sounds, but the Dubeolsik layout doesn't do that.)
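The two-key combinations described above can be written down as a small lookup table. The sketch below is only an illustration with hypothetical names, assuming the standard Dubeolsik pairs; it lists the diphthongs and double final consonants that are built from two key presses and is not taken from the PR.

```java
// Illustrative sketch; class, field and method names are hypothetical.
final class JamoCompounds {
    // Each entry is: first jamo + second jamo + resulting compound jamo.
    static final String[] COMPOUNDS = {
        // Diphthongs (vowel + vowel)
        "ㅗㅏㅘ", "ㅗㅐㅙ", "ㅗㅣㅚ", "ㅜㅓㅝ", "ㅜㅔㅞ", "ㅜㅣㅟ", "ㅡㅣㅢ",
        // Double final consonants (consonant + consonant)
        "ㄱㅅㄳ", "ㄴㅈㄵ", "ㄴㅎㄶ", "ㄹㄱㄺ", "ㄹㅁㄻ", "ㄹㅂㄼ",
        "ㄹㅅㄽ", "ㄹㅌㄾ", "ㄹㅍㄿ", "ㄹㅎㅀ", "ㅂㅅㅄ",
    };

    /** Returns the compound jamo for the pair, or 0 if they do not combine. */
    static char combine(char first, char second) {
        for (String entry : COMPOUNDS) {
            if (entry.charAt(0) == first && entry.charAt(1) == second) {
                return entry.charAt(2);
            }
        }
        return 0;
    }
}
```

With such a table, "ㅇㅗㅐ" could keep composing instead of dropping the syllable: when a second vowel arrives, the composer would look up `combine('ㅗ', 'ㅐ')` and replace the pending vowel with ㅙ.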

While typing Korean, an ending sound can turn into a starting sound, depending on what the next letter is. I'll give two examples.

The first is "안녕". The sequence of keys typed is "ㅇ, ㅏ, ㄴ, ㄴ, ㅕ, ㅇ". After the first three keys, they are combined into the letter "안". Typing the last three keys produces the letter "녕". The final output is "안녕", as expected.

Now, let's type "요즘". The sequence of keys pressed is "ㅇ, ㅛ, ㅈ, ㅡ, ㅁ". With a proper Korean IME, when you type "ㅇ, ㅛ, ㅈ", ㅈ is an ending sound because it was typed after a vowel, which results in the letter "욪". But even though an ending sound has been composed, composition doesn't end, because the next key can be a vowel. If it is, ㅈ becomes the starting sound that combines with that vowel. In this case, the next key is the vowel ㅡ, so ㅈ becomes a starting sound. So typing "ㅇ, ㅛ, ㅈ" results in "욪", but typing "ㅇ, ㅛ, ㅈ, ㅡ" results in "요즈". Once ㅈ has become a starting sound, the composition of the first letter (요) finally ends (a starting sound cannot turn back into an ending sound). Typing the last key (ㅁ) then gives the final output "요즘". With the current PR, typing 요즘 outputs "욪ㅡ", and the last character ㅁ waits for the next input, since it cannot be composed with a vowel alone.

Also, decomposing a letter with backspace doesn't work. That is, when you have typed 욪 and composition has not ended, pressing backspace should not remove the whole letter; it should remove the ending sound first, then the vowel, and lastly the starting sound, literally decomposing the letter.
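The behaviour described above can be sketched as a small state machine. This is a simplified illustration with hypothetical names, not the app's implementation: it handles single jamo only (no two-key diphthongs or double consonants) and keeps the text in a buffer, where a real keyboard would presumably pass `composing()` to setComposingText and the flushed syllables to commitText.

```java
// Illustrative sketch; class and method names are hypothetical.
final class DubeolsikComposer {
    static final String INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ";
    static final String VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ";
    // Index 0 stands for "no final"; the leading space is only a placeholder.
    static final String FINALS = " ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ";

    private final StringBuilder committed = new StringBuilder();
    private char ini, vow, fin; // 0 marks an empty slot

    /** Feeds one key press (a "letter" jamo or any other character). */
    void type(char c) {
        if (VOWELS.indexOf(c) >= 0) {
            typeVowel(c);
        } else if (INITIALS.indexOf(c) >= 0) {
            typeConsonant(c);
        } else {
            flush();
            committed.append(c);
        }
    }

    private void typeVowel(char c) {
        if (ini != 0 && vow == 0) {
            vow = c; // initial + vowel
        } else if (fin != 0) {
            // The pending final becomes the initial of the next syllable.
            char next = fin;
            fin = 0;
            flush();
            ini = next;
            vow = c;
        } else {
            flush(); // a vowel with nothing to attach to is committed as is
            committed.append(c);
        }
    }

    private void typeConsonant(char c) {
        if (ini != 0 && vow != 0 && fin == 0 && FINALS.indexOf(c) > 0) {
            fin = c; // tentatively a final; may still move to the next syllable
        } else {
            flush();
            ini = c;
        }
    }

    /** Removes the most recent jamo first, then falls back to committed text. */
    void backspace() {
        if (fin != 0) fin = 0;
        else if (vow != 0) vow = 0;
        else if (ini != 0) ini = 0;
        else if (committed.length() > 0) committed.setLength(committed.length() - 1);
    }

    private void flush() {
        committed.append(composing());
        ini = vow = fin = 0;
    }

    /** Text still being composed (what setComposingText would receive). */
    String composing() {
        if (ini == 0) return "";
        if (vow == 0) return String.valueOf(ini);
        int l = INITIALS.indexOf(ini), v = VOWELS.indexOf(vow);
        int t = fin == 0 ? 0 : FINALS.indexOf(fin);
        return String.valueOf((char) (0xAC00 + (l * 21 + v) * 28 + t));
    }

    /** Everything typed so far: committed text plus the composing syllable. */
    String text() {
        return committed + composing();
    }

    static String typeAll(String keys) {
        DubeolsikComposer c = new DubeolsikComposer();
        for (char k : keys.toCharArray()) c.type(k);
        return c.text();
    }
}
```

In this sketch, `typeAll("ㅇㅛㅈ")` gives "욪" and `typeAll("ㅇㅛㅈㅡㅁ")` gives "요즘", because the final ㅈ is only committed as part of the first syllable once the next key shows that it is not an initial. Keeping the three slots in memory also makes the backspace behaviour straightforward, since nothing has to be recovered from text that was already committed.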

(And, Thank you for making this great software and trying to implementing this feature, and sorry for asking you to implement this really difficult feature.)

WeepingClown13 commented 3 months ago

@Julow thanks a lot for the work on this. And thanks a bunch to @cbjx as well for the huge help in the process. I want to help, but my lack of knowledge in Java aside (which can be compensated for by putting in some effort), I have recently injured my arm and sadly can't help out. I feel sorry that I can't, but it is great to see the effort being put in and I wanted you to know that it is much appreciated.

Julow commented 3 months ago

Thanks for testing.

First, Shifted vowels(ㅒ, ㅖ) cannot be combined.

Fixed now.

Double consonants are only available on ending sound, but if you type only one ending sound, composing immediately ends and send letter, so consonants cannot be combined

It seems that the shift layer will not be able to fit all the double consonants. The typing method you mention is a little hard to implement and I have not come up with a satisfactory implementation yet.

What do you think of adding all the double consonants and diphthongs to the layouts, in the corners of the keys? This way, you can type 흙 and 흘 in 3 key strokes each (as you have both ㄹ and ㄺ available as one key stroke).

During typing korean, Ending sound can turn into starting sounds, depending on what next letter is.

It seems that we either need decomposing (decompose the previous character to recover the final consonant) or to remember the input sequence and do something like my first attempt (https://github.com/Julow/Unexpected-Keyboard/pull/594).

Also, Decomposing letter with backspace doesn't work.

Do you have an idea of how to implement decomposing, based on Unicode?
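One possible direction, sketched here with hypothetical names and not taken from the app, is to invert the syllable composition arithmetic: subtracting the base code point U+AC00 and dividing by the vowel and final counts (21 and 28) recovers the three jamo indices of a precomposed syllable.

```java
// Illustrative sketch; class and method names are hypothetical.
final class HangulDecompose {
    static final int S_BASE = 0xAC00;         // code point of the first syllable
    static final int V_COUNT = 21;            // number of vowels
    static final int T_COUNT = 28;            // number of finals, 0 = none
    static final int S_COUNT = 19 * 21 * 28;  // number of precomposed syllables

    private static int offset(char syllable) {
        int s = syllable - S_BASE;
        if (s < 0 || s >= S_COUNT) {
            throw new IllegalArgumentException("not a precomposed Hangul syllable");
        }
        return s;
    }

    /** Returns {initial, vowel, final} indices; a final index of 0 means none. */
    static int[] decompose(char syllable) {
        int s = offset(syllable);
        return new int[] { s / (V_COUNT * T_COUNT), (s / T_COUNT) % V_COUNT, s % T_COUNT };
    }

    /** Drops the final consonant, as a first backspace press might do. */
    static char withoutFinal(char syllable) {
        return (char) (syllable - offset(syllable) % T_COUNT);
    }
}
```

For example, `decompose('욪')` yields the indices of ㅇ, ㅛ and ㅈ, and `withoutFinal('욪')` yields 요. Going further back to a bare starting sound needs a table from initial index to "letter" jamo, since a lone consonant is not a precomposed syllable.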

Julow commented 1 month ago

I've merged https://github.com/Julow/Unexpected-Keyboard/pull/595 as it improves the situation but some work is still needed. For example, there's no ㄺ on the keyboard currently.

Julow commented 1 month ago

abf36e5 adds the missing characters that are not accessible with the current composition algorithm.