bragefuglseth / keypunch

Practice your typing skills
GNU General Public License v3.0
85 stars 13 forks source link

[Language Request]: Korean #8

Closed developomp closed 2 weeks ago

developomp commented 1 month ago

English Name

Korean

Native Name

한국어

Syntax

there are 24 letters in the Korean language. 14 consonants: ㄱㄴㄷㄹㅁㅂㅅㅇㅈㅊㅋㅌㅍㅎ 10 vowels: ㅏㅑㅓㅕㅗㅛㅜㅠㅡㅣ

some consonants can combine to form double consonants. Here's an exhaustive list of all double consonants. Base Double

some vowels can combine to form composite vowels. e.g. ㅓ + ㅣ = ㅔ ㅕ + ㅣ = ㅖ ㅏ + ㅣ = ㅐ ㅑ + ㅣ = ㅒ

At least one consonant and one vowel combine to form a syllable. image source: https://abcdefridays.blogspot.com/2016/04/korean-alphabet-and-typography.html

e.g. consonant ㄴ (n) and vowel ㅏ (a) combine to become syllable 나 (na) consonant ㅂ (b) and vowel ㅣ (i) combine to become syllable 비 (bi) syllable 나 (na) and 비 (bi) combine to form word 나비 (nabi, butterfly)

the position of the letters are automatically determined by the computer depending on the input order. e.g. ㄱ + ㅏ = 가 ㄱ + ㅏ + ㅁ = 감 ㅁ + ㅏ + ㄱ = 막 ㄱ + ㅜ + ㅔ = 궤 ㄷ + ㅏ + ㄹ + ㄱ = 닭

image

The process of typing 나비 is as follows:

  1. ㄴ (type ㄴ)
  2. 나 (type ㅏ)
  3. 납 (type ㅂ)
  4. 나비 (type ㅣ)

https://github.com/bragefuglseth/keypunch/assets/40858122/a16ddc81-67ce-4e38-8f56-425a1fa3871f

Since the syllable 나 is not saved as ㄴ + ㅏ in utf-8, and it is its own separate character, we need special logic to prevent keypunch from marking a syllable as invalid as we're typing them out.

I suggest disassembling each character (a korean syllable) to its letter components then comparing the resulting array of letters. Here is a typescript implementation of the said logic.

Implementation Assistance

Additional Information

Technically, the name of the Korean language is Hangul (한글). However, it is more common for it to be called 한국어 (which literally translates to “Korean Language”) in a computer/internet context, so that's what I'm going with.

bragefuglseth commented 1 month ago

Hi, and thanks for all the information! I think I'll be able to spin up Korean text generation pretty easily now, but first I'd like to make sure that Hangul input works nicely. When implementing Arabic support for the text widgetry, I added a way to indicate when a character has been partially typed (it's marked with an underline), so I just have to flesh that out a little more so it plays along with the underlying Hangul input system as well. When I have done that, I'll add Korean to the text generation infrastructure and notify you.

bragefuglseth commented 3 weeks ago

Hi, the Korean implementation is ready for testing! If you build the app from the main branch, it should be available in the Text Language dialog. You didn't say anything specific about Korean punctuation conventions, so for now it just uses the same rules as English. Let me know if anything should be changed.

developomp commented 3 weeks ago

For some reason which I've yet to investigate, the application crashes as soon as I type anything in Korean (using ibus-hangul). This is regardless of the keypunch language settings. It's 1 AM here so I'll go to bed for now.

Logs:

Application started at 12:59:16 AM
thread 'main' panicked at src/widgets/text_view/input.rs:105:25:
already borrowed: BorrowMutError
stack backtrace:
   0: rust_begin_unwind
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/std/src/panicking.rs:652:5
   1: core::panicking::panic_fmt
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/panicking.rs:72:14
   2: core::cell::panic_already_borrowed
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/cell.rs:770:5
   3: core::cell::RefCell<T>::borrow_mut
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/cell.rs:1061:25
   4: keypunch::widgets::text_view::input::<impl keypunch::widgets::text_view::imp::KpTextView>::push_typed_text
             at ./Dev/projects/keypunch/src/widgets/text_view/input.rs:105:9
   5: keypunch::widgets::text_view::input::<impl keypunch::widgets::text_view::imp::KpTextView>::setup_input_handling::{{closure}}
             at ./Dev/projects/keypunch/src/widgets/text_view/input.rs:19:17
   6: gtk4::auto::im_context::IMContextExt::connect_commit::commit_trampoline
             at ./.var/app/org.gnome.Builder/cache/gnome-builder/projects/keypunch/builds/dev.bragefuglseth.Keypunch.Devel.json-flatpak-org.gnome.Platform-46-x86_64-main/cargo-home/registry/src/index.crates.io-6f17d22bba15001f/gtk4-0.8.2/src/auto/im_context.rs:270:13
   7: g_cclosure_marshal_VOID__STRINGv
   8: <unknown>
   9: <unknown>
  10: g_signal_emit_by_name
  11: g_cclosure_marshal_VOID__STRINGv
  12: <unknown>
  13: <unknown>
  14: g_signal_emit_valist
  15: g_signal_emit
  16: <unknown>
  17: <unknown>
  18: gtk4::auto::im_context::IMContextExt::reset
             at ./.var/app/org.gnome.Builder/cache/gnome-builder/projects/keypunch/builds/dev.bragefuglseth.Keypunch.Devel.json-flatpak-org.gnome.Platform-46-x86_64-main/cargo-home/registry/src/index.crates.io-6f17d22bba15001f/gtk4-0.8.2/src/auto/im_context.rs:177:13
  19: keypunch::widgets::text_view::caret::<impl keypunch::widgets::text_view::imp::KpTextView>::update_caret_position
             at ./Dev/projects/keypunch/src/widgets/text_view/caret.rs:137:17
  20: keypunch::widgets::text_view::imp::KpTextView::typed_text_changed
             at ./Dev/projects/keypunch/src/widgets/text_view.rs:160:13
  21: keypunch::widgets::text_view::input::<impl keypunch::widgets::text_view::imp::KpTextView>::setup_input_handling::{{closure}}
             at ./Dev/projects/keypunch/src/widgets/text_view/input.rs:36:17
  22: gtk4::auto::im_context::IMContextExt::connect_preedit_changed::preedit_changed_trampoline
             at ./.var/app/org.gnome.Builder/cache/gnome-builder/projects/keypunch/builds/dev.bragefuglseth.Keypunch.Devel.json-flatpak-org.gnome.Platform-46-x86_64-main/cargo-home/registry/src/index.crates.io-6f17d22bba15001f/gtk4-0.8.2/src/auto/im_context.rs:330:13
  23: <unknown>
  24: <unknown>
  25: g_signal_emit_by_name
  26: <unknown>
  27: <unknown>
  28: g_signal_emit_valist
  29: g_signal_emit
  30: g_closure_invoke
  31: <unknown>
  32: <unknown>
  33: g_signal_emit_valist
  34: g_signal_emit
  35: <unknown>
  36: g_closure_invoke
  37: <unknown>
  38: <unknown>
  39: g_signal_emit_valist
  40: g_signal_emit
  41: <unknown>
  42: <unknown>
  43: <unknown>
  44: <unknown>
  45: g_main_context_iteration
  46: <unknown>
  47: <unknown>
  48: <unknown>
  49: <unknown>
  50: <unknown>
  51: <unknown>
  52: <unknown>
  53: <unknown>
  54: <unknown>
  55: g_closure_invoke
  56: <unknown>
  57: <unknown>
  58: g_signal_emit_valist
  59: g_signal_emit
  60: <unknown>
  61: <unknown>
  62: <unknown>
  63: <unknown>
  64: g_main_context_iteration
  65: g_application_run
  66: gio::application::ApplicationExtManual::run_with_args
             at ./.var/app/org.gnome.Builder/cache/gnome-builder/projects/keypunch/builds/dev.bragefuglseth.Keypunch.Devel.json-flatpak-org.gnome.Platform-46-x86_64-main/cargo-home/registry/src/index.crates.io-6f17d22bba15001f/gio-0.19.5/src/application.rs:29:13
  67: gio::application::ApplicationExtManual::run
             at ./.var/app/org.gnome.Builder/cache/gnome-builder/projects/keypunch/builds/dev.bragefuglseth.Keypunch.Devel.json-flatpak-org.gnome.Platform-46-x86_64-main/cargo-home/registry/src/index.crates.io-6f17d22bba15001f/gio-0.19.5/src/application.rs:22:9
  68: keypunch::main
             at ./Dev/projects/keypunch/src/main.rs:58:5
  69: core::ops::function::FnOnce::call_once
             at /rustc/129f3b9964af4d4a709d1383930ade12dfe7c081/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Application exited
bragefuglseth commented 3 weeks ago

That's weird. Could you provide a sample text so I can try to reproduce it? Also, are you sure this happens with the latest development version?

bragefuglseth commented 3 weeks ago

From analyzing the stack trace you provided, I think I've managed to identify and fix the issue. It was an unusually tough one! Could you try again now?

Seems like we're not using the exact same input method setup, by the way. Have you configured your hangul-ibus in any way, or are you using the default preferences? Due to the way things in the app are set up now, you may or may not encounter problems with inserting the correct text even if it doesn't crash…

developomp commented 3 weeks ago

Tested 1c7b49727d5b985e8ce779aa8d2a50573990a719 with the following ibus-hangul settings.

image

Keypunch does not crash anymore, but pressing a key marks the syllable as incorrect and moves on the the next one.

e.g. when typing out 나비, pressing ㄴ marks 나 as incorrectly typed. Pressing ㅏ then marks 비 incorrect and so on.

bragefuglseth commented 3 weeks ago

That was actually what I expected to happen, and I think I know why now. Keypunch apparently misuses the IMContext.reset() function a little, so I'll have to reduce its usage to where it's appropriate :)

bragefuglseth commented 3 weeks ago

Alright, try now. If it was enough to limit the use of IMContext.reset(), it should work. I suspect we may need to to invoke IMContext.set_cursor_location() less too.

developomp commented 3 weeks ago

Tested commit defec44b157a74a57962205927694e7839255f71

It appears that some of the keystrokes are not picked up by keypunch at all. In the video I'm typing "새 없다 대부분 힘들다 적 여기 비하다" but that's not at all what the app sees.

untitled.webm

bragefuglseth commented 3 weeks ago

It appears that some of the keystrokes are not picked up by keypunch at all.

It looks like that, but Keypunch should actually be well aware of what's going on. While you're "working on" a letter, ibus-hangul sends the current state of the letter to a special buffer called the "preedit string", and Keypunch listens to this as well to indicate that the letter is partially typed. This just isn't included in self.typed_text, even though typed_text_changed() is run every time the preedit string is changed as well.

I tried to reconstruct the first part of your example video myself, and weirdly enough it appears to work perfectly for me:

Skjermopptak fra 2024-06-23 18-49-28.webm

bragefuglseth commented 3 weeks ago

From studying the recording, it seems like the "swallowed" keystrokes are ones right after the completion of the previous letter. Does this only happen in Keypunch?

bragefuglseth commented 3 weeks ago

If you replace your debug println statement with this, you'll get a "log" of the state at each keystroke including the preedit string:

println!("{}/ {}", self.typed_text.borrow(), self.input_context.borrow().as_ref().unwrap().preedit_string().0);

Here's my log for the string 새 없다 대부분:

/ 
/ ㅅ
/ 새
새/ 새
새/ 
새 / 
새 / ㅇ
새 / 어
새 / 업
새 / 없
새 없/ 없
새 없/ ㄷ
새 없/ 다
새 없다/ 다
새 없다/ 
새 없다 / 
새 없다 / ㄷ
새 없다 / 대
새 없다 / 댑
새 없다 대/ 댑
새 없다 대/ 부
새 없다 대/ 붑
새 없다 대부/ 붑
새 없다 대부/ 부
새 없다 대부/ 분
새 없다 대부분/ 분

If you do the same, we can compare them.

bragefuglseth commented 3 weeks ago

That's the final potential solution I can come up with right off the top of my head

developomp commented 3 weeks ago

Commit: bdca1a7a572990d9e8ad083efe34267920b6c3f0

From studying the recording, it seems like the "swallowed" keystrokes are ones right after the completion of the previous letter. Does this only happen in Keypunch?

Yes.

Here's a video with better view of my ibus-hangul config.

untitled.webm

bragefuglseth commented 3 weeks ago

Seems like we're using the exact same input method configuration. It's weird that it behaves differently. Maybe there is an open issue for it in ibus-hangul's issue tracker? I assume this is Keypunch's fault, though.

bragefuglseth commented 3 weeks ago

The problem might actually be that the reset function is called on the IM at all. Reading through ibus-hangul's source code and issues, it looks like its current approach is pretty fragile and involves some data-races/timing issues. Let's see if this works anyways.

bragefuglseth commented 3 weeks ago

I've dropped all kinds of IM context resetting on the main branch now, and cross my fingers hoping that it works!

bragefuglseth commented 3 weeks ago

Yeeeah, pretty sure I've found the culprit. TL;DR: ibus-hangul sometimes commits text while keaving the beginning of the next letter in the preedit buffer (which is reasonable), but then Keypunch called reset() on ibus-hangul, essentially telling it to remove the preedit text. This is super specific and probably only applies to Korean, but I'm glad we could have the misuse resolved.

Skjermbilde fra 2024-06-24 19-40-36

Where it should be reset is when the session ends, so there aren't any leftovers in the preedit buffer before next round starts. I've fixed that too.

Assuming that this works, all that’s left now is fixing WPM calculation for Korean (the currently reported values are probably 50-60% too low). Shouldn’t be too hard, as I just need to pull in hangeul-rs and decompose any Hangul characters before counting the typed letters.

developomp commented 3 weeks ago

I'm away from home without my laptop. I'll test the code this friday as soon as I get back home.

Thank you for your effort despite the delay and seemingly endless roadbumps. Seriously, I have no idea how you manage to code for a language you've never spoken before.

Do you have paypal or a page at buymeacoffee.com? I'd like to buy you a coffee.

developomp commented 2 weeks ago

The code's working properly now :+1:

The only thing I can nitpick is that I can't see my mistakes, but that's a whole new topic. This issue can be closed now. :tada:

bragefuglseth commented 2 weeks ago

The code's working properly now 👍

Great to hear! If you’d like to, you can provide a name (and optionally a website), and I’ll credit you in the Orthography section of the about window.

Seriously, I have no idea how you manage to code for a language you've never spoken before.

Most of the heavy lifting is done by the ibus engines of the relevant languages, since they interpret keypresses and decide when letters should be committed to the actual buffer and what should be kept in the preedit string. Keypunch just needs to communicate properly with the engine to get/set the correct information, but since I’m new to how ibus works internally, there have been a few hurdles on the way there 😂

Do you have paypal or a page at buymeacoffee.com? I'd like to buy you a coffee.

Really appreciate the offer! I’m in a privileged economical situation right now, though, and it would feel wrong to accept money through donations/tips. Save it for somebody else whose work you appreciate 😇

This was the last roadblock I wanted to get past before doing a new release, I’ll prepare one in the coming days.

developomp commented 2 weeks ago

If you’d like to, you can provide a name (and optionally a website), and I’ll credit you in the Orthography section of the about window.

Name: Kim, Jimin (aka developomp) Website: https://developomp.com