PSeitz / wana_kana_rust

Utility library for checking and converting between Japanese characters - Hiragana, Katakana - and Romaji
MIT License

panicked to_romaji "ウーッー" #13

Closed kounoike closed 1 month ago

kounoike commented 1 year ago

Thanks for providing this useful library! I'm using it with Meilisearch.

❯ cargo install wana_kana
    Updating crates.io index
  Installing wana_kana v3.0.0
   Compiling either v1.8.1
   Compiling lazy_static v1.4.0
   Compiling fnv v1.0.7
   Compiling itertools v0.10.5
   Compiling wana_kana v3.0.0
    Finished release [optimized] target(s) in 2.97s
  Installing /home/kounoike/.cargo/bin/to_kana
  Installing /home/kounoike/.cargo/bin/to_romaji
   Installed package `wana_kana v3.0.0` (executables `to_kana`, `to_romaji`)

~ took 3s
❯ to_romaji "ウーッー"
thread 'main' panicked at 'could not find kana 'っ' in TO_ROMAJI map', /home/kounoike/.cargo/registry/src/github.com-1ecc6299db9ec823/wana_kana-3.0.0/src/utils/katakana_to_hiragana.rs:68:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

My requests are:

  1. Support "ウーッー".
  2. Don't panic on unexpected Japanese sequences.

For the first, maybe convert it to "uu" or "uuu".

The second is more important. I found this problem while indexing with Meilisearch. When it occurs, the indexing thread panics, and the document and index updates are lost.
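As a stopgap on the caller side, a panic like this can be contained with `std::panic::catch_unwind` so that one bad input does not take the whole thread down. The following is only a minimal sketch: `convert_may_panic` is a hypothetical stand-in for any converter that can panic (it is not the wana_kana API), and `convert_checked` is a name made up for illustration. It also assumes the binary is built with the default unwinding panic strategy, not `panic = "abort"`.

```rust
use std::panic;

// Hypothetical stand-in for a converter that may panic on some inputs.
// This is NOT the wana_kana API; it only imitates the failure mode.
fn convert_may_panic(input: &str) -> String {
    if input.contains('ッ') {
        panic!("could not find kana in map");
    }
    input.to_uppercase()
}

// Runs the converter and turns a panic into `None` instead of
// unwinding through the caller (for example, an indexing thread).
// The default panic hook still prints the panic message to stderr.
fn convert_checked(input: &str) -> Option<String> {
    let owned = input.to_string();
    panic::catch_unwind(move || convert_may_panic(&owned)).ok()
}

fn main() {
    for text in ["abc", "ウーッー"] {
        match convert_checked(text) {
            Some(out) => println!("{} -> {}", text, out),
            None => println!("{} -> conversion failed, input skipped", text),
        }
    }
}
```

This only hides the symptom. The caller still has to decide what to do with inputs that failed, so a fix inside the library would be much better.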

Japanese notation has many variations and new ones can be created. For example, くぁwせdrftgyふじこlp is valid (slang) Japanese notation.
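To illustrate what I mean by not panicking, here is a rough sketch of a lookup that passes unknown characters through unchanged instead of failing. The table and the function name `to_romaji_lossy` are made up for this example and are not the library's actual `TO_ROMAJI` map or API; the real conversion logic is more involved.

```rust
use std::collections::HashMap;

// Toy table for illustration only; the library's real map is much larger.
fn toy_table() -> HashMap<char, &'static str> {
    HashMap::from([('う', "u"), ('あ', "a"), ('か', "ka")])
}

// Converts character by character. A character that is missing from the
// table is copied to the output as-is instead of causing a panic.
fn to_romaji_lossy(input: &str) -> String {
    let table = toy_table();
    let mut out = String::new();
    for ch in input.chars() {
        match table.get(&ch) {
            Some(romaji) => out.push_str(romaji),
            None => out.push(ch),
        }
    }
    out
}

fn main() {
    // 'っ' is not in the toy table, so it is kept unchanged.
    println!("{}", to_romaji_lossy("うっか"));
}
```

With this kind of fallback, an unexpected sequence gives imperfect output, but the caller keeps running.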