Open Heliozoa opened 10 months ago
Are you suggesting there should be a separate method for this?
This crate is mostly for hiragana and katakana; `is_kanji` is just for convenience.
That's understandable; covering just the CJK Unified Ideographs block is enough for most purposes, I imagine.
For added context, I was already using the crate for its other functionality, and started using `is_kanji` to pick out kanji from the words in the JMdict dictionary. Some of those words contain kanji from the extension blocks, so they were unexpectedly (to me) getting filtered out by `is_kanji`.
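To illustrate the filtering described above, here is a minimal sketch. Both `is_kanji_basic` and `pick_kanji` are hypothetical helpers written for this example, not the crate's actual code; the basic check is assumed to cover only the main CJK Unified Ideographs block.

```rust
/// Hypothetical stand-in for a check limited to the main
/// CJK Unified Ideographs block (not the crate's implementation).
fn is_kanji_basic(c: char) -> bool {
    ('\u{4E00}'..='\u{9FFF}').contains(&c)
}

/// Collects the characters of `word` that `pred` accepts.
fn pick_kanji(word: &str, pred: impl Fn(char) -> bool) -> Vec<char> {
    word.chars().filter(|&c| pred(c)).collect()
}
```

With these helpers, `pick_kanji("食べる", is_kanji_basic)` keeps only `食`, while a word written with a character outside the Basic Multilingual Plane, such as U+29E3D, comes back empty.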
You can close the issue if this is out of scope, or leave it up if this is something that may have a place in the crate in the future. Thanks for the quick response!
Currently, `is_kanji` uses the Unicode range U+4E00-U+9FAF to recognise kanji, which covers most of the CJK Unified Ideographs block (U+4E00-U+9FFF). Unicode has additional "extension blocks" that contain more uncommon kanji, such as CJK Unified Ideographs Extension E, which contains the kanji 𬵪. Since these characters are quite obscure, and it may be difficult to determine which of them qualify as "kanji", I think it would be useful to have this functionality in a crate.
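As a rough sketch of what such a check might look like, the function below accepts the main block plus Extensions A through F. The name `is_kanji_extended` is hypothetical and not part of the crate. The ranges are whole Unicode blocks, so they include unassigned code points and ideographs that are not used in Japanese; whether those count as "kanji" is exactly the open question raised above.

```rust
/// Hypothetical extended check: main CJK block plus Extensions A-F.
/// Matches on block boundaries, so it is deliberately permissive.
fn is_kanji_extended(c: char) -> bool {
    matches!(
        c as u32,
        0x4E00..=0x9FFF        // CJK Unified Ideographs
            | 0x3400..=0x4DBF  // Extension A
            | 0x20000..=0x2A6DF // Extension B
            | 0x2A700..=0x2B73F // Extension C
            | 0x2B740..=0x2B81F // Extension D
            | 0x2B820..=0x2CEAF // Extension E
            | 0x2CEB0..=0x2EBEF // Extension F
    )
}
```

Keeping this as a separate function, rather than widening `is_kanji` itself, would leave the existing behaviour unchanged for current users.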