Cangjians / ibus-cangjie

An IBus engine for users of the Cangjie and Quick input methods

[Feature request] Add vocabulary prediction #95

Open max-hk opened 5 years ago

max-hk commented 5 years ago

It would be better if ibus-cangjie could predict the next word, or the next few words, while users are typing.

There are many free Chinese vocabulary lists on the Web, licensed under CC-BY-SA or BSD. You can find them at the link below. https://chromium.googlesource.com/chromium/deps/icu46/+/e49b610806e6ba6063384ffd7f45d5b7cd561e65/source/data/brkitr/README.chromium

You can also use the pre-built list from the Chromium team, which combines all the lists in the above link and is licensed under an MIT-like license. https://chromium.googlesource.com/chromium/deps/icu46/+/e49b610806e6ba6063384ffd7f45d5b7cd561e65/source/data/brkitr/cjdict.txt ...or an updated version of the combined list by Unicode https://github.com/unicode-org/icu/blob/master/icu4c/source/data/brkitr/dictionaries/cjdict.txt

The Android Pinyin IME repo also contains a vocabulary list (Simplified Chinese only) https://android.googlesource.com/platform/packages/inputmethods/PinyinIME/+/refs/heads/master/jni/data/rawdict_utf16_65105_freq.txt
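
To make the request a bit more concrete, here is a minimal sketch of prefix-based prediction over a weighted vocabulary list. It is only an illustration, not how ibus-cangjie works or would necessarily implement this. The `word<TAB>weight` line format, the "higher weight means more common" convention, the function names, and the sample data are all assumptions made for this example; the linked lists may use a different layout or weighting.

```python
from collections import defaultdict


def load_vocabulary(lines):
    """Parse 'word<TAB>weight' lines into a {word: weight} dict.

    The format is an assumption for this sketch; blank lines and
    lines starting with '#' are skipped.
    """
    vocabulary = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        word, _, weight = line.partition("\t")
        vocabulary[word] = int(weight) if weight else 0
    return vocabulary


def build_prefix_index(vocabulary):
    """Map every proper prefix of each word to the words that extend it."""
    index = defaultdict(list)
    for word in vocabulary:
        for end in range(1, len(word)):
            index[word[:end]].append(word)
    return index


def predict(index, vocabulary, typed, limit=5):
    """Return up to `limit` continuations of `typed`, heaviest first."""
    candidates = sorted(index.get(typed, []), key=lambda w: (-vocabulary[w], w))
    return [word[len(typed):] for word in candidates[:limit]]


# Hypothetical sample data, not taken from any of the linked lists.
SAMPLE = ["# word\tweight", "香港\t90", "香蕉\t40", "香港人\t60", "中文\t80"]
vocabulary = load_vocabulary(SAMPLE)
index = build_prefix_index(vocabulary)
print(predict(index, vocabulary, "香"))  # ['港', '港人', '蕉']
```

After the user commits 香, the engine could offer 港, 港人 and 蕉 as candidates for what comes next. A real implementation would probably also want to learn from what the user actually picks.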

bochecha commented 5 years ago

Thanks for the issue.

However, this is already tracked as #4, so this would have been better as a comment there.

Since your comment provides more data, though, I'm going to close the other one and keep this one. :wink:

max-hk commented 5 years ago

@bochecha Thanks

mbridon commented 1 year ago

Hi @max-hk, sorry for never giving any news. This is a very interesting feature we've always wanted!

However, due to unforeseen health issues I haven't been able to give this any thought for about 3 years... :sob:

I'm trying to get back to this slowly though :smile:

What would definitely help me, however, would be the data from Chromium in source form, so I can make use of it.

The license they use seems to be CC-BY-SA, is that correct? If it is, I think (but I'm not a lawyer and this is not legal advice) it should be compatible with use in ibus-cangjie, but probably only if we get the data in source form rather than binary form (so we can make some modifications and share them back with Chromium, of course, as CC-BY-SA allows and requires).

So rest assured you helped a lot with finding this and we totally want to make good use of it :grin: