chewing / chewing-editor

Cross platform chewing user phrase editor
https://chewing.im/
GNU General Public License v2.0

Add new phrase cut into multiple phrases #207

Open qas612820704 opened 7 years ago

qas612820704 commented 7 years ago

Like #98.

Adding a new phrase splits it into more than one phrase, and some of the resulting entries contain bopomofo.

For example, when I add this

Phrase 歐你媽個頭
Bopomofo ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ

It is split into multiple phrases, as shown in the right part of the figure below.

screenshot from 2017-04-09 15-16-37

In addition, the new phrases contain bopomofo.

Is this the correct behavior, or has something gone wrong?

jserv commented 7 years ago

@qas612820704, use chewing-editor -d to dump and analyze the log. Always attach the messages as text.

qas612820704 commented 7 years ago

@jserv

Debug: Add "歐你媽個頭" ( "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ" ) ((null) :0)
Debug: [chewingio.c:1996 chewing_userphrase_add] API call:  ((null) :0)
Warning: chewing_userphrase_add() returns 0 ((null) :0)
Debug: [chewingio.c:1859 chewing_userphrase_enumerate] API call:  ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: 歐 ㄡ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: 你媽 ㄋㄧˇ ㄇㄚ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: 個頭 ㄍㄜ˙ ㄊㄡˊ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄡ ㄡ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄋ一ˇ ㄋ ㄧ ˇ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄚ ㄚ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄇㄚ ㄇ ㄚ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄜ ㄜ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄍㄜ˙ ㄍ ㄜ ˙ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: [chewingio.c:1936 chewing_userphrase_get] API call:  ((null) :0)
Debug: Get userphrase: ㄊㄡˊ ㄊ ㄡ ˊ ((null) :0)
Debug: [chewingio.c:1887 chewing_userphrase_has_next] API call:  ((null) :0)
Debug: Total userphrase 10 ((null) :0)
Debug: 10 ((null) :0)

It looks like the same issue as #206.

I will trace the source code.

samwhelp commented 7 years ago

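This error most likely occurs because you typed the Chinese character "一" in place of the bopomofo symbol "ㄧ".

Please double-check: the "一" in "ㄋ一ˇ" above is the Chinese character.

Bopomofo: ㄧ U+3127 http://www.fileformat.info/info/unicode/char/3127/index.htm

Chinese character: 一 U+4e00 http://www.fileformat.info/info/unicode/char/4e00/index.htm

Provided for reference.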

:-)
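
The mix-up described above is invisible in many fonts, so it can help to check code points rather than glyphs. Below is a minimal sketch, assuming well-formed UTF-8 input. The names decodeUtf8 and firstNonBopomofo are hypothetical and are not part of chewing-editor or libchewing, and the accepted set (space, the Bopomofo block U+3100 to U+312F, and four tone marks) is an assumption made for illustration.

```cpp
#include <string>
#include <vector>

// Decode a UTF-8 string into Unicode code points.
// Assumption: the input is well-formed; malformed bytes are not diagnosed.
std::vector<char32_t> decodeUtf8(const std::string &s)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        // Number of continuation bytes implied by the lead byte.
        int extra = c < 0x80 ? 0 : c < 0xE0 ? 1 : c < 0xF0 ? 2 : 3;
        char32_t cp = extra == 0 ? c : (c & (0x3F >> extra));
        ++i;
        for (int k = 0; k < extra && i < s.size(); ++k, ++i)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
        out.push_back(cp);
    }
    return out;
}

// Illustrative whitelist (an assumption, not libchewing's actual rule):
// space, the Bopomofo block, and the tone marks ˊ ˇ ˋ ˙.
bool isBopomofoInput(char32_t cp)
{
    return cp == 0x20 || (cp >= 0x3100 && cp <= 0x312F) ||
           cp == 0x02CA || cp == 0x02C7 || cp == 0x02CB || cp == 0x02D9;
}

// Return the first code point that is not plausible bopomofo input,
// or 0 if every code point passes the whitelist.
char32_t firstNonBopomofo(const std::string &bopomofo)
{
    for (char32_t cp : decodeUtf8(bopomofo))
        if (!isBopomofoInput(cp))
            return cp;
    return 0;
}
```

With this sketch, a syllable containing the Chinese character 一 reports U+4E00, while the same syllable typed with the bopomofo ㄧ (U+3127) reports nothing.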

david50407 commented 7 years ago

@samwhelp The issue you mentioned should be solved after PR #169 which replaced all into (also replaced all into )

david50407 commented 7 years ago

Oops, @samwhelp, you're right: U+4E00 (一) was mistyped in the bopomofo, and #169 didn't catch it.

I re-checked #169: it targets the wrong character, replacing U+3127 with U+3127 (yes, the same character).

And this issue should be related to #108.

samwhelp commented 7 years ago

To add some detail, here is my test environment.

Running

$ dpkg -l '*chewing*'

shows

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version              Architecture         Description
+++-===============================-====================-====================-====================================================================
ii  chewing-editor                  0.0.1-3              amd64                user dictionary editor for the chewing input method
ii  fcitx-chewing                   0.2.2-1              amd64                Fcitx wrapper for Chewing library
ii  hime-chewing:amd64              0.9.10+git20150916+d amd64                support library to use Chewing in HIME
un  libchewing                      <none>               <none>               (no description available)
un  libchewing-data                 <none>               <none>               (no description available)
un  libchewing-dev                  <none>               <none>               (no description available)
un  libchewing1-dev                 <none>               <none>               (no description available)
un  libchewing2-dev                 <none>               <none>               (no description available)
ii  libchewing3:amd64               0.4.0-4              amd64                intelligent phonetic input method library
ii  libchewing3-data                0.4.0-4              all                  intelligent phonetic input method library - data files
ii  libchewing3-dev                 0.4.0-4              amd64                intelligent phonetic input method library (developer version)
un  scim-chewing                    <none>               <none>               (no description available)

I tested with "chewing-editor -d".

Input

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

gives the following result

Debug: Add "歐你媽個頭" ( "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ" ) ((null) :0)
Debug: [chewingio.c:1998 chewing_userphrase_add] API call:  ((null) :0)
Warning: chewing_userphrase_add() returns 0 ((null) :0)

Input

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

gives the following result

Debug: Add "歐你媽個頭" ( "ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ" ) ((null) :0)
Debug: [chewingio.c:1998 chewing_userphrase_add] API call:  ((null) :0)
Debug: [userphrase-sql.c:179 LogUserPhrase] userphrase 歐你媽個頭, phone = 0x0040 0x0e83 0x0608 0x1219 0x0c42 , orig_freq = 1, max_freq = 1, user_freq = 1, recent_time = 58958 ((null) :0)
Debug: "歐你媽個頭 (ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ)" ((null) :0)

Then I downloaded the chewing-editor source package to take a look:

$ apt-get source chewing-editor

Running

$ grep 'checkBopomofo' chewing-editor-0.0.1/* -R

prints nothing.

Running

$ grep 'UserphraseModel::add' chewing-editor-0.0.1/* -R -A 18

shows

chewing-editor-0.0.1/src/model/UserphraseModel.cpp:void UserphraseModel::add(std::shared_ptr<QString> phrase, std::shared_ptr<QString> bopomofo)
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    add(*phrase.get(), *bopomofo.get());
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-}
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-void UserphraseModel::importUserphrase(std::shared_ptr<UserphraseImporter> importer)
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    size_t old_count = userphrase_.size();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    if (!importer.get()->isSupportedFormat()) {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit importCompleted(false, importer.get()->getPath(), 0, old_count);
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        return;
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    }
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    auto result = importer.get()->getUserphraseSet();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    for (auto& i: result) {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        add(i.phrase_, i.bopomofo_);
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    }
--
chewing-editor-0.0.1/src/model/UserphraseModel.cpp:void UserphraseModel::add(const QString &phrase, const QString &bopomofo)
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    auto ret = chewing_userphrase_add(
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        ctx_.get(),
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        phrase.toUtf8().constData(),
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        bopomofo.toUtf8().constData());
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    if (ret > 0) {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit beginResetModel();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        userphrase_.insert(Userphrase{
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-            phrase,
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-            bopomofo
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        });
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit endResetModel();
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        emit addNewPhraseCompleted(userphrase_[userphrase_.size()-1]);
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    } else {
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-        qWarning() << "chewing_userphrase_add() returns" << ret;
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-    }
chewing-editor-0.0.1/src/model/UserphraseModel.cpp-}

So the chewing-editor version I am using, 0.0.1-3, appears to predate the fix.

I also tested libchewing3 directly, and the result is the same.

Input

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

Calling chewing_userphrase_add returns 0.

Input

phrase = "歐你媽個頭"
bopomofo = "ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ"

Calling chewing_userphrase_add returns 1.

I also tested #206, and it appears to be the same situation.

Chinese character 一 (U+4E00)

phrase = "鞭數十"
bopomofo = "ㄅ一ㄢ ㄕㄨˋ ㄕˊ"

Bopomofo ㄧ (U+3127)

phrase = "鞭數十"
bopomofo = "ㄅㄧㄢ ㄕㄨˋ ㄕˊ"

End of report.

:-)

david50407 commented 7 years ago

After #210, this issue should be solved now. @qas612820704, can you try again?

And thanks for the help, @samwhelp. The auto-conversion was published after 0.1.1.

BTW, we still need a good solution to #108.

qas612820704 commented 7 years ago

Hi @david50407, @samwhelp is right. I typed 一 (U+4E00) where ㄧ (U+3127) was needed.

Changing

phrase = 歐你媽個頭
bopomofo = ㄡ ㄋ一ˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ

into

phrase = 歐你媽個頭
bopomofo = ㄡ ㄋㄧˇ ㄇㄚ ㄍㄜ˙ ㄊㄡˊ

works fine. Thx.

qas612820704 commented 7 years ago

@david50407, and that's right: in #169, both "ㄧ" in the replace call are U+3127 at

// src/model/UserphraseModel.cpp:197
QString UserphraseModel::checkBopomofo(const QString &bopomofo) const
{
    ...
    replaceBopomofo.replace(QString::fromUtf8("ㄧ"),QString::fromUtf8("ㄧ"));
    ...
}

which needs to change to

// src/model/UserphraseModel.cpp:197
QString UserphraseModel::checkBopomofo(const QString &bopomofo) const
{
    ...
    replaceBopomofo.replace(QString::fromUtf8("一"),QString::fromUtf8("ㄧ"));
    ...
}

That is, change the first "ㄧ" (U+3127) into "一" (U+4E00).

Should I make a pull request to fix it?
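
For readers without a Qt build at hand, the corrected replacement can be sketched with std::string alone. This is only an illustration of the idea, not the code in chewing-editor: replaceAll and normalizeBopomofo are hypothetical names, and the byte escapes spell out the two code points so the look-alike glyphs cannot be confused in the source.

```cpp
#include <string>

// Replace every occurrence of `from` in `text` with `to`.
// Byte-wise matching is safe for UTF-8 here: a complete encoded
// character cannot match starting in the middle of another one.
std::string replaceAll(std::string text, const std::string &from, const std::string &to)
{
    if (from.empty())
        return text; // nothing to search for
    for (std::size_t pos = 0;
         (pos = text.find(from, pos)) != std::string::npos;
         pos += to.size())
        text.replace(pos, from.size(), to);
    return text;
}

// Hypothetical stand-in for the corrected replace call: map the CJK
// ideograph U+4E00 to the Bopomofo letter U+3127.
std::string normalizeBopomofo(const std::string &bopomofo)
{
    const std::string ideographOne = "\xE4\xB8\x80"; // U+4E00 (Chinese character)
    const std::string bopomofoI = "\xE3\x84\xA7";    // U+3127 (bopomofo symbol)
    return replaceAll(bopomofo, ideographOne, bopomofoI);
}
```

Input that already uses U+3127 passes through unchanged, so applying the normalization more than once is harmless.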

jserv commented 7 years ago

@qas612820704, your preliminary work amounts to fuzzy-match logic, which is worth sending as a pull request. Can you improve it by accepting more characters such as ?

qas612820704 commented 7 years ago

@jserv, are there other characters like ? I only know of Y in English, which was already fixed in #169. I don't know of any other bopomofo-like characters.

david50407 commented 7 years ago

@qas612820704 @jserv, I already fixed that in #210 (merged yesterday), and I don't think the English character Y is that easy to mistake here.

and look same as and in the IME input box under some fonts, so I think handling just these two cases is fine.

jserv commented 7 years ago

I defer to @david50407 on not taking the letter Y into consideration.