Closed GoogleCodeExporter closed 9 years ago
The version of librime in the report is 1.1. (Sorry for forgetting it.)
Original comment by culu....@gmail.com
on 14 May 2014 at 4:42
Before step 2, the current directory should be changed:
% cd build/bin
Original comment by culu....@gmail.com
on 14 May 2014 at 5:39
I do not agree that 给 is a useful word, since Cangjie is generally conceived
as an IME for traditional Chinese. On the other hand, 組合 is not a partial
match but a fully matching phrase in the form AABBB, just as you wouldn't say
that 十十人一弓 is a partial match for 輸, the complete code being
十田十人一月中弓.
You should have noticed that not only the word frequencies but also the phrases
are in traditional Chinese. You have to create a different dictionary for
simplified Chinese, which is not provided by the package.
Original comment by chen....@gmail.com
on 14 May 2014 at 7:28
> I do not agree that 给 is a useful word, since Cangjie is generally
> conceived as an IME for traditional Chinese. On the other hand, 組合
> is not a partial match, but a full matching phrase in the form of
> AABBB, as you wouldn't say: 十十人一弓 for 輸 is a partial match,
> the complete code being 十田十人一月中弓.
Thank you for this explanation. However, I don't find it quite
convincing. Although it started out as a traditional Chinese IME,
Cangjie5 incorporated simplified characters later, and can be used
as a simplified Chinese IME. [1][1.zh] (Your point that Cangjie5
is more likely to be associated with traditional Chinese is reasonable.)
> You should have noticed that not only word frequency, but
> also the phrases are in traditional Chinese. You have to
> create a different dictionary for simplified Chinese, which
> is not provided by the package.
Now I understand that this is not a bug in librime, but it
is one in brise, or preset/cangjie5.dict.yaml, to be specific.
preset/cangjie5.dict.yaml:
19 ---
20 name: "cangjie5"
21 version: "0.18"
22 sort: by_weight
23 use_preset_vocabulary: true
24 max_phrase_length: 7
25 min_phrase_weight: 100
26 columns:
27   - text
28   - code
29   - stem
30 encoder:
31   exclude_patterns:
32     - '^x.*$'
33     - '^z.*$'
34   rules:
35     - length_equal: 2
36       formula: "AaAzBaBbBz"
37     - length_equal: 3
38       formula: "AaAzBaBzCz"
39     - length_in_range: [4, 10]
40       formula: "AaBzCaYzZz"
I've tried but failed to find any evidence that supports
the AABBB phrase rule. I've searched through the
tutorial on chinesecj.com[2], the cited Cangjie5 manual[3],
and an updated version of this manual[4][4.web], cited by
Wikipedia[1.zh]. In addition, a friend from Hong Kong told
me that she would normally type the word "組合" as two separate
characters, "vfbm" and then "omr". Also, typing
"vmomr" on her computer gives "给".
Therefore, I suspect this is a bug in brise:preset/cangjie5.dict.yaml,
in the encoder/rules part, lines 34 to 40.
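As a rough illustration of how the two-character rule would produce the contested code, here is that rule annotated with the codes mentioned above (vfbm for 組, omr for 合). The reading of the formula letters is an assumption, not something taken from the manuals: an uppercase letter picks a character of the phrase (A first, B second) and a lowercase letter picks a letter of that character's code (a first, b second, z last).

```yaml
# Sketch only: the rule is copied from the listing above, the comments are added.
encoder:
  rules:
    - length_equal: 2
      formula: "AaAzBaBbBz"
      # 組 = vfbm, 合 = omr
      # Aa -> v   (組, first letter)
      # Az -> m   (組, last letter)
      # Ba -> o   (合, first letter)
      # Bb -> m   (合, second letter)
      # Bz -> r   (合, last letter)
      # result: vmomr, the same keys that gave 给 on the friend's computer
```

Under that reading, the phrase code for 組合 and the standard Cangjie5 code for 给 collide on "vmomr".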
[1] http://en.wikipedia.org/wiki/Cangjie_input_method
[1.zh]
http://zh.wikipedia.org/wiki/%E5%80%89%E9%A0%A1%E8%BC%B8%E5%85%A5%E6%B3%95
[2] http://chinesecj.com/newlearncj/
[3] http://www.cbflabs.com/down/show.php?id=28
[4] http://www.cbflabs.com/down/show.php?id=299
[4.web] http://www.cbflabs.com/book/ocj5/ocj5/index.html
Original comment by culu....@gmail.com
on 14 May 2014 at 5:32
It's a non-standard feature, not a bug.
See also:
http://www.chinesecj.com/forum/forum.php?mod=viewthread&tid=634
http://tieba.baidu.com/p/1028390846
You can disable phrases by removing the 'encoder' part.
Rime is designed to be highly configurable.
The preset schemata will definitely not satisfy everyone, but they
illustrate most features the framework provides well. Feel free to create your own
schema.
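For example, a dictionary header with the 'encoder' part removed might look like the sketch below. It is the header quoted earlier in this thread minus the encoder section. The name cangjie5_nophrase is hypothetical, and it is assumed here that the name matches the file name (cangjie5_nophrase.dict.yaml).

```yaml
# cangjie5_nophrase.dict.yaml (hypothetical file name)
---
name: "cangjie5_nophrase"   # hypothetical; assumed to match the file name
version: "0.18"
sort: by_weight
use_preset_vocabulary: true
max_phrase_length: 7
min_phrase_weight: 100
columns:
  - text
  - code
  - stem
# no 'encoder' section here, so the phrase rules are gone
# the character table from cangjie5.dict.yaml would follow unchanged
```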
Original comment by chen....@gmail.com
on 15 May 2014 at 2:24
candidates
illustrate most features the framework provides. Feel free to create your
own schema.
Thank you. I created a modified version of cangjie5.dict.yaml in my user
directory, under a different filename, and added a translator/dictionary
entry in cangjie5.custom.yaml. It works. What I don't understand is why simply
setting translator/enable_encoder to false doesn't work. (Perhaps it's a
different feature under a similar name.)
Also, I don't see the reason for using a non-standard extension as default,
although it demonstrates the strength of the "encoder" feature quite well.
Is it that high configurability renders default values no longer important?
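The setup described in this comment might look roughly like the following. The dictionary name cangjie5_nophrase is hypothetical (the actual file name was not given), and the patch: layout is the usual form of a Rime .custom.yaml file.

```yaml
# cangjie5.custom.yaml, in the Rime user directory
patch:
  # point the schema's translator at the modified copy of the dictionary;
  # cangjie5_nophrase is a hypothetical name standing in for the real one
  translator/dictionary: cangjie5_nophrase
```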
Original comment by culu....@gmail.com
on 16 May 2014 at 2:59
(Sorry, the first three lines weren't quoted properly through email.)
Original comment by culu....@gmail.com
on 16 May 2014 at 3:02
[deleted comment]
'translator/enable_encoder: true' enables the table_translator to make new
phrases dynamically based on user input. The pre-installed phrases, by contrast,
are introduced into the dictionary by setting 'use_preset_vocabulary: true' in
cangjie5.dict.yaml.
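To make the distinction concrete, the two settings live in different files. The fragment below is a sketch: the schema-side key is the one named in this comment, and the dictionary-side key is written as it appears in the cangjie5.dict.yaml header quoted earlier in this thread (use_preset_vocabulary).

```yaml
# cangjie5.schema.yaml (or a patch in cangjie5.custom.yaml)
translator:
  enable_encoder: false        # only stops new phrases being built from user input
---
# cangjie5.dict.yaml header
use_preset_vocabulary: true    # brings in the pre-installed phrases
encoder:                       # assumed: rules used to assign codes to those phrases
  rules:
    - length_equal: 2
      formula: "AaAzBaBbBz"
```

This would explain why turning off enable_encoder alone leaves the preset phrases in place.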
In the context of traditional Chinese, phrases rank lower than frequently used
characters, so they hardly break anything, and those who use phrases benefit
from less typing and a gain in speed.
Original comment by chen....@gmail.com
on 16 May 2014 at 8:14
Original issue reported on code.google.com by
culu....@gmail.com
on 14 May 2014 at 4:39