batterseapower / pinyin-toolkit

A plugin for the Anki Spaced Repetition System (http://ichi2.net/anki/)
http://batterseapower.github.com/pinyin-toolkit/

Add Better Traditional Conversion #126

Open Nick3C opened 15 years ago

Nick3C commented 15 years ago

Currently handled by gTrans, but a dictionary-based conversion needs to be added for increased reliability.

Better to wait for MySQL.

Nick3C commented 15 years ago

OK, I have found our first traditional/simplified bug.

亁 亁 [qian2] /variant of 乾/surname Qian/strong/one of the Eight Trigrams 八卦 representing sky/male principle/
乾 干 [gan1] /dry/clean/surname Gan/
乾 乾 [qian2] /surname Qian/strong/one of the Eight Trigrams 八卦 representing sky/male principle/
幹 干 [gan4] /tree trunk/main part of sth/to manage/to work/to do/capable/cadre (in communist party)/to kill (slang)/to fuck (taboo word)/
干 干 [gan1] /to concern/to interfere/shield/stem/
干 干 [gan4] /to work/to do/to manage/

This is slightly interesting because the relationship is many-to-one (乾, 幹 and 干 all simplify to 干), and 乾 can also be both a traditional and a simplified character, with different meanings. The real problem is that 乾 will always be overwritten, even though it is a valid simplified character.
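
To make the many-to-one point concrete, here is a rough sketch that parses entries in the format above and lists which traditional characters have more than one simplified form. The function names and the shortened sample glosses are made up for illustration; none of this is code from the toolkit.

```python
import re
from collections import defaultdict

# Entry format assumed from the lines above:
# TRADITIONAL SIMPLIFIED [pinyin] /gloss/gloss/
ENTRY_RE = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/$")

# Shortened copies of the entries quoted above (glosses trimmed for brevity)
SAMPLE = """\
亁 亁 [qian2] /variant of 乾/surname Qian/
乾 干 [gan1] /dry/clean/surname Gan/
乾 乾 [qian2] /surname Qian/strong/
幹 干 [gan4] /tree trunk/to manage/
干 干 [gan1] /to concern/to interfere/shield/stem/
干 干 [gan4] /to work/to do/to manage/
"""

def parse_entries(text):
    """Return (traditional, simplified, pinyin, glosses) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY_RE.match(line.strip())
        if match:
            trad, simp, pinyin, glosses = match.groups()
            entries.append((trad, simp, pinyin, glosses.split("/")))
    return entries

def build_maps(entries):
    """Build traditional->simplified and simplified->traditional sets."""
    trad_to_simp = defaultdict(set)
    simp_to_trad = defaultdict(set)
    for trad, simp, _pinyin, _glosses in entries:
        trad_to_simp[trad].add(simp)
        simp_to_trad[simp].add(trad)
    return trad_to_simp, simp_to_trad

def ambiguous_traditional(trad_to_simp):
    """Traditional characters whose simplified form depends on the reading."""
    return {t: s for t, s in trad_to_simp.items() if len(s) > 1}

entries = parse_entries(SAMPLE)
trad_to_simp, simp_to_trad = build_maps(entries)
ambiguous = ambiguous_traditional(trad_to_simp)
```

On this sample, `ambiguous` contains only 乾 (which maps to both 干 and 乾), while `simp_to_trad["干"]` collects 乾, 幹 and 干. A blind per-character replacement cannot tell those cases apart.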

I'm not too sure what the best way to deal with this is. Possibly an exclusion list, but that might cause its own problems and we would have to maintain it.
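
A minimal sketch of what the exclusion-list idea could look like, assuming a plain per-character lookup table. `TRAD_TO_SIMP`, `KEEP_AS_IS` and `to_simplified` are hypothetical names, and the tiny table stands in for one that would really be built from a dictionary.

```python
# Hypothetical per-character table; a real one would come from a dictionary.
TRAD_TO_SIMP = {"乾": "干", "幹": "干", "佔": "占"}

# Hypothetical exclusion list: characters that are also valid simplified
# forms and so should not be overwritten blindly.
KEEP_AS_IS = frozenset({"乾"})

def to_simplified(text, table=TRAD_TO_SIMP, exclusions=KEEP_AS_IS):
    """Convert character by character, leaving excluded characters alone."""
    return "".join(
        ch if ch in exclusions else table.get(ch, ch)
        for ch in text
    )

kept = to_simplified("乾坤")      # 乾 (qian2) is left untouched
converted = to_simplified("幹部")  # 幹 is not excluded, so it becomes 干
no_list = to_simplified("乾燥", exclusions=frozenset())  # old behaviour
```

This also shows the kind of problem the list might cause: with 乾 excluded, 乾燥 (gan1, "dry") stays as 乾燥 instead of becoming 干燥, so a per-character exclusion trades one wrong conversion for another.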

Nick3C commented 15 years ago

Another problem. Using 占 is fine, but searching for 佔 causes problems.

佔 is simplified to 占, which is fine, but 佔 is then stripped from the meaning if the input was traditional.

Nick3C commented 15 years ago

That 'gan' error may be the only one, you know: http://www.geocities.com/tokyo/pagoda/3847/hanzi/t-s-prob.htm

That would be extremely good :)

Nick3C commented 15 years ago

Or rather, only those listed there.