Open Nick3C opened 15 years ago
OK, I have found our first traditional/simplified bug.
亁 亁 [qian2] /variant of 乾/surname Qian/strong/one of the Eight Trigrams 八卦 representing sky/male principle/
乾 干 [gan1] /dry/clean/surname Gan/
乾 乾 [qian2] /surname Qian/strong/one of the Eight Trigrams 八卦 representing sky/male principle/
幹 干 [gan4] /tree trunk/main part of sth/to manage/to work/to do/capable/cadre (in communist party)/to kill (slang)/to fuck (taboo word)/
干 干 [gan1] /to concern/to interfere/shield/stem/
干 干 [gan4] /to work/to do/to manage/
This is slightly interesting because there is a many-to-one relationship (乾 and 幹 both simplify to 干), but 乾 can also be both traditional and simplified, with different meanings. The real problem is that 乾 will always be overwritten even though it is a valid simplified character.
I'm not too sure what the best way to deal with this is. Possibly an exclusion list, but that might cause its own problems and we would have to maintain it.
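A rough sketch of what the exclusion-list idea could look like, in Python. Everything here is hypothetical: the names T2S, KEEP_AS_IS and search_forms are made up for illustration, and the tables only hold the characters mentioned in this thread. The idea is that a character on the list is searched in its original form as well as in its converted form, instead of being overwritten.

```python
# Hypothetical traditional -> simplified table (only characters from this thread).
T2S = {"乾": "干", "幹": "干", "佔": "占"}

# Hypothetical exclusion list: characters that are also valid simplified forms
# and so must not be silently overwritten by the conversion.
KEEP_AS_IS = {"乾"}


def search_forms(query):
    """Return the forms of `query` to look up, most literal first."""
    converted = "".join(T2S.get(ch, ch) for ch in query)
    forms = [converted]
    # If the query contains an excluded character, also search the query as typed.
    if converted != query and any(ch in KEEP_AS_IS for ch in query):
        forms.insert(0, query)
    return forms


if __name__ == "__main__":
    for q in ("乾", "幹", "佔"):
        print(q, "->", search_forms(q))
```

With this, a search for 乾 looks up both 乾 and 干, while 幹 and 佔 are converted as before. The maintenance cost mentioned above is still there: the list has to be kept by hand.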
Another problem. Using 占 is fine, but searching for 佔 causes problems.
佔 is simplified to 占, which is fine, but then the 佔 is stripped from the meaning if the input was traditional.
That 'gan' error may be the only one you know: http://www.geocities.com/tokyo/pagoda/3847/hanzi/t-s-prob.htm
That would be extremely good :)
or rather only those listed there.
Currently handled by gTrans, but it needs a dictionary-based conversion added for increased reliability.
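For what it's worth, a dictionary-based conversion table could probably be built straight from the dictionary entries themselves. Below is a minimal Python sketch, assuming the entries follow the "TRAD SIMP [pinyin] /definitions/" layout shown in the first post. The names ENTRY, build_table and ambiguous are hypothetical, and the definitions in SAMPLE are shortened for space. It also flags characters with more than one possible simplified form, which is exactly the 乾 case.

```python
import re
from collections import defaultdict

# One entry per line: TRAD SIMP [pinyin] /def/def/.../
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]*)\] /(.*)/\s*$")

# Entries from this thread, definitions shortened.
SAMPLE = """\
乾 干 [gan1] /dry/clean/surname Gan/
乾 乾 [qian2] /surname Qian/strong/
幹 干 [gan4] /tree trunk/to manage/to work/to do/
干 干 [gan1] /to concern/to interfere/shield/stem/
"""


def build_table(text):
    """Map each single traditional character to the set of its simplified forms."""
    table = defaultdict(set)
    for line in text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue  # skip comments and malformed lines
        trad, simp = match.group(1), match.group(2)
        if len(trad) == 1 and len(simp) == 1:
            table[trad].add(simp)
    return dict(table)


def ambiguous(table):
    """Characters whose simplified form depends on the reading or meaning."""
    return {trad: simps for trad, simps in table.items() if len(simps) > 1}


if __name__ == "__main__":
    table = build_table(SAMPLE)
    print(table)
    print(ambiguous(table))
```

Characters reported by ambiguous() cannot be converted one-to-one and would need the word or reading as context, so they are natural candidates for the exclusion list discussed earlier.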
Better to wait for MySQL.