Open Nick3C opened 15 years ago
Yeah, this is one of the growing class of bugs which would be fixed by a proper DB backend. Maybe I'll try and implement that this weekend.
I've almost implemented this, but it struck me that we need to be careful.
Imagine we had a few characters or character sequences which were simplified to the same one (so it gets two entries in CEDICT with the same pronunciation). Imagine further that only one of the two had a measure word. Then when we look up that character in HanDeDict, how can we tell that it's /really/ legitimate to apply that measure word to that use? After all, we might have either of the two possible traditional characters in our hand!
It's a bit obscure but a potential issue.
No, you are right, we need to be careful of this. I think that the way to solve it is to match not only the character but also the pinyin, i.e. only fill in the MW in German if both the pinyin and the character match.
The reason I say this is that if they both match then what we really have is one character with multiple meanings, not 2 characters.
It may lead to a failure to fill in the German where there are mistakes in either of the dictionaries, but it should work most of the time in a 'good enough' way without ever filling in wrong information.
Ahhh, it is also possible when trad and simp are different. There could be two entries where the simplified characters are the same and the pinyin is the same, but many-to-one simplification means the traditional differs.
So we also need to check trad, simp, and pinyin before treating as equivalent.
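For illustration, here is a rough sketch of that check in Python. It is not the actual Pinyin Toolkit code: the `Entry` type, `build_mw_index` and `measure_word_for` are hypothetical names, and the two sample entries (including the measure word) are made up to show the many-to-one case rather than copied from CEDICT or HanDeDict. The idea is simply to key measure words on the full (trad, simp, pinyin) triple, so a lookup only succeeds when all three agree.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Entry(NamedTuple):
    """Hypothetical, simplified view of one dictionary entry."""
    trad: str
    simp: str
    pinyin: str
    measure_word: Optional[str] = None


Key = Tuple[str, str, str]


def build_mw_index(entries: Iterable[Entry]) -> Dict[Key, str]:
    """Map (trad, simp, pinyin) to a measure word, for entries that have one."""
    index: Dict[Key, str] = {}
    for entry in entries:
        if entry.measure_word:
            index[(entry.trad, entry.simp, entry.pinyin)] = entry.measure_word
    return index


def measure_word_for(index: Dict[Key, str], trad: str, simp: str,
                     pinyin: str) -> Optional[str]:
    """Return a measure word only if trad, simp and pinyin all match."""
    return index.get((trad, simp, pinyin))


# Illustrative data only: two entries sharing the same simplified form and
# the same pinyin, where only one of them carries a measure word.
SAMPLE_ENTRIES = [
    Entry(trad="麵", simp="面", pinyin="mian4", measure_word="碗"),
    Entry(trad="面", simp="面", pinyin="mian4"),
]

MW_INDEX = build_mw_index(SAMPLE_ENTRIES)
```

With this, `measure_word_for(MW_INDEX, "麵", "面", "mian4")` finds the measure word, while the same lookup with `trad="面"` returns `None`, so the other dictionary is left blank rather than filled with wrong information.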
I think I have mentioned this in emails but not yet in a bug tracker.
I guess it is not worth fixing until we add MySQL.