MWs Only Generate in English

batterseapower / pinyin-toolkit

A plugin for the Anki Spaced Repetition System (http://ichi2.net/anki/)

http://batterseapower.github.com/pinyin-toolkit/

MWs Only Generate in English #120

Open Nick3C opened 15 years ago

Nick3C commented 15 years ago

I think I have mentioned this in emails but not yet in a bug tracker.

I guess it is not worth fixing until we add mysql.

batterseapower commented 15 years ago

Yeah, this is one of the growing class of bugs which would be fixed by a proper DB backend. Maybe I'll try and implement that this weekend.

batterseapower commented 15 years ago

I've almost implemented this, but it struck me that we need to be careful.

Imagine we had a few characters or character sequences which were simplified to the same one (so it gets two entries in CEDICT with the same pronunciation). Imagine further that one of the two had a measure word. Then when we look up that character in HanDeDict, how can we tell that it's /really/ legitimate to apply that measure word to that use? After all, we might have either of the two possible traditional characters in our hand!

It's a bit obscure but a potential issue.
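To make the concern concrete, here is a minimal Python sketch of how the ambiguous cases could be detected. It assumes dictionary entries have already been parsed into (traditional, simplified, pinyin) tuples; the function name `ambiguous_simplified_keys` and the tuple layout are hypothetical and not part of the toolkit.

```python
from collections import defaultdict


def ambiguous_simplified_keys(entries):
    """Return the (simplified, pinyin) pairs shared by more than one traditional form.

    `entries` is an iterable of (trad, simp, pinyin) tuples. This layout is an
    assumption made for illustration, not the toolkit's real data model.
    """
    trads_by_key = defaultdict(set)
    for trad, simp, pinyin in entries:
        trads_by_key[(simp, pinyin)].add(trad)
    # Keep only the keys where looking up by simplified form + pinyin is ambiguous.
    return {key: trads for key, trads in trads_by_key.items() if len(trads) > 1}
```

Any key this returns is a case where a measure word attached to one traditional form could be wrongly applied to the other if the lookup uses only the simplified form and the pinyin.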

Nick3C commented 15 years ago

No, you are right, we need to be careful of this. I think the way to solve it is to match not only the character but also the pinyin, i.e. only fill in the MW in German if both the pinyin and the character match.

The reason I say this is that if they both match then what we really have is one character with multiple meanings, not 2 characters.

It may lead to failure to fill in German where there are mistakes in either of the dictionaries, but it should work most of the time in a 'good enough' way without ever filling in wrong information.

Nick3C commented 15 years ago

Ahhh, it is also possible if trad and simp are different. There could be two entries where the simplified characters are the same and the pinyin is the same, but many-to-one simplification means the traditional differs.

So we need to check trad, simp, and pinyin before treating two entries as equivalent.
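A rough Python sketch of that matching rule, under stated assumptions: the `Entry` tuple, its field names, and `fill_measure_words` are hypothetical, and entries from both dictionaries are assumed to be already parsed into the same shape. Measure words are copied only when trad, simp, and pinyin all match exactly.

```python
from collections import namedtuple

# Hypothetical parsed-entry shape; not the toolkit's actual data model.
Entry = namedtuple("Entry", ["trad", "simp", "pinyin", "measure_words"])


def entry_key(entry):
    # Two entries count as equivalent only if all three fields agree.
    return (entry.trad, entry.simp, entry.pinyin)


def fill_measure_words(source_entries, target_entries):
    """Copy measure words from source entries (e.g. CEDICT) onto equivalent
    target entries (e.g. HanDeDict) that have none of their own."""
    mws_by_key = {}
    for entry in source_entries:
        if entry.measure_words:
            mws_by_key.setdefault(entry_key(entry), []).extend(entry.measure_words)

    filled = []
    for entry in target_entries:
        if not entry.measure_words:
            # No match means the field stays empty rather than guessing.
            entry = entry._replace(measure_words=list(mws_by_key.get(entry_key(entry), [])))
        filled.append(entry)
    return filled
```

With this rule, 麵/面 and 面/面 (both mian4) stay separate, so a measure word recorded for one is never applied to the other. The cost is the one already noted: if the two dictionaries disagree on any of the three fields, the German entry is simply left without a measure word.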