blastrock / kakugo

Learn Japanese with Kakugo
GNU General Public License v3.0

German translation not correctly imported #84

Open st142148 opened 1 year ago

st142148 commented 1 year ago

Hi, I'm quite happy the German translation was added, thank you for that. But there are a lot of errors. None of them are bad translations; rather, the wrong sense of the word is being translated, e.g. 三つ = Mawashi (the sumo garment). Single-kanji words like 品 (しな, ひん) are also getting mixed up.

Before I go about checking all 2200 kanji + ~9000 words, is there a way to correct this at import? Also am I correct in thinking the dictionary is just a sqlite database gzipped?
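For anyone who wants to check the "gzipped sqlite" guess on a file of their own, here is a minimal sketch. It only relies on two well-known file signatures: gzip streams start with the bytes `1f 8b`, and SQLite 3 databases start with the 16-byte string `SQLite format 3\0`. The function name `describe_file` is made up for this example, and nothing here is taken from Kakugo's code.

```python
import gzip

# Well-known file signatures (not specific to Kakugo).
SQLITE_MAGIC = b"SQLite format 3\x00"  # first 16 bytes of an SQLite 3 file
GZIP_MAGIC = b"\x1f\x8b"               # first 2 bytes of a gzip stream


def describe_file(path):
    """Return a rough description of the file based on its leading bytes."""
    with open(path, "rb") as f:
        head = f.read(16)
    if head.startswith(SQLITE_MAGIC):
        return "sqlite"
    if head.startswith(GZIP_MAGIC):
        # Peek at the decompressed content without unpacking the whole file.
        with gzip.open(path, "rb") as f:
            inner = f.read(16)
        if inner == SQLITE_MAGIC:
            return "gzipped sqlite"
        return "gzip (other content)"
    return "unknown"
```

Calling `describe_file` on the dictionary file extracted from the app would show whether the guess holds; the path to that file is not given in this thread.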

blastrock commented 1 year ago

They do seem to be mixed up indeed.

The sqlite database is a compilation I made from various sources (see About in the app). The German word translations come from JMdict, but since words usually have multiple definitions, I take the first one, hoping that it's the most common.

In the case of 三つ, it is not:

<gloss xml:lang="ger">Mawashi (Lendentuch der Sumōringer)</gloss>
<gloss xml:lang="ger">drei</gloss>
<gloss xml:lang="ger">drei</gloss>
<gloss xml:lang="ger">drei Stück</gloss>
<gloss xml:lang="ger">drei Jahre alt</gloss>

Unfortunately, I have no way of ordering these entries automatically. I don't really know what to do about this.
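To make the problem concrete, here is a minimal sketch of the "take the first German gloss" approach described above. It is not Kakugo's actual import code. The glosses are the ones quoted in this post; wrapping them in a single `<entry>`/`<sense>` is an assumption made for illustration, and the helper names `german_glosses` and `first_german_gloss` are made up.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Glosses copied from the snippet above; the surrounding <entry>/<sense>
# layout is an assumption for this example, not the real JMdict entry.
SAMPLE = """<entry>
  <k_ele><keb>三つ</keb></k_ele>
  <sense>
    <gloss xml:lang="ger">Mawashi (Lendentuch der Sumōringer)</gloss>
    <gloss xml:lang="ger">drei</gloss>
    <gloss xml:lang="ger">drei</gloss>
    <gloss xml:lang="ger">drei Stück</gloss>
    <gloss xml:lang="ger">drei Jahre alt</gloss>
  </sense>
</entry>"""


def german_glosses(entry):
    """Return all German glosses of an entry, in document order."""
    return [g.text for g in entry.iter("gloss") if g.get(XML_LANG) == "ger"]


def first_german_gloss(entry):
    """Naive pick: the first German gloss, or None if there is none."""
    glosses = german_glosses(entry)
    return glosses[0] if glosses else None


entry = ET.fromstring(SAMPLE)
print(first_german_gloss(entry))
```

With this data the naive pick is the Mawashi gloss, because nothing in the gloss list itself says which meaning is the common one.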

st142148 commented 1 year ago

How/what are you querying? Using WWWJDIC seems to get the first result right, though I don't know how they are ranking the results. Also should it not be possible to only get exact matches? (And then look at the reading)
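What I mean by exact matches plus reading, as a rough sketch: only accept an entry when both the written form (`keb`) and the reading (`reb`) match the word being imported. This assumes the import knows the expected reading for each word, which I have not verified. The gloss texts below are placeholders, not real JMdict data, and `find_entries` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the JMdict file. The gloss texts are placeholders,
# not real dictionary content.
SAMPLE = """<JMdict>
  <entry>
    <k_ele><keb>品</keb></k_ele>
    <r_ele><reb>しな</reb></r_ele>
    <sense><gloss xml:lang="ger">PLACEHOLDER gloss for しな</gloss></sense>
  </entry>
  <entry>
    <k_ele><keb>品</keb></k_ele>
    <r_ele><reb>ひん</reb></r_ele>
    <sense><gloss xml:lang="ger">PLACEHOLDER gloss for ひん</gloss></sense>
  </entry>
  <entry>
    <k_ele><keb>品物</keb></k_ele>
    <r_ele><reb>しなもの</reb></r_ele>
    <sense><gloss xml:lang="ger">PLACEHOLDER gloss for しなもの</gloss></sense>
  </entry>
</JMdict>"""


def find_entries(root, word, reading):
    """Return entries whose written form AND reading both match exactly."""
    matches = []
    for entry in root.iter("entry"):
        written_forms = {k.text for k in entry.iter("keb")}
        readings = {r.text for r in entry.iter("reb")}
        if word in written_forms and reading in readings:
            matches.append(entry)
    return matches


root = ET.fromstring(SAMPLE)
hits = find_entries(root, "品", "ひん")
print(len(hits), hits[0].find("sense/gloss").text)
```

This would keep 品 (しな) and 品 (ひん) apart, but it would not help by itself when a single entry lists several meanings under the same reading.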

blastrock commented 1 year ago

The snippet from my previous post comes from the JMdict XML file.

Your link doesn't work for me; I think you can't link to WWWJDIC. I tried searching it myself, and I noticed it says that the German dictionary comes from WaDoku, so I searched there too. Mawashi is the first definition there, even though WWWJDIC gets it right.

I don't really know how WWWJDIC uses WaDoku, or how JMdict integrates it. I believe the issue should be fixed in JMdict instead.