LaurensWeyn / Spark-Reader

A tool to assist non-native speakers in reading Japanese
GNU General Public License v3.0

JMDict update broke something #25

Closed: wareya closed this issue 6 years ago

wareya commented 6 years ago

image

wareya commented 6 years ago

It seems that definitions in different languages are now placed in their own senses.

Before:

<entry>
<ent_seq>1797990</ent_seq>
<k_ele>
<keb>常人</keb>
</k_ele>
<r_ele>
<reb>じょうじん</reb>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>ordinary person</gloss>
<gloss>run-of-the-mill people</gloss>
<gloss>John Doe</gloss>
<gloss>Jane Doe</gloss>
<gloss xml:lang="ger">normaler (m) Mensch</gloss>
<gloss xml:lang="ger">(m) Durchschnittsmensch</gloss>
<gloss xml:lang="ger">(m) Mann von der Straße</gloss>
<gloss xml:lang="rus">заурядный человек, простой смертный; простой человек</gloss>
<gloss xml:lang="spa">persona normal y corriente</gloss>
</sense>
</entry>

After:

<entry>
<ent_seq>1797990</ent_seq>
<k_ele>
<keb>常人</keb>
</k_ele>
<r_ele>
<reb>じょうじん</reb>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>ordinary person</gloss>
<gloss>run-of-the-mill people</gloss>
<gloss>John Doe</gloss>
<gloss>Jane Doe</gloss>
</sense>
<sense>
<gloss xml:lang="ger">(n) normaler (m) Mensch</gloss>
<gloss xml:lang="ger">(m) Durchschnittsmensch</gloss>
<gloss xml:lang="ger">(m) Mann von der Straße</gloss>
</sense>
<sense>
<gloss xml:lang="rus">заурядный человек, простой смертный; простой человек</gloss>
</sense>
<sense>
<gloss xml:lang="spa">persona normal y corriente</gloss>
</sense>
</entry>

LaurensWeyn commented 6 years ago

Wow, EDICT still gets updated? That's nice. Shame they changed the format like this, though when I was writing the implementation for this I wondered why all the other languages were always crammed into the first English sense, so it's probably for the better.

The bug shouldn't be too difficult to fix: until now, the parser simply assumed that every sense had (English) text.
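For illustration, here is a minimal Python sketch of a parser that tolerates senses without English text. It is not Spark-Reader's actual code; the function name `english_senses` and the trimmed sample entry are hypothetical, and it assumes that a `<gloss>` without an `xml:lang` attribute is English, which is the default the JMdict DTD declares.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the new-format entry from this issue. <pos> is left out
# because &n; is only defined in the DTD of the full dictionary file.
SAMPLE = """<entry>
<ent_seq>1797990</ent_seq>
<sense>
<gloss>ordinary person</gloss>
<gloss>John Doe</gloss>
</sense>
<sense>
<gloss xml:lang="ger">(m) Durchschnittsmensch</gloss>
</sense>
<sense>
<gloss xml:lang="spa">persona normal y corriente</gloss>
</sense>
</entry>"""


def english_senses(entry):
    """Return one list of English glosses per sense, skipping empty senses."""
    senses = []
    for sense in entry.iter("sense"):
        # Assumption: a missing xml:lang means English ("eng").
        glosses = [
            gloss.text
            for gloss in sense.findall("gloss")
            if gloss.get(XML_LANG, "eng") == "eng" and gloss.text
        ]
        # New format: a sense may hold only foreign glosses, so do not
        # assume that every sense contributes English text.
        if glosses:
            senses.append(glosses)
    return senses


print(english_senses(ET.fromstring(SAMPLE)))
```

Filtering per gloss rather than per sense means the same function also handles the old layout, where foreign glosses shared the first English sense.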

LaurensWeyn commented 6 years ago

That took longer than it should've. GitHub doesn't like me committing 100MB files, so I couldn't update JMDict in the repository, but the code should now work with the new dictionary file if you update it manually.

wareya commented 6 years ago

In the future, it might be a good idea to download the dictionary as part of the compilation process instead of keeping it in the repo. I think most public projects that use large files do something like that if they don't want to use proprietary VCSs or weird git extensions.
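A build-time download along these lines could look like the Python sketch below. The function name `fetch_dictionary` and the target path in the usage comment are hypothetical, and the URL is an assumption about where the gzipped JMdict file is published; check it against the EDRDG site before relying on it.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Assumption: download location of the gzipped JMdict file; verify before use.
JMDICT_URL = "http://ftp.edrdg.org/pub/Nihongo/JMdict.gz"


def fetch_dictionary(url, target):
    """Download and unpack a gzipped dictionary unless target already exists.

    Returns True if the file was downloaded, False if it was already present.
    """
    target = Path(target)
    if target.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a truncated dictionary at the final path.
    partial = target.with_name(target.name + ".part")
    with urllib.request.urlopen(url) as response:
        with gzip.GzipFile(fileobj=response, mode="rb") as unpacked:
            with open(partial, "wb") as out:
                shutil.copyfileobj(unpacked, out)
    partial.replace(target)
    return True


# Hypothetical usage from a build script:
# fetch_dictionary(JMDICT_URL, "dictionaries/JMdict")
```

Because the function returns early when the target exists, repeated builds would not download the file again; deleting the local copy forces a refresh.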