Closed wareya closed 6 years ago
It seems that each language's definitions are now placed in their own senses.
Before:
<entry>
<ent_seq>1797990</ent_seq>
<k_ele>
<keb>常人</keb>
</k_ele>
<r_ele>
<reb>じょうじん</reb>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>ordinary person</gloss>
<gloss>run-of-the-mill people</gloss>
<gloss>John Doe</gloss>
<gloss>Jane Doe</gloss>
<gloss xml:lang="ger">normaler (m) Mensch</gloss>
<gloss xml:lang="ger">(m) Durchschnittsmensch</gloss>
<gloss xml:lang="ger">(m) Mann von der Straße</gloss>
<gloss xml:lang="rus">заурядный человек, простой смертный; простой человек</gloss>
<gloss xml:lang="spa">persona normal y corriente</gloss>
</sense>
</entry>
After:
<entry>
<ent_seq>1797990</ent_seq>
<k_ele>
<keb>常人</keb>
</k_ele>
<r_ele>
<reb>じょうじん</reb>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>ordinary person</gloss>
<gloss>run-of-the-mill people</gloss>
<gloss>John Doe</gloss>
<gloss>Jane Doe</gloss>
</sense>
<sense>
<gloss xml:lang="ger">(n) normaler (m) Mensch</gloss>
<gloss xml:lang="ger">(m) Durchschnittsmensch</gloss>
<gloss xml:lang="ger">(m) Mann von der Straße</gloss>
</sense>
<sense>
<gloss xml:lang="rus">заурядный человек, простой смертный; простой человек</gloss>
</sense>
<sense>
<gloss xml:lang="spa">persona normal y corriente</gloss>
</sense>
</entry>
Wow, EDICT still gets updated? That's nice. It's a shame they changed the format like this, though when I was writing the implementation I wondered why all the other languages were always crammed into the first English sense, so it's probably for the better.
The bug shouldn't be too difficult to fix: until now, the parser simply assumed that every sense had (English) text in it.
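For illustration, here is a minimal Python sketch of reading senses without assuming each one carries English text. It is not this project's actual parser: the function names `glosses_by_language` and `english_senses` are hypothetical, the sample entry leaves out the `<pos>` element so it parses without the JMdict DTD, and it assumes that a gloss with no `xml:lang` attribute is English.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the "After" entry (the <pos> element is omitted on purpose).
SAMPLE = """<entry>
<ent_seq>1797990</ent_seq>
<sense>
<gloss>ordinary person</gloss>
<gloss>run-of-the-mill people</gloss>
<gloss>John Doe</gloss>
<gloss>Jane Doe</gloss>
</sense>
<sense>
<gloss xml:lang="ger">(n) normaler (m) Mensch</gloss>
<gloss xml:lang="ger">(m) Durchschnittsmensch</gloss>
<gloss xml:lang="ger">(m) Mann von der Straße</gloss>
</sense>
<sense>
<gloss xml:lang="rus">заурядный человек, простой смертный; простой человек</gloss>
</sense>
<sense>
<gloss xml:lang="spa">persona normal y corriente</gloss>
</sense>
</entry>"""


def gloss_language(gloss):
    # Assumption: a missing xml:lang attribute means the gloss is English.
    return gloss.get(XML_LANG, "eng")


def glosses_by_language(entry):
    """Collect gloss text per language across every sense of an entry."""
    grouped = defaultdict(list)
    for sense in entry.findall("sense"):
        for gloss in sense.findall("gloss"):
            grouped[gloss_language(gloss)].append(gloss.text or "")
    return dict(grouped)


def english_senses(entry):
    """Return one list of English glosses per sense, skipping senses with none."""
    senses = []
    for sense in entry.findall("sense"):
        english = [g.text or "" for g in sense.findall("gloss")
                   if gloss_language(g) == "eng"]
        if english:  # a sense may now hold only non-English glosses
            senses.append(english)
    return senses


entry = ET.fromstring(SAMPLE)
by_lang = glosses_by_language(entry)
eng_only = english_senses(entry)
```

Because it skips empty senses instead of indexing into them, the same code handles both the old layout (all languages in one sense) and the new one (one sense per language).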
That took longer than it should have. GitHub doesn't like me committing 100MB files, so I couldn't update JMDict in the repository, but the code should now work with the new dictionary file if you update it manually.
In the future, it might be a good idea to download it as part of the compilation process instead of having it in the repo. I think most public projects that use large files do something like that, if they don't want to use proprietary VCSs or weird git extensions.
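A rough sketch of what such a build-time download step could look like, written in Python only for illustration. The function name `ensure_dictionary` and the `.part` temporary-file convention are hypothetical, and the URL is an assumption that should be checked against the dictionary's current distribution page before relying on it.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

# Assumption: location of the gzip-compressed dictionary; verify before use.
DEFAULT_URL = "http://ftp.edrdg.org/pub/Nihongo/JMdict.gz"


def ensure_dictionary(dest, url=DEFAULT_URL, opener=urllib.request.urlopen):
    """Download and decompress the dictionary unless dest already exists.

    Returns True if a download happened, False if the file was already there.
    `opener` can be swapped out (for example in tests) to avoid network access.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so an interrupted download
    # never leaves a half-written file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with opener(url) as response, \
            gzip.GzipFile(fileobj=response) as unzipped, \
            open(tmp, "wb") as out:
        shutil.copyfileobj(unzipped, out)
    tmp.replace(dest)
    return True
```

A build script would call something like `ensure_dictionary("data/JMdict.xml")` before compiling, so the large file stays out of the repository and is only fetched when missing.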