error when trying to get a csv from the newest Wadoku XML dump

IllDepence / anki_add_pitch

Script to automatically add pitch accent information to an Anki deck.

MIT License


error when trying to get a csv from the newest Wadoku XML dump #1

Open rewrib opened 2 years ago

rewrib commented 2 years ago

I'm using your script to generate the newest possible CSV from the newest Wadoku XML dump (4 Jul 2021), but I get the error(?) "Akz token annotation does not check out." Did I do something wrong? Also, the CSV I generate is about twice as large as the one you get through the Anki plugin (about 18 MB).

I am using Python 3.6.7 on Ubuntu.

IllDepence commented 2 years ago

Hi, sorry for the late reply.

I just ran the script on the most recent dump and saw the same message. It appears to come from the entry for 第一次世界大戦, where one of the accent patterns does not check out: the word is split into three parts, but a pattern is given only for the first two. In such a case the script simply skips the erroneous entry and moves on, so the message is a warning rather than a failure. The larger file is most likely just due to new entries added to Wadoku since the plugin's CSV was generated.
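To illustrate the skip-on-mismatch behaviour described above, here is a minimal sketch, assuming each parsed entry has already been reduced to a headword, a list of word parts, and a list of accent numbers. The names `check_akz` and `build_rows`, the tuple layout, and the example split and accent values are all hypothetical and made up for illustration; they are not taken from anki_add_pitch or from the real Wadoku data.

```python
import logging
from typing import List, Tuple

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("pitch_sketch")  # hypothetical logger name

# Hypothetical entry layout: (headword, word parts, accent numbers)
Entry = Tuple[str, List[str], List[int]]


def check_akz(parts: List[str], patterns: List[int]) -> bool:
    """Return True if every word part has exactly one accent pattern."""
    return len(parts) == len(patterns)


def build_rows(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Collect (headword, accent string) rows, skipping inconsistent entries."""
    rows = []
    for headword, parts, patterns in entries:
        if not check_akz(parts, patterns):
            # Warn and move on instead of aborting the whole conversion
            log.warning("Akz token annotation does not check out: %s", headword)
            continue
        rows.append((headword, "-".join(str(p) for p in patterns)))
    return rows


# Illustrative data only: split and accent numbers are invented
example = [
    ("第一次世界大戦", ["第一次", "世界", "大戦"], [2, 1]),  # 3 parts, 2 patterns
    ("日本", ["日本"], [2]),
]
print(build_rows(example))  # → [('日本', '2')]
```

The bad entry produces a warning on stderr and is left out of the result, while the remaining entries are still written, which matches the "ignore and move on" behaviour rather than stopping the conversion.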