Open rewrib opened 2 years ago
Hi, sorry for the late reply.
I just ran the script on the most recent dump and got the same message.
This appears to be due to the entry for 第一次世界大戦, where one of the accent patterns does not check out: the word is split into three parts, but a pattern is given only for the first two. In such a case the script simply ignores the erroneous entry and moves on.
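To illustrate the kind of check described here, below is a minimal sketch, not the script's actual code. The function names, the tuple layout of an entry, the way the word is split, and the accent numbers are all hypothetical placeholders; the only things taken from this thread are the message text and the "three parts, two patterns" mismatch.

```python
import logging

logger = logging.getLogger("accent_check")  # hypothetical logger name


def accents_check_out(parts, patterns):
    """Return True when there is exactly one accent pattern per word part."""
    return len(parts) == len(patterns)


def collect_valid_entries(entries):
    """Keep entries whose annotation checks out; log and skip the others."""
    valid = []
    for word, parts, patterns in entries:
        if not accents_check_out(parts, patterns):
            # Same message as in the report; the entry is dropped, not fatal.
            logger.warning("Akz token annotation does not check out: %s", word)
            continue
        valid.append((word, parts, patterns))
    return valid


# Hypothetical sample data: the splits and accent numbers are placeholders,
# not values taken from the Wadoku dump.
entries = [
    ("世界", ["世界"], [1]),
    ("第一次世界大戦", ["第一次", "世界", "大戦"], [1, 1]),  # 3 parts, 2 patterns
]

kept = collect_valid_entries(entries)
print([word for word, _, _ in kept])
```

With this kind of logic the message is only a warning about one skipped entry; the rest of the dump is still converted.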
The file being bigger is most likely just due to new entries on Wadoku.
I'm using your script to generate an up-to-date CSV from the newest Wadoku XML dump (04. Jul 2021), but I get the error(?) "Akz token annotation does not check out." Did I do something wrong? The CSV I generate is about 18 MB, twice as big as the one you get through the Anki plugin.
I am using Python 3.6.7 on Ubuntu.