Closed sinergatis closed 8 years ago
Sorry for forgetting about the tests. :) The recent changes were sometimes fairly numerous, sometimes a bit of effort, and sometimes quite a bit of effort beyond my ability, so I was caught up in being lost and how to figure something out. I'll be sure to run the tests regularly. Thanks for fixing them up!
No worries, I fully understand - the "test suite" is rather light at the moment, but it indeed would be great if we work on expanding it eventually and agree on running the tests and having them pass as the last step when working on a feature!
Added a flag to the addepub
command (making the default to do perform that check against Book.original_filename), and added a new test plus adjusting the existing ones, as hinted on https://github.com/aristippe/pathagar/pull/38#discussion-diff-54635280.
Thank you for the latest flag addition! Everything looks good, and feel free to merge.
Ok, made the tag parsing slightly more robust (try to add tag directly first, try to add tag forced to utf-8 later, and if both fails do not add the tag and continue). It's still not bulletproof, but hopefully will at least provide some info about tag problems. I also slightly tweaked the whole if
, basically making more room for long lines.
Hopefully this will close this pull request once you are able to test it with your library (or a subset of it)! Please feel to merge if it looks fine, or otherwise suggest further tweaks!
I'm ok either way with merging as is and dealing with the few exceptions later or trying to figure out how to improve tag parsing if it's only a minor thing. I haven't yet looked at how to set a different remote and pull one of your branches to see if I can add to it.
Testing out a set of about 3500 ePubs, a few dozen fail. Most of them are French. The one I looked at was an OPF UTF-8 encoded though the tags have some accents:
'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128) dc:subjectVerlaine, poèmes licencieux, érotisme, désir, éloge des corps,/dc:subject
Another complains about this tag. Is it the mdash?
fl. 1834-1864—Juvenile literature
Ok! I'll merge then, hoping that the latest commit takes care of the ascii ones that fail. It seems indeed that the cases you mention are due to accented letters in the first case, and the long dash - but maybe it's something a bit more tricky, like the manifest including them in the wrong encoding, and it getting mangled on the epub->dict->addepub->output process.
Again, feel free to test the changes against your library whenever you have the chance, and don't hesitate on making changes to addepub directly to fix the issues or ping me so I can take another look at it - encoding issues are always quite messy!
Also, for quickly checking my branches something like this should work:
$ git remote add some_name_like_sinergatis https://github.com/sinergatis/pathagar.git
$ git fetch some_name_like_sinergatis --all
$ git checkout some_name_like_sinergatis/some_branch
Alternatively something like this should work, although it's a bit more cumbersome.
This pull request is based on #37 - please merge #37 first once it looks good, as the only relevant commit compared to that pull request is 56bfa10536c44128ec30bb3342ddef5fd25a70d6.
The main reason is that the tests were failing/erroring, which should ideally not happen. In the future, it would be great if you could make sure that the tests run (and pass!) whenever significant changes are made to the rest of the code (ideally before merging a commit) by simply running
python manage.py test
, as most of the times fixing them is rather trivial while being on the same mindset (as opposed to coming back to them later and track back the changes), plus providing some safety net and help catching potential bugs introduced!I'll comment on some specific lines, as I was just focused on getting the tests usable and the solutions might need some further work.