google-code-export / fanficdownloader

Automatically exported from code.google.com/p/fanficdownloader

EPUB output is invalid: 'mimetype' is not the first file, and is compressed. #6

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Generate EPUB output, e.g., "python downloader.py 
http://www.fanfiction.net/s/5782108/1/ epub".
2. Test it with epubcheck or http://threepress.org/document/epub-validate/. The 
first error will be "ERROR: Harry_Potter_and_the_Methods_of_Rationality.epub: 
length of first filename in archive must be 8, but was 22".
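
(The 8 in that message is the length of the name 'mimetype'.) To check just this first requirement without running epubcheck, a rough Python 3 sketch like the one below can inspect the first archive entry. It only approximates the OCF 'mimetype' rules and is not how epubcheck works: `check_ocf_mimetype` is a hypothetical helper, it reads entry order from the central directory via `zipfile.ZipFile.infolist()` rather than the raw bytes at offset 0, and it interprets "no extended attributes" as "empty ZIP extra field".

```python
import io
import zipfile

EPUB_MIMETYPE = b"application/epub+zip"


def check_ocf_mimetype(epub_bytes):
    """Return a list of problems with the 'mimetype' entry (empty list if none found).

    Hypothetical helper; an approximation of the first OCF checks, not epubcheck.
    """
    problems = []
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        infos = zf.infolist()  # entries in central-directory order
        if not infos:
            return ["archive is empty"]
        first = infos[0]
        if first.filename != "mimetype":
            # The case in this report: some other file was stored first.
            problems.append("first entry is %r, not 'mimetype'" % first.filename)
            return problems
        if first.compress_type != zipfile.ZIP_STORED:
            problems.append("'mimetype' is compressed")
        if first.extra:
            # Interpreting "no extended attributes" as "no ZIP extra field".
            problems.append("'mimetype' has an extra field")
        if zf.read(first) != EPUB_MIMETYPE:
            problems.append("'mimetype' does not contain exactly application/epub+zip")
    return problems
```

An empty list means only that these few checks passed; epubcheck validates much more.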

What is the expected output? What do you see instead?
The EPUB output is invalid. While it may work on some devices, it may fail on 
others.

What version of the product are you using? On what operating system?
I'm using current tip (26:54fc9b30ced5) on Python 2.6.4 (Ubuntu 9.10).

Please provide any additional information below.
There are several different issues causing validation to fail. This is the 
first one. See section 4 of the OCF standard (part of what makes up EPUB): 
http://www.idpf.org/ocf/ocf1.0/download/ocf10.htm

In short, the file 'mimetype' must appear first in the zipfile, it must be 
uncompressed and have no extended attributes, and it must contain precisely the 
text 'application/epub+zip'.
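
A minimal sketch of a writer that meets those rules, assuming a current Python 3 `zipfile`, where `ZipFile.writestr` accepts a per-entry `compress_type` (the later comments in this thread say per-file compression options only arrived in Python 2.7, so this would not have helped on 2.6). `build_epub` and its (archive_name, data) input are hypothetical, not the project's code.

```python
import io
import zipfile


def build_epub(files):
    """Return EPUB archive bytes: stored 'mimetype' first, then deflated content.

    Hypothetical helper. `files` is a list of (archive_name, data) pairs,
    excluding 'mimetype'.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF: 'mimetype' must be the first entry, uncompressed, with exact content.
        # A fresh ZipInfo has an empty extra field.
        info = zipfile.ZipInfo("mimetype")
        info.compress_type = zipfile.ZIP_STORED
        zf.writestr(info, b"application/epub+zip")
        # Other entries may be compressed; compress_type applies to this entry only.
        for name, data in files:
            zf.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

To save the result, write the returned bytes to a file opened in binary mode.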

The attached patch fixes this: the list of files sent to the ZIP engine is now 
ordered, with 'mimetype' at the top. However, the version of Python I have 
doesn't support setting compression per item, so all files are now stored 
uncompressed. I'm not a Python coder, and this makes file size significantly 
worse, since the content files are no longer compressed. You may want to 
revisit my methods, to say the least.

Original issue reported on code.google.com by adam.buc...@gmail.com on 15 Sep 2010 at 8:23

GoogleCodeExporter commented 9 years ago
Pardon me; that patch apparently didn't work. I'm not sure what I mixed up in 
there, but I'll see if I can get it fixed and reupload when I get a moment.

Original comment by adam.buc...@gmail.com on 15 Sep 2010 at 8:33

GoogleCodeExporter commented 9 years ago
I forgot to compare the filename, rather than the whole (filename, data) tuple, 
against the string 'mimetype'. Oops. When the previous patch seemed to work, it 
was by sheer coincidence.
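
The mistake described above is easy to make with (filename, data) pairs, because a tuple never compares equal to a string. A minimal Python 3 sketch of the corrected ordering, keyed on the filename only; `order_for_epub` is a hypothetical helper, not the code from the attached patch.

```python
def order_for_epub(files):
    """Return (filename, data) pairs with 'mimetype' moved to the front.

    Hypothetical helper. The key looks at item[0] (the filename), not the tuple.
    sorted() is stable, so the other files keep their original relative order.
    """
    return sorted(files, key=lambda item: item[0] != "mimetype")


files = [
    ("content.opf", "<package/>"),
    ("mimetype", "application/epub+zip"),
    ("toc.ncx", "<ncx/>"),
]

# The buggy comparison: the whole tuple is never equal to the string.
assert not any(item == "mimetype" for item in files)

print([name for name, _ in order_for_epub(files)])
# ['mimetype', 'content.opf', 'toc.ncx']
```

The key function returns False for 'mimetype' and True for everything else, and False sorts before True.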

Again, note that there's a size penalty (the ebooks are larger) because all 
files are stored uncompressed. With this version of Python's ZIP support 
there's a tradeoff between standards compliance and compression, unless 
someone knows a way around that.

Original comment by adam.buc...@gmail.com on 16 Sep 2010 at 2:54

GoogleCodeExporter commented 9 years ago
Adam, I apologize that I didn't realize you'd put patches here for a lot of 
issues until after I'd already coded my own fixes.

In this particular case, my solution is a bit different and allows the chapter 
files to be compressed while mimetype isn't.

Original comment by retiefj...@gmail.com on 16 Oct 2010 at 1:58

GoogleCodeExporter commented 9 years ago
That's a better fix than I had, in any case; I read that Python's zip module 
didn't support per-file compression options until Python 2.7, but it never 
occurred to me to close and reopen the zipfile using different compression 
options each time.
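
The close-and-reopen approach can be sketched as follows. retiefj's actual change isn't shown in this thread, so this is a reconstruction under assumptions: `build_epub_two_pass` is a hypothetical name, the archive is built in memory, and it relies only on the archive-wide `compression` argument of `zipfile.ZipFile` plus append mode ('a'), not on per-entry options. It was written against Python 3's `zipfile` and has not been tried on Python 2.6.

```python
import io
import zipfile


def build_epub_two_pass(files):
    """Return EPUB bytes using only archive-wide compression settings.

    Hypothetical helper. `files` is a list of (archive_name, data) pairs,
    excluding 'mimetype'.
    """
    buf = io.BytesIO()
    # Pass 1: new archive, ZIP_STORED, containing only 'mimetype',
    # so it is the first entry and uncompressed.
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("mimetype", "application/epub+zip")
    # Pass 2: reopen the same buffer in append mode with ZIP_DEFLATED;
    # the new entries are written after 'mimetype' and get compressed.
    with zipfile.ZipFile(buf, "a", zipfile.ZIP_DEFLATED) as zf:
        for name, data in files:
            zf.writestr(name, data)
    return buf.getvalue()
```

Closing a `ZipFile` that was given a file object leaves that object open, which is why the same `BytesIO` can be reopened. In CPython's `zipfile`, append mode reads the existing central directory and writes new entries where it began, so reopening a one-entry archive costs very little.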

Original comment by adam.buc...@gmail.com on 16 Oct 2010 at 3:57