Open fred777 opened 3 years ago
Please check InfoZip AppNote 6.3.9:
4.4.8 compressed size: (4 bytes)
4.4.9 uncompressed size: (4 bytes)
The size of the file compressed (4.4.8) and uncompressed,
(4.4.9) respectively. When a decryption header is present it
will be placed in front of the file data and the value of the
compressed file size will include the bytes of the decryption
header. If bit 3 of the general purpose bit flag is set,
these fields are set to zero in the local header and the
correct values are put in the data descriptor and
in the central directory.
.....................
Bit 3: If this bit is set, the fields crc-32, compressed
size and uncompressed size are set to zero in the
local header. The correct values are put in the
data descriptor immediately following the compressed
data. (Note: PKZIP version 2.04g for DOS only
recognizes this bit for method 8 compression, newer
versions of PKZIP recognize this bit for any
compression method.)
Can you check this flag with this files? https://github.com/buggins/coolreader/blob/682fb3e34e48bd16f60107708bb8686713fd65e2/crengine/src/lvstream/ziphdr.h#L24 Perhaps you need to add yet another workaround for this case.
Also, please format the code to match the formatting of this file (lvzipdecodestream.cpp). What is this block for? The condition was removed, but the block remained. I would like the files to be formatted in the same style.
Thanks for the fix. It is inconvenient that you cannot provide the problematic file. This means you need to check that there are no regressions :).
Will there be no problems with catalogs/directories with these conditions?
Regarding the formatting - will fix that as you prefer - I just kept the block for cosmetic reasons, i.e. to indicate where the comment is related to...
Regressions - not sure how to deal with it - if you have a test suite somewhere or can give some files that you suspect to be problematic I can check them.
The value of hdr.Flags is 8 for META-INF/content.xml => only bit 4 is set. The full contents of the hdr struct is:
hdr @0x7ffe6d2396c0 ZipLocalFileHdr
Flags 8 lUInt16
Mark 67324752 lUInt32
UnpOS 0 lUInt8
UnpVer 20 lUInt8
others @0x7ffe6d2396c8 lUInt16[11]
[0] 8 lUInt16
[1] 20672 lUInt16
[2] 17665 lUInt16
[3] 0 lUInt16
[4] 0 lUInt16
[5] 0 lUInt16
[6] 0 lUInt16
[7] 266 lUInt16
[8] 0 lUInt16
[9] 22 lUInt16
[10] 0 lUInt16
...and this is the related zipinfo output:
Central directory entry #122:
---------------------------
There are an extra 16 bytes preceding this file.
META-INF/container.xml
offset of local header from start of archive: 3060847
(00000000002EB46Fh) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 2.0
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: not encrypted
extended local header: yes
file last modified on (DOS date/time): 2014 Aug 1 10:06:00
32-bit CRC value (hex): 053eef39
compressed size: 188 bytes
uncompressed size: 266 bytes
length of filename: 22 characters
length of extra field: 0 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (00 hex): none
There is no file comment.
Not sure if this helps...
Regressions - not sure how to deal with it - if you have a test suite somewhere or can give some files that you suspect to be problematic I can check them.
Unfortunately, we do not have any test suite, so you can just check the functionality on other zip files. Although your code looks safe, I think it shouldn't be a problem. How often have I thought so, and how often have I been wrong :) But OK, I don't mind. But we'll have to wait for @buggins, I don't have an agreement with him about accepting other people's PR.
The value of hdr.Flags is 8 for META-INF/content.xml => only bit 4 is set. The full contents of the hdr struct is:
You can also uncomment this line: https://github.com/buggins/coolreader/blob/682fb3e34e48bd16f60107708bb8686713fd65e2/crengine/src/lvstream/lvziparc.cpp#L364 and see in console some debug messages including contents of the extra fields. Description of this extra field you can find in InfoZip AppNote.
Looks quite innofensive to me. I haven't looked a the code around, but the only situation that could feel it needs a test is with a file with empty content inside the zip (where I expect unpSize to be 0 while packSize might not be).
There you go:
2021/09/19 19:19:16.2621 TRACE ZIP entry 'META-INF/container.xml' unpSz=266, pSz=188, m=8, offs=3060847, zAttr=0, flg=0, addL=0, commL=0, dn=0
For your convenience I also tried an empty epub file (aka empty zip file) => no crashes / errors, I guess because it will never try to read META-INF/container.xml in this cause ;-) The first thing it tries to extract from epub zip is mimedata in zip root - which is not present in this case here, so the code will exit early.
TRACE ZIP entry 'META-INF/container.xml' unpSz=266, pSz=188
That was with an empty META-INF/container.xml
? So, unpSz is not the size of the file content, but the size of the zip item/entry, which must includes the date and other attributes, so always non-zero, right ?
So, unpSz is not the size of the file content, but the size of the zip item/entry, which must includes the date and other attributes, so always non-zero, right ?
Isn't this information present in localzipheader and extra fields? This is exactly the size of the file. See appnote infozip, section 4.4.9.
No, that was with the original epub file. With empty container.xml the result is
2021/09/19 19:41:33.0281 TRACE ZIP entry 'META-INF/container.xml' unpSz=0, pSz=0, m=0, offs=780464, zAttr=0, flg=81800000, addL=24, commL=0, dn=0
2021/09/19 19:41:33.0281 TRACE ZIP entry extra data: :55:54:05:00:03:77:75:47:61:75:78:0B:00:01:04:E8:03:00:00:04:64:00:00:00
=> notice that the original epub file does not have extra header data - opposed to the modified file I just created with empty container.xml
OK, I don't want to put my head in the specs, so trusting you that your small fix is sane with various combinations of packSize / unpSize when only one is zero, and that it is sane to use only one of srcPackSize / srcUnpSize :)
This fixes zip deflation bug triggered by some watermarked, DRM-free epub file which I bought from a book store. It seems to have been created with Adobe Indesign 9.2.
Cool Reader standalone (as well as KOReader on my Tolino based on crengine, see related issue https://github.com/koreader/koreader/issues/8222) was unable to display it while other readers like the one from Calibre does not have issues. As soon as I extract all contents and re-compress them into a new epub file, Cool Reader did not have issues anymore, so the culprit lies in the binary zip structure of the epub file.
It turned out that packSize was detected as zero within LVZipDecodeStream::Create while unpSize was detected correctly, so the existing workaround did not kick in => I have separated it into two distinct if clauses.
Unfortunately I can't give you the epub due to the watermarking and copyright :-/