The only thing I can think might be different is that maybe the utility reads the file differently when store is defined. It may be taking the following spec wording literally and not using the sizes set in the data descriptor when in store mode. It's done this way because streams have no defined size until they're finished.
bit 3: If this bit is set, the fields crc-32, compressed size
and uncompressed size are set to zero in the local
header. The correct values are put in the data descriptor
immediately following the compressed data.
Could you try store: true with a buffer instead of a stream? If that works, then most likely when store is used with a stream, we would have to collect the stream into a buffer and define its sizes up front.
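For reference, a minimal sketch of that test with a current archiver release (the file names are placeholders, and I'm assuming the per-entry store option here):

// sketch: append a Buffer (not a stream) with store: true
var fs = require('fs');
var archiver = require('archiver');

var output = fs.createWriteStream('test-store-buffer.zip');
var archive = archiver('zip');

archive.pipe(output);

// buffer source, so the size and crc-32 are known before the entry is written
archive.append(fs.readFileSync('file1.txt'), { name: 'file1.txt', store: true });

archive.finalize();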
@gconaty I see varied reports that it could definitely be an issue with the data descriptors. Do you see any "ditto" errors in the console (when using Archive Utility) like the article below mentions? I've been looking around and it seems the data descriptor signature may not be required, so maybe that's throwing everything off for extractors that don't know how to handle it.
http://ocportal.com/site/news/view/chris_grahams_blog/problem_between_the_mac_2.htm
I might just call this an extractor issue, as we do follow the spec. The comment below might be of interest.
https://github.com/evanmiller/mod_zip/issues/6#issuecomment-25418143
No dice with this one either. Attaching the zip created from a buffer.
Can you try commenting out this line:
https://github.com/ctalkington/node-zip-stream/blob/master/lib/headers.js#L205
in node_modules/archiver/node_modules/zip-stream/lib/headers.js (if using it via archiver)?
Can you also confirm your archiver version, and do an npm update in the project to make sure you have the recent zip-stream 0.3.2 release?
Given that you reported it works without store, that makes me wonder if it does indeed treat the data descriptor differently when compression = 0.
zip-stream 0.3.2. Here's the debug output (I commented out the signature).
CT EDIT: trimmed out the debug text that didn't help in this case.
Guessing it's the same result?
also of interest:
http://en.wikipedia.org/wiki/Archive_Utility
password-protected ZIP archives are not supported. It also cannot handle streamed ZIP content, as written by ZIP version > 2.0
Not sure if that's still valid for the latest versions, but definitely interesting.
Found this in the console (I was looking in the system console, sigh):
ditto: file1.txt: No such file or directory
ditto: Couldn't read pkzip signature.
Yeah, it seems it just can't parse properly when the data descriptor is used and compression = 0.
EDIT: the only way around it that I can see would be collecting all the stream data and then adding it as a buffer without using bit 3. That requires quite a bit of refactoring and is slower in general than streaming, plus it's limited by RAM since it'll hold all file buffers in memory.
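Roughly, that workaround would look something like this (appendBuffered is just an illustrative helper, not an actual API):

// illustrative only: collect a stream fully into memory, then append it as a
// buffer so sizes and crc-32 are known up front and bit 3 isn't needed
function appendBuffered(archive, stream, opts, callback) {
  var chunks = [];
  stream.on('data', function (chunk) { chunks.push(chunk); });
  stream.on('error', callback);
  stream.on('end', function () {
    archive.append(Buffer.concat(chunks), opts);
    callback();
  });
}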
Trying to get my head around this (without going into the zip spec quite yet!). Also saw this:
https://github.com/onkis/node-native-zip/commit/a7303f3e2878df8a10a13e467839f2356da2315d
Are you saying that the only zip creators that work with Archive Utility are those that don't stream and instead buffer everything first (and that that's the only solution)? If so, it kind of defeats the purpose :(
The only part still confusing to me (if the above is true) is: what's different about the files when compression is used (and streamed)?
My guess with Archive Utility is that its code knows how to handle bit 3 when compression > 0 but not when compression = 0. The spec does say "immediately following the compressed data", so maybe they took that literally. Newer versions of the zip spec (4.5 in this case) go into more detail and explain what I think is happening here:
Bit 3: If this bit is set, the fields crc-32, compressed
size and uncompressed size are set to zero in the
local header. The correct values are put in the
data descriptor immediately following the compressed
data. (Note: PKZIP version 2.04g for DOS only
recognizes this bit for method 8 compression, newer
versions of PKZIP recognize this bit for any
compression method.)
If I saw their code, I'd say their parsing logic probably does something like:

if (bit3 && method === 8) {
  // expect a data descriptor after the compressed data
}
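Whereas a parser following the 4.5 wording would check bit 3 regardless of method, more like:

if (bit3) {
  // expect a data descriptor after the stored or compressed data,
  // regardless of compression method
}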
Is there a specific reason for using store? I'm going to roll out a fix that only uses the data descriptor when it's needed, i.e. when streaming (it should reduce overall file sizes slightly too).
EDIT: OK, I've published zip-stream v0.3.3, which should work for the buffer + store combo. It may take a few minutes to hit the npm servers, but do try it out when you get a chance and let me know if it resolves the issue.
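For the curious, the idea behind the fix is roughly this (a sketch only, not the actual zip-stream source; the variable names are made up):

// only use general purpose bit 3 and a trailing data descriptor when the
// source is a stream whose size isn't known up front
if (sourceIsStream) {
  flags |= 0x08; // crc-32 and sizes go in the data descriptor
} else {
  // buffer sources: write crc-32, compressed size and uncompressed size
  // directly in the local file header, no data descriptor needed
}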
Are you saying that the only zip creators that work with archive utility can't use streaming and have to buffer everything first (and that's the only solution?) If so it kind of defeats the purpose :(
These formats (zip and tar) are so old that the initial use cases were from disk (static data) to archive (sometimes even spanning multiple files), not streams that can come from disk, memory, or other network sources (dynamic data). So they could use things like disk read-ahead and filesystem data to get the size of static data. There have been so many variations of the zip spec (I use 2.0 from 1993 as the basis for most things), but it all comes down to how it's implemented in the parser; you'll find little variations between them all based on how their dev understands the spec. A lot of packers still handle streams by reading all the data into a buffer first (archiver does this for tar, as it doesn't have a bit system like zip).
Nice work, thank you.
This does the trick. Thank you for isolating the change. I'd say it's resolved.
And (for posterity) I'd put in an app note.
No problem. It's always the Mac OS issues that are hardest to track down, since I don't have any systems running it.
That note makes sense. Was there a specific reason for store, or just an edge case you noticed?
Yes, not to mention they aren't open source...
I found the issue because the streaming downloader I created zips mp3s retrieved from S3, and the client requested no compression since the audio is already compressed. Of course I flipped the switch and then users with older computers started reporting errors :(
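For context, the setup is roughly like the sketch below (the bucket, key names, and handler shape are placeholders):

// rough sketch of the setup described: stream mp3s from S3 into a zip,
// stored (no compression) since the audio is already compressed
var AWS = require('aws-sdk');
var archiver = require('archiver');
var s3 = new AWS.S3();

function streamZip(keys, res) {
  var archive = archiver('zip');
  archive.pipe(res); // res is the outgoing HTTP response

  keys.forEach(function (key) {
    var file = s3.getObject({ Bucket: 'my-bucket', Key: key }).createReadStream();
    archive.append(file, { name: key, store: true });
  });

  archive.finalize();
}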
Ah, always fun when client demands bring up edge cases. Been there, done that.
I may look into a way to read ahead on streams with some kind of upper limit on how much data to read. Stream to stream is just much more efficient, since it handles its own backpressure and doesn't consume nearly as much RAM or need middle-man buffers. It does seem older Mac and Android (Java-based) extractors are both pickier than most parsers.
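Something along these lines, as a rough idea only (the helper name and result shape are illustrative):

// rough idea: read ahead up to a cap; if the stream ends within the limit the
// size is known and bit 3 can be skipped, otherwise fall back to the data descriptor
function readAhead(stream, limit, callback) {
  var chunks = [];
  var size = 0;
  var done = false;

  stream.on('data', function (chunk) {
    if (done) return;
    chunks.push(chunk);
    size += chunk.length;
    if (size > limit) {
      done = true;
      stream.pause();
      // size still unknown: caller falls back to bit 3 + data descriptor
      callback(null, { buffered: Buffer.concat(chunks), exceeded: true, rest: stream });
    }
  });

  stream.on('end', function () {
    if (done) return;
    done = true;
    // whole source fit within the limit: sizes and crc-32 can go in the local header
    callback(null, { buffered: Buffer.concat(chunks), exceeded: false });
  });

  stream.on('error', function (err) {
    if (done) return;
    done = true;
    callback(err);
  });
}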
As reported at ctalkington/node-archiver#74. I can't personally verify this since I don't have access to an affected system.
@gconaty, moving the issue here as it's the underlying lib that would need any adjustments.