archiverjs / node-zip-stream

a streaming zip archive generator
http://archiverjs.com/zip-stream
MIT License

issue with mac osx 10.6 archive utility and store method #8

Closed ctalkington closed 10 years ago

ctalkington commented 10 years ago

As reported at ctalkington/node-archiver#74. I can't personally verify this since I don't have access to that setup.

@gconaty, moving the issue here as it's the underlying lib that would need any adjustments.

ctalkington commented 10 years ago

The only thing I can think of that might be different is that maybe the utility reads the file differently when store is defined. It may be taking the following spec text literally and not using the sizes set in the data descriptor when in store mode. It's done this way because streams have no defined size until they finish.

bit 3: If this bit is set, the fields crc-32, compressed size
     and uncompressed size are set to zero in the local
     header.  The correct values are put in the data descriptor
     immediately following the compressed data.

Could you try store: true with a buffer instead of a stream? If that works, then most likely, when store is used with a stream, we'd have to collect the stream into a buffer and define its sizes up front.
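
Something like this, using zip-stream directly (a sketch against the 0.3.x entry API; the file name here is just a placeholder):

var fs = require('fs');
var ZipStream = require('zip-stream');

var archive = new ZipStream();
archive.pipe(fs.createWriteStream('out.zip'));

// buffer source: unlike a stream, its size is known up front
archive.entry(fs.readFileSync('file1.txt'), { name: 'file1.txt', store: true }, function(err) {
  if (err) throw err;
  archive.finish();
});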

ctalkington commented 10 years ago

@gconaty I've seen varied reports that it could definitely be an issue with the data descriptors. Do you see any "ditto" errors in the console (when using Archive Utility) like the article below mentions? I've been looking around, and it seems the data descriptor signature may not be required, so maybe that's throwing things off for extractors that don't know how to handle it.

http://ocportal.com/site/news/view/chris_grahams_blog/problem_between_the_mac_2.htm

I might just call this an extractor issue, as we do follow the spec. The comment below might be of interest.

https://github.com/evanmiller/mod_zip/issues/6#issuecomment-25418143
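
For reference, the data descriptor itself is just three little-endian 32-bit values, optionally preceded by the 0x08074b50 signature. A sketch of what gets written after the file data (the values here are illustrative):

var crc = 0x11223344, csize = 1024, usize = 1024; // illustrative values

// the data descriptor that follows the file data when bit 3 is set;
// the leading signature is optional per the spec, which is part of
// why extractors disagree on how to parse it
var descriptor = new Buffer(16);
descriptor.writeUInt32LE(0x08074b50, 0); // optional signature
descriptor.writeUInt32LE(crc, 4);        // crc-32 of the uncompressed data
descriptor.writeUInt32LE(csize, 8);      // compressed size
descriptor.writeUInt32LE(usize, 12);     // uncompressed size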

gconaty commented 10 years ago

No dice with this one either. Attaching a zip created from a buffer.

ctalkington commented 10 years ago

Can you try commenting out this line:

https://github.com/ctalkington/node-zip-stream/blob/master/lib/headers.js#L205

in node_modules/archiver/node_modules/zip-stream/lib/headers.js (if you're using it with archiver)?

Can you also confirm your archiver version? And do an npm update in your project to make sure you have the recent release, zip-stream 0.3.2.

Since you reported that it works without store, that makes me wonder if it does indeed treat the data descriptor differently when compression = 0.

gconaty commented 10 years ago

zip-stream 0.3.2. Here's the debug output (I commented out the signature).

CT EDIT: trimmed out debug text that didn't help in this case

ctalkington commented 10 years ago

Guessing it's the same result?

Also of interest:

http://en.wikipedia.org/wiki/Archive_Utility

password-protected ZIP archives are not supported. It cannot handle streamed ZIP content, nor content written by ZIP version > 2.0.

Not sure if that's still valid for the latest versions, but definitely interesting.

gconaty commented 10 years ago

Found this in the console (I was looking in the system console, sigh):

ditto: file1.txt: No such file or directory
ditto: Couldn't read pkzip signature.

ctalkington commented 10 years ago

Yeah, it seems it just can't parse properly when there's a data descriptor and compression = 0.

EDIT: the only way around it that I can see would be collecting all the stream data and then adding it as a buffer without using bit 3. That requires quite a bit of refactoring, is slower in general than streaming, and is limited by RAM since it'd hold all the file buffers in memory.
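
For what it's worth, the collecting step itself is simple (a minimal sketch for a standard Node readable stream); the real cost is holding every file fully in memory:

function collectStream(stream, callback) {
  var chunks = [];
  stream.on('data', function(chunk) { chunks.push(chunk); });
  stream.on('error', callback);
  stream.on('end', function() {
    // full contents in hand: length and crc-32 can now be set up front
    callback(null, Buffer.concat(chunks));
  });
}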

gconaty commented 10 years ago

Trying to get my head around this (without going into the zip spec quite yet!). Also saw this:

https://github.com/onkis/node-native-zip/commit/a7303f3e2878df8a10a13e467839f2356da2315d

Are you saying that the only zip creators that work with Archive Utility can't use streaming and have to buffer everything first (and that's the only solution)? If so, it kind of defeats the purpose :(

The only confusing part left to me (if the above is true) is: what's the difference in the files when compression is used (and streamed)?

ctalkington commented 10 years ago

My guess with Archive Utility is that its code knows how to handle bit 3 when compression > 0 but not when compression = 0. The spec does say "immediately following the compressed data", so maybe they took that literally. Newer versions of the zip spec (4.5 in this case) go into more detail and explain what I think is happening here:

Bit 3: If this bit is set, the fields crc-32, compressed size
     and uncompressed size are set to zero in the local
     header.  The correct values are put in the data descriptor
     immediately following the compressed data.  (Note: PKZIP
     version 2.04g for DOS only recognizes this bit for method 8
     compression, newer versions of PKZIP recognize this bit for
     any compression method.)

If I saw the code, I'd say their parsing logic probably uses something like:

// hypothetical extractor logic: only honor bit 3 for deflate (method 8)
if ((flags & 0x08) && method === 8) {
  // expect a data descriptor after the file content
}

ctalkington commented 10 years ago

Is there a specific reason for using store? I'm going to roll out a fix that only uses the data descriptor when it's actually needed, i.e. when streaming (this should reduce overall file sizes slightly too).

EDIT: ok, I've published zip-stream v0.3.3, which should work for the buffer + store combo. It may take a few minutes to hit the npm servers, but do try it out when you get a chance and let me know if it resolves the issue.
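
Roughly, the fix boils down to a check like this (just a sketch with hypothetical field names, not the actual zip-stream code):

if (Buffer.isBuffer(source)) {
  // size and checksum are known up front, so write the real values
  // into the local file header and skip the data descriptor entirely
  fileData.flags &= ~0x08;
  fileData.csize = fileData.usize = source.length;
} else {
  // stream: sizes are unknown until the end, so set bit 3 and emit
  // a data descriptor after the file data
  fileData.flags |= 0x08;
}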

ctalkington commented 10 years ago

Are you saying that the only zip creators that work with Archive Utility can't use streaming and have to buffer everything first (and that's the only solution)? If so, it kind of defeats the purpose :(

These formats (zip and tar) are so old that the initial use cases were going from disk (static data) to an archive (sometimes even spanning multiple files), not streams that can come from disk, memory, or other network sources (dynamic data), so they could use things like disk read-ahead and filesystem metadata to get the size of the static data. There have been many variations of the zip spec (I use 2.0, from 1993, as the basis for most of this), but it all comes down to how the parser implements it; you'll find little variations between them all based on how each dev understood the spec. A lot of packers still handle streams by reading all the data into a buffer first. (archiver does this for tar, as tar doesn't have a bit system like zip.)

gconaty commented 10 years ago

Nice work, thank you.

This does the trick. Thank you for isolating the change; I'd say it's resolved.

And (for posterity) I'd put in an app note.

ctalkington commented 10 years ago

No problem. It's always the Mac OS issues that are the hardest to track down, since I don't have any systems running it.

That note makes sense. Was there a specific reason for using store, or was it just an edge case you noticed?

gconaty commented 10 years ago

Yes, not to mention they aren't open source...

I found the issue because the streaming downloader I created zips MP3s retrieved from S3, and the client requested no compression since the audio was already compressed. Of course I flipped the switch, and then users with older computers started reporting errors :(

ctalkington commented 10 years ago

Ah, always fun when client demands bring up edge cases. Been there, done that.

I may look into a way to read ahead on streams with some kind of upper limit on how much data to read. Stream-to-stream is just much more efficient, since it handles its own backpressure and doesn't consume nearly as much RAM or need middle-man buffers. It does seem that older Mac and Android (Java-based) extractors are both pickier than most parsers.
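
Roughly what I have in mind (just a sketch of the idea, nothing implemented; readAhead and its result shape are hypothetical):

// buffer a stream up to `limit` bytes. if it ends within the limit,
// its size is known up front and bit 3 can be skipped; otherwise
// fall back to streaming with a data descriptor.
function readAhead(stream, limit, callback) {
  var chunks = [];
  var size = 0;
  var done = false;

  function finish(err, result) {
    if (done) return;
    done = true;
    callback(err, result);
  }

  stream.on('data', function onData(chunk) {
    chunks.push(chunk);
    size += chunk.length;
    if (size > limit) {
      // over the limit: stop buffering and hand back what was read
      // so far along with the still-live stream
      stream.pause();
      stream.removeListener('data', onData);
      finish(null, { fits: false, head: Buffer.concat(chunks) });
    }
  });

  stream.on('end', function() {
    // whole stream fit under the limit: treat the entry as a buffer
    finish(null, { fits: true, data: Buffer.concat(chunks) });
  });

  stream.on('error', finish);
}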