Consider bounding the stream request when streaming entries

jpambrun commented 5 months ago

I want to stream concurrently from a single zip over http or s3. This library seems to work fine, but upon closer inspection I notice higher than expected network usage because for each entry the file is streamed much further than needed.

For example, being a bit silly, I zipped my node_modules/ and read it from http with unzipper. The zip is 2 MB and contains ~1000 files. With the current implementation and the localhost latency I end up transferring 1+GB.

This is because the byte range resquest in unbounded on line 14:

https://github.com/ZJONSSON/node-unzipper/blob/6a165e569c46a2625d10ecc8f58081635ad88da1/lib/Open/unzip.js#L10-L15

This may be less problematic if throughtput is less? But it would always overshoot.

I am not a zip expret, but I think it should be possible to set a upper range with the central directory information? Something like this:

  var req = source.stream(offset, directoryVars.compressedSize + directoryVars.extraFieldLength + 30 + directoryVars.fileNameLength+6);

30 is supposed to be the per file header base size, to which we need to add the filename lengent and extra field lenght. I don't know why I need the extra 6.

Would it make sense to add this upper bound, maybe with some padding to be safe when the data is available?

ZJONSSON commented 5 months ago

Yes, it would make absolute sense. This was an oversight when the Open methods were implemented; I really appreciate you catching this @jpambrun !

jpambrun commented 5 months ago

Welp.. This bug generated a 500$ AWS will in ~45min. I was testing from home at ~1gbps thinking it would cost low 10s of dollars. but I was charged for the bytes I didn't received. ~300GB were transferred according to my ISP meter, but AWS counted 6000GB.

It means if you hate someone hosting something on AWS you can multiply your damage by just not consuming you stream completely. With 1gbps you can cause 20gbps of costs.

ZJONSSON commented 5 months ago

Hopefully, this has been fully addressed. Please raise an issue (or submit a PR) if that is not the case Thanks again @jpambrun

ZJONSSON / node-unzipper

Consider bounding the stream request when streaming entries #308