Closed jpambrun closed 5 months ago
Yes, it would make absolute sense. This was an oversight when the Open
methods were implemented; I really appreciate you catching this @jpambrun !
Welp.. This bug generated a $500 AWS bill in ~45 min. I was testing from home at ~1 Gbps, thinking it would cost low tens of dollars, but I was charged for bytes I never received: ~300 GB were transferred according to my ISP's meter, but AWS counted 6000 GB.
It also means that if you have a grudge against someone hosting something on AWS, you can multiply the damage just by not consuming your streams completely. With 1 Gbps of download you can cause ~20 Gbps worth of egress costs.
Hopefully, this has been fully addressed. Please raise an issue (or submit a PR) if that is not the case. Thanks again @jpambrun
I want to stream concurrently from a single zip over HTTP or S3. This library seems to work fine, but upon closer inspection I noticed higher-than-expected network usage, because for each entry the file is streamed much further than needed.
For example, being a bit silly, I zipped my node_modules/ and read it over HTTP with unzipper. The zip is 2 MB and contains ~1000 files. With the current implementation and localhost latency, I end up transferring 1+ GB.
This is because the byte range request is unbounded on line 14:
https://github.com/ZJONSSON/node-unzipper/blob/6a165e569c46a2625d10ecc8f58081635ad88da1/lib/Open/unzip.js#L10-L15
This may be less problematic at lower throughput, but it will always overshoot.
I am not a zip expert, but I think it should be possible to set an upper bound on the range using the central directory information. Something like this:
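A rough sketch of the idea, with assumed field names (this is not unzipper's actual API; the names on `entry` and the padding value are illustrations only):

```javascript
// Hypothetical sketch: compute a bounded HTTP Range header for one
// entry from its central directory record. `offset` is the entry's
// local header offset; `fileNameLength`, `extraFieldLength`, and
// `compressedSize` are all available in the central directory.
function boundedRange(offset, entry) {
  const LOCAL_HEADER_BASE = 30; // fixed portion of the local file header
  const PADDING = 64;           // safety margin, e.g. for a trailing data descriptor
  const end =
    offset +
    LOCAL_HEADER_BASE +
    entry.fileNameLength +
    entry.extraFieldLength +
    entry.compressedSize +
    PADDING -
    1;
  // Instead of the open-ended `bytes=${offset}-` currently issued:
  return `bytes=${offset}-${end}`;
}
```

The server would then stop sending at the computed end offset even if the consumer abandons the stream early.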
30 is supposed to be the base size of the per-file local header, to which we need to add the file name length and the extra field length. I don't know why I need the extra 6.
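For context on the 30: the fixed portion of a ZIP local file header (per the PKWARE APPNOTE spec, section 4.3.7) breaks down as follows, and its field sizes sum to exactly 30 bytes:

```javascript
// Field widths (in bytes) of the fixed part of a ZIP local file header.
const LOCAL_FILE_HEADER_FIELDS = {
  signature: 4,          // 0x04034b50
  versionNeeded: 2,
  generalPurposeFlag: 2,
  compressionMethod: 2,
  lastModTime: 2,
  lastModDate: 2,
  crc32: 4,
  compressedSize: 4,
  uncompressedSize: 4,
  fileNameLength: 2,
  extraFieldLength: 2,
};
const base = Object.values(LOCAL_FILE_HEADER_FIELDS).reduce((a, b) => a + b, 0);
// base === 30; the variable-length file name and extra field follow immediately after.
```

The variable-length file name and extra field (whose lengths the central directory also records) follow this fixed header, then the compressed data itself.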
Would it make sense to add this upper bound, maybe with some padding to be safe when the data is available?