arjan1234 opened this issue 11 years ago
Good news: I can reproduce it! Bad news: I don't know how to fix it (yet).
I think this comes from the insane amount of memory used to read the file. I'm already working on the CPU consumption; the memory consumption will be next.
If you are using only one file at a time, my work could help (lazy decompression of files).
OK, I am using only one file at a time. Where can I find your work (the lazy decompression)?
You can find it on my branch WIP_perfs here: https://github.com/dduponchel/jszip/tree/WIP_perfs (I just pushed my latest commits).
The API is almost the same; the biggest change is the removal of the data attribute. On my index.html, you should check the "Migrating Guide" and the "Performance issues" parts.
As soon as I'm completely satisfied with the code, I'll do a pull request.
I've put some changes into a branch of mine that "may" address some issues. It uses an ArrayBuffer, if it can, to load the zip rather than a string (saves 50% memory), and after that it tries to avoid making substrings or concatenating strings as much as it can (it's happy operating over a Uint8Array view of the original ArrayBuffer). It also lets you use a function to grab parts of files as they are inflated (i.e. so you can filter CSV files by row/column, or maybe write something to grab parts of zipped XML or JSON files; writing those functions is up to you, though).
https://github.com/martingraham/jszip/
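Roughly, the loading side of that idea looks like this (just a sketch; the partial-extraction hooks in the branch have their own API and aren't shown, and the URL is made up):

```js
// Fetch the zip as an ArrayBuffer instead of a binary string,
// which roughly halves the memory needed just to hold the raw data.
var xhr = new XMLHttpRequest();
xhr.open("GET", "big.zip", true);
xhr.responseType = "arraybuffer";
xhr.onload = function () {
    var bytes = new Uint8Array(xhr.response); // a view over the buffer, no copy
    var zip = new JSZip(bytes);               // recent JSZip builds accept ArrayBuffer/Uint8Array input
    // ...then pull out only the entries you actually need
};
xhr.send();
```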
For the OP: trying to fully decompress a 430MB file will take 860MB of string space (2 bytes per char), which completely borks every browser; that's why I've been investigating partial decompressions. I'm trying to grab parts of a 75MB zip (290MB uncompressed), as I know decompressing the whole thing isn't going to work.
The infuriating thing is that it seems to completely depend on the browser. Chrome is fast as hell, while Opera doesn't work and I can't tell where it fails. IE9 gets there, after a fashion, having used an insane amount of memory (mainly due to not handling ArrayBuffers properly when loading with an XHR), Safari works, and Firefox works 90% of the time but for some reason will just go at a snail's pace the odd time. They all use completely different amounts of memory too.
Trying to profile any of them while grabbing large chunks of big files crashes every one of them. Profiling while taking smaller chunks shows that Chrome, at least, is using less memory as I intended, but it looks like Opera isn't.
I'm glad to see I'm not the only one struggling with optimization and big files :) Your work seems promising, I'll definitely look into it!
PS: Opera v15 now works a lot better regarding memory, as it has moved to a WebKit engine.
I have been developing a very data-processing-intensive, enterprise-level application, so this interests me as well. I use JSZip as a way of overcoming the 5MB limit on localStorage in HTML5: I dump my JSON data to a text file, zip it with JSZip, then save the base64 of the zip file to localStorage instead. Works great!
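The round trip looks roughly like this (a sketch against a recent JSZip; the variable names and the storage key are made up):

```js
// Save: zip the JSON text and keep only the base64 string in localStorage.
var zip = new JSZip();
zip.file("data.json", JSON.stringify(appData));
var b64 = zip.generate({ type: "base64", compression: "DEFLATE" });
localStorage.setItem("appData.zip.b64", b64);

// Load: unzip the base64 string back into JSON.
var loaded = new JSZip(localStorage.getItem("appData.zip.b64"), { base64: true });
var appData2 = JSON.parse(loaded.file("data.json").asText());
```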
However, some of this data has several hundred thousand records, so the text file I create gets rather big.
Anyway, given my experience so far, what about the idea of making JSZip apply compression asynchronously? I.e. iterate through the data in chunks, using setInterval() between chunks, then calling a callback function the user passes in when it's done. Larger files may still take time, but it would never crash the browser and could even provide percent-complete feedback as well.
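Roughly the shape of what I have in mind (a sketch only; compressChunk and chunkSize are stand-ins, not real JSZip APIs):

```js
// Compress the data slice by slice, yielding back to the browser between
// slices so the UI never freezes, and report progress along the way.
function compressAsync(data, chunkSize, onProgress, onDone) {
    var offset = 0;
    var parts = [];
    function step() {
        if (offset >= data.length) {
            onDone(parts);
            return;
        }
        parts.push(compressChunk(data.slice(offset, offset + chunkSize)));
        offset += chunkSize;
        onProgress(Math.min(100, (100 * offset) / data.length));
        setTimeout(step, 0); // let the browser breathe before the next slice
    }
    step();
}
```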
@kwroblewski That's an interesting idea but it might require too much work: JSZip has a synchronous API and the library handling the DEFLATE algorithm doesn't support that.
To create asynchronous methods when compressing/decompressing data we would need to duplicate or rework the as{Text,Binary,ArrayBuffer,Uint8Array,Buffer}, load and generate methods. Being asynchronous has clear benefits (not only for the (de)compression), but I don't know if there is a clean way to do it while keeping a synchronous API.
Second point: the compression is handled by a third-party library which doesn't support both synchronous AND asynchronous (de)compression. Moreover, our choice of library is limited by IE support.
@Stuk do you have another point of view on the subject?
PS: sorry for the double notification, I misclicked while writing this message...
Does https://github.com/Stuk/jszip/commit/83aa6b6324522fe5c1f28559d26695c57feb86e7 somehow relate to this too (or might it even fix it)?
Yes, this commit will help (just released as v2.2.1). It won't completely fix this bug but I'm working on it, see #121.
I am using JSZip with a Salesforce app, but when I load xlsx files larger than 500 KB, I get an import error message. How do I overcome this?
What is your code and what browser do you use?
Hi David,
I'm using Salesforce with the app found here: https://github.com/financialforcedev/apex-zip
I deployed the app to Salesforce and have used Firefox, Chrome and IE, but there seems to be a limit on how big the Excel spreadsheet can be (this is an xlsx spreadsheet).
Not much more than 400kb, if that.
Is this because JSZip has a 1,000,000 character limit?
Look forward to hearing from you
Daniel
The project you linked uses a pre-1.0 version (at commit 14597289220eeed7482997a3a1d7161fca834498) from 2012. You should ask this project to update the JSZip dependency. While not a miracle recipe, you should see big improvements. After that, if you still have performance issues, the next step is to check how apex-zip handles data. It seems to use a lot of base64 encode/decode, which takes time and resources.
Regarding the 1,000,000 character limit: the upper bound of how much data you can handle greatly depends on 1/ the browser (but you tested with Firefox and Chrome, so that isn't the issue) and 2/ what you ask for. I just tried reading the content of a 60MB text compressed into a 20MB archive loaded as a Blob: if I ask for a string it takes ~2s (decoding from utf8 is not free), if I ask for a Uint8Array it takes ~700ms. If you add a lot of base64 encode/decode, you will see worse performance. I suspect that your million-character limit involves a hidden cost, like base64 or utf8 conversion (or a compression/decompression step, which was hugely improved between v1.0 and v2.5).
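That kind of comparison looks roughly like this (a sketch with the synchronous v2 API; the entry name is made up and the Blob is read through a FileReader here):

```js
// Read the Blob into an ArrayBuffer, then ask JSZip for the entry
// as text vs as raw bytes to compare the cost of the utf8 decoding.
var reader = new FileReader();
reader.onload = function () {
    var zip = new JSZip(reader.result);                // load from the ArrayBuffer
    var asText = zip.file("big.txt").asText();         // ~2s: utf8 decoding is not free
    var asBytes = zip.file("big.txt").asUint8Array();  // ~700ms: raw bytes, no decoding
};
reader.readAsArrayBuffer(blob); // "blob" is the 20MB archive
```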
Thank you for that info.
There are only two files in the app: jszip.resource and jszip.resource-meta.xml.
How do I go about updating these two files with the most recent JSZip update?
I think you should open an issue on apex-zip, it will be better suited for this discussion (I will help you there).
I think I am experiencing the same problem. I'm using Node.js.
I am trying to generate a zip file with, currently, 10031 files (61.5MB). (The code to generate the zip file used to work just fine and never changed. The only thing that changed was the number of files in the folder to be zipped, which went from a few hundred to over 10000.)
It took me forever to figure out what the hell was even going on.
I've been using the promise API (generateAsync()), but the promise neither resolves nor rejects. It just straight up kills the process without showing any sign of failure.
This is really really bad and should never happen. Is that intentional?
How can I make this work?
Edit:
The last update I get before the process is killed is always:
{ currentFile: '5065', percent: 45.3294786162895 }
The process exit code is 0 btw.
Although this is probably not too useful, here is the code that I'm using:
const JSZip = require("jszip")
const fs = require("fs")
const fsp = fs.promises
// ...
async function autoGenerate(apkgFile, cmd) {
  const files = await fsp.readdir(cmd.tempFolder)
  const apkgArchive = new JSZip()
  const filepathArr = files.map(filename => `${cmd.tempFolder}/${filename}`)
  for (const filepath of filepathArr) {
    // each file is added as a read stream, so nothing is buffered up front
    apkgArchive.file(filepath, fs.createReadStream(filepath))
  }
  const content = await apkgArchive.folder(cmd.tempFolder).generateAsync({ type: "uint8array" }, console.log)
  // This part of the code is never reached!
  await fsp.writeFile(apkgFile, content)
  return "done!"
}
// ...
autoGenerate(apkgFile, cmd).then(console.log).catch(console.error)
// The promise never gets resolved and also never rejected!
Edit 2: I'm on Linux (Fedora 28) using Node.js v10.7.0.
@T-vK I had the same problem, killing the process without any sign of failure. How did you solve it in the end?
@coco-super I'm not sure if I ever solved it. If I did then it has to be in here somewhere: https://github.com/FOSS-Chinese/AnkiDeckGenerator/blob/master/index.js#L331
But I probably just used fewer files so I wouldn't hit the error.
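In case it helps anyone landing here: a streaming approach may avoid holding the whole archive in memory at once. A sketch, assuming a JSZip version that provides generateNodeStream (the paths are made up):

```js
const fs = require("fs");
const JSZip = require("jszip");

const zip = new JSZip();
// Add files as read streams so they are pulled in lazily.
zip.file("example.txt", fs.createReadStream("/tmp/example.txt"));

// Stream the generated archive straight to disk instead of
// materialising one huge Uint8Array in memory first.
zip.generateNodeStream({ type: "nodebuffer", streamFiles: true })
    .pipe(fs.createWriteStream("out.zip"))
    .on("finish", () => console.log("out.zip written"));
```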
Is there any way to do the same thing as JSZip in other languages, like Python?
Managed to fix the issue by using the https://www.npmjs.com/package/client-zip library.
Using a large ZIP file (7MB compressed, 430MB uncompressed), the library dies silently:
http://www.cultureelerfgoed.nl/geo/RCE_MIP_objecten_punten_GML.zip
(browser: FF 20.0, desktop: Ubuntu 3.2.0-41-generic, x86_64)