101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License
2.27k stars 79 forks

Proposal for `close` method and event on streaming interface #122

Closed mayfield closed 2 years ago

mayfield commented 2 years ago

What can't you do right now? Cool library! I'm using the streaming interface like so:

const gzip = new fflate.Gzip();
let bigTypedArray;
gzip.ondata = (c, isLast) => {
   bigTypedArray = concatArrays(bigTypedArray, c);
   if (isLast) {
       download(bigTypedArray);
       bigTypedArray = null;
   }
};

for (...) {
   gzip.push(data);
   if (bigTypedArray && bigTypedArray.byteLength >= maxSize) {
       // close the stream and create/download file..
       gzip.push(new Uint8Array(), /*isLast*/ true);
       // (a fresh Gzip instance is then needed for the next file)
   }
}
if (bigTypedArray && bigTypedArray.byteLength) {
    // close the stream and create/download file..
    gzip.push(new Uint8Array(), /*isLast*/ true);
}

An optimal solution This works fine, but I find it a bit convoluted to use just push() and ondata for this sort of process.

A couple thoughts.

  1. Having a close() function that does the same thing as push(new Uint8Array(), true) would be a nice non-breaking API upgrade.
  2. Using a slightly different eventing interface that separates ondata from onclose. It could work something like this...
    
    const gzip = new fflate.Gzip();
    let bigTypedArray;
    gzip.ondata = c => bigTypedArray = concatArrays(bigTypedArray, c);
    gzip.onclose = () => {
        download(bigTypedArray);
        bigTypedArray = null;
    };

    for (...) {
        gzip.push(data);
        if (bigTypedArray && bigTypedArray.byteLength >= maxSize) {
            gzip.close();
        }
    }
    if (!gzip.isClosed()) {
        gzip.close();
    }
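For what it's worth, close()/onclose could even be layered on top of the existing push/ondata API without breaking changes. A rough sketch (withClose is a hypothetical helper name, not part of fflate; the fake stream below just stands in for fflate.Gzip for demonstration):

```javascript
// Sketch: layer close()/onclose on top of any fflate-style stream
// (an object with push(chunk, isLast) and an ondata callback).
function withClose(stream) {
  let closed = false;
  stream.close = () => {
    if (closed) throw new Error('stream already closed');
    stream.push(new Uint8Array(0), /*isLast*/ true);
    closed = true;
    if (stream.onclose) stream.onclose();
  };
  stream.isClosed = () => closed;
  return stream;
}

// Minimal fake stream standing in for fflate.Gzip, for demonstration only:
// it just echoes chunks straight to ondata.
const fake = {
  ondata: null,
  push(chunk, isLast) { if (this.ondata) this.ondata(chunk, !!isLast); }
};

const gz = withClose(fake);
const chunks = [];
gz.ondata = (c, last) => chunks.push({ len: c.length, last });
gz.onclose = () => console.log('closed after', chunks.length, 'chunks');

gz.push(new Uint8Array(4));
gz.close();
console.log(gz.isClosed()); // true
```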



Just some thoughts.  Thanks again for the lib.  Looking forward to integrating it into my browser extension that does bulk data exports from large IndexedDB stores.  https://github.com/SauceLLC/sauce4strava
101arrowz commented 2 years ago

Interesting idea. Honestly I never really liked the streaming API myself but I wanted near API compatibility with Pako, which was the only popular compression library when I made fflate, and the current API delivers. I might consider adding support for .close(), but then again it may be better suited in a wrapper library.

By the way, I do not recommend continually concatenating as you receive data, that will waste a lot of memory and a lot of time. You can push the chunks all into an array and concatenate at the end (this is much faster). However, I actually would instead recommend downloading the data in a stream with StreamSaver.js, I made a guide for ZIP generation on it here. This is very fast and uses very little memory.
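The chunks-then-concat approach could be sketched like this (concatChunks is an illustrative helper, not part of fflate):

```javascript
// Collect chunks as they arrive, then concatenate once at the end:
// one allocation and one copy per byte, instead of repeated reallocations.
function concatChunks(chunks) {
  let total = 0;
  for (const c of chunks) total += c.length;
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

// Usage with the streaming interface:
// const chunks = [];
// gzip.ondata = (c, isLast) => {
//   chunks.push(c);
//   if (isLast) download(concatChunks(chunks));
// };
```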

mayfield commented 2 years ago

Hi @101arrowz !

I'll have to check out that streaming downloading technique in StreamSaver.js. I'd love to do something like that.

Re: concat; my pseudo code glosses over how I actually concatenate buffers, but I use your standard over-allocation technique to avoid too many reallocs. Here's the actual version: https://github.com/SauceLLC/sauce4strava/blob/9c2d9a55522f27cd6f268589282060e9a1e07e41/src/common/base.js#L289-L303
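The over-allocation idea, reduced to a sketch (not the linked code verbatim): grow the backing buffer geometrically so appends are amortized O(1):

```javascript
// Sketch of amortized growth: keep a backing buffer larger than the data
// it holds and at least double it whenever it runs out of room.
class GrowableBuffer {
  constructor(initial = 1024) {
    this.buf = new Uint8Array(initial);
    this.size = 0; // bytes actually used
  }
  push(chunk) {
    if (this.size + chunk.length > this.buf.length) {
      // Over-allocate so reallocations stay rare.
      const next = new Uint8Array(
        Math.max(this.buf.length * 2, this.size + chunk.length));
      next.set(this.buf.subarray(0, this.size));
      this.buf = next;
    }
    this.buf.set(chunk, this.size);
    this.size += chunk.length;
  }
  // A view of the valid bytes (no copy).
  get value() { return this.buf.subarray(0, this.size); }
}
```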

It's kind of moot, though, because most of the time in my "Backup data" process is spent passing serialized data from the extension background page to the extension content script and then again to the site script context. On Chrome the bg -> ext-context hop is not even a structuredClone, if you can believe it! It's essentially JSON.stringify(). So I'm limited by string size limits and JSON.stringify performance more than anything. And sadly, as a browser extension I can't use SharedArrayBuffer either! Extension developers are like second-class web citizens. :)

101arrowz commented 2 years ago

That's quite strange. An idea (maybe a bad one) for your message-passing issue is Base64, which should be moderately fast for binary data (as long as you use a package for it and not btoa, which only works on strings). If you process in streams, this strategy is also much faster than JSON.
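For illustration, round-tripping binary data through Base64 looks like this (Node's Buffer is used here for brevity; in a browser/extension context you'd use a Base64 package as noted above):

```javascript
// Round-trip a Uint8Array through Base64 so it can travel as a plain
// string through JSON-based extension message passing.
function toBase64(u8) {
  return Buffer.from(u8).toString('base64');
}
function fromBase64(s) {
  return new Uint8Array(Buffer.from(s, 'base64'));
}

const data = new Uint8Array([0, 255, 16, 32]);
const encoded = toBase64(data); // a plain string, safe to pass as a message
const decoded = fromBase64(encoded);
```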

Anyway I'll let you know if I update fflate with a more ergonomic stream API.