101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License
2.27k stars · 79 forks

Push chunk result not good #46

Closed · xqdoo00o closed this 3 years ago

xqdoo00o commented 3 years ago

How to reproduce

<script src="https://unpkg.com/fflate"></script>
<script src="https://cdn.jsdelivr.net/npm/pako@2.0.3/dist/pako.js"></script>
<script>
    let massiveFileBuf;
    // Fetch the test file once; both benchmarks reuse it
    fetch('/f.html').then(
        res => {
            res.arrayBuffer().then(e => {
                massiveFileBuf = new Uint8Array(e);
            });
        });
    let resLen = 0;
    let startfflate = () => {
        // Compress the whole file with a single push()
        let ins = new fflate.Deflate({
            level: 6
        });
        ins.ondata = function (data, final) {
            resLen += data.length;
        };
        ins.push(massiveFileBuf.slice(0), true);
        console.log("full Deflate length", resLen);
        resLen = 0;

        // Compress the same file in 16-byte pushes
        let ins2 = new fflate.Deflate({
            level: 6
        });
        ins2.ondata = function (data, final) {
            resLen += data.length;
        };
        let offset = 0;
        let len = massiveFileBuf.length;
        while (offset < len) {
            ins2.push(massiveFileBuf.slice(offset, offset + 16), false);
            offset += 16;
        }
        ins2.push(new Uint8Array(), true);
        console.log("stream Deflate length", resLen);
        resLen = 0;
    };
    let startpako = () => {
        // Same two tests with pako for comparison
        let ins = new pako.Deflate({
            level: 6
        });
        ins.onData = function (data) {
            resLen += data.length;
        };
        ins.push(massiveFileBuf.slice(0), true);
        console.log("full Deflate length", resLen);
        resLen = 0;

        let ins2 = new pako.Deflate({
            level: 6
        });
        ins2.onData = function (data) {
            resLen += data.length;
        };
        let offset = 0;
        let len = massiveFileBuf.length;
        while (offset < len) {
            ins2.push(massiveFileBuf.slice(offset, offset + 16), false);
            offset += 16;
        }
        ins2.push(new Uint8Array(), true);
        console.log("stream Deflate length", resLen);
        resLen = 0;
    };
</script>

The problem

In my tests, the console shows:

startfflate()
b.html:20 full Deflate length 755300
b.html:36 stream Deflate length 10362697

startpako()
b.html:46 full Deflate length 739044
b.html:62 stream Deflate length 739044

The fflate stream deflate result is far too big, even bigger than the original data, which is 7832702 bytes. So I wonder whether the Deflate stream push function has a logic error.

101arrowz commented 3 years ago

This is intended behavior. fflate does not use a temporary buffer during compression, so an input chunk always maps to an output chunk in the DEFLATE stream. In other words, each push() carries a 5-byte block overhead. Moreover, it's nearly impossible to compress a 16-byte chunk effectively.
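
To put rough numbers on this (a back-of-envelope estimate, not an exact accounting): if each 16-byte push is emitted as a stored DEFLATE block, it costs about 5 header bytes on top of the 16 data bytes, so the output grows to roughly 21/16 of the input. For the 7,832,702-byte file that works out to about 10.28 MB, very close to the 10,362,697 the test prints.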

The solution is to store an array of Uint8Array chunks, then concatenate and push once you reach a reasonable block size (say 64kB). In this test case, just use chunks larger than 16 bytes. Hope this makes sense! Let me know if you have any other questions or if I can close the issue.

xqdoo00o commented 3 years ago

OK, I see. No other questions.

xqdoo00o commented 3 years ago

Sorry, but on reflection I think this could be a bug. In my case the chunk size is not controllable; we can't ask our customers to change their chunk size to suit the product. 😂

101arrowz commented 3 years ago

Maybe you could create a wrapper class and serve this to your client instead?

class Deflate extends fflate.Deflate {
  _chunks = [];
  _chunkSize = 0;
  push(data, final) {
    this._chunks.push(data);
    const newChunkSize = this._chunkSize + data.length;
    if (newChunkSize > 16384 || final) {
      let buf = data;
      if (this._chunkSize) {
        buf = new Uint8Array(newChunkSize);
        let offset = 0;
        for (const chunk of this._chunks) {
          buf.set(chunk, offset);
          offset += chunk.length;
        }
      }
      super.push(buf, final);
      this._chunks = [];
      this._chunkSize = 0;
    }
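    // note: _chunkSize is never updated when the data is only buffered,
    // which is the mistake corrected in the follow-up below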
  }
}
xqdoo00o commented 3 years ago

This could be a resolution. For the same file, the console now prints stream Deflate length 928998, still roughly 30% bigger than the normal size, but it runs about as fast as the normal path.

101arrowz commented 3 years ago

Sorry, I made a mistake in the above code (it never updated _chunkSize while buffering). This is the code you should use for maximum performance and a good compression ratio.

class Deflate extends fflate.Deflate {
  _chunks = [];
  _chunkSize = 0;
  push(data, final) {
    this._chunks.push(data);
    const newChunkSize = this._chunkSize + data.length;
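    // flush once at least 256 kB (262144 bytes) is buffered, or on the final push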
    if (newChunkSize > 262143 || final) {
      let buf = data;
      if (this._chunkSize) {
        buf = new Uint8Array(newChunkSize);
        let offset = 0;
        for (const chunk of this._chunks) {
          buf.set(chunk, offset);
          offset += chunk.length;
        }
      }
      super.push(buf, final);
      this._chunks = [];
      this._chunkSize = 0;
    } else this._chunkSize = newChunkSize;
  }
}
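
For reference, a minimal usage sketch of this wrapper against the 16-byte loop from the reproduction above (assuming massiveFileBuf has been fetched as in the original snippet, and Deflate is the buffered class defined just above):

// Usage sketch: same API as fflate.Deflate, but 16-byte pushes are
// buffered into ~256 kB blocks before being compressed.
let resLen = 0;
const ins = new Deflate({ level: 6 });
ins.ondata = (data, final) => {
  resLen += data.length;
};
let offset = 0;
while (offset < massiveFileBuf.length) {
  ins.push(massiveFileBuf.slice(offset, offset + 16), false);
  offset += 16;
}
ins.push(new Uint8Array(), true);
console.log("buffered stream Deflate length", resLen);
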
101arrowz commented 3 years ago

@xqdoo00o I'm going to close this for now, since the above code fixes your problem (let me know if it doesn't; it works on my end).