101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License

Deflate stream produces way bigger outputs than Pako #166

Closed: BenoitZugmeyer closed this issue 1 year ago

BenoitZugmeyer commented 1 year ago

How to reproduce

import {Deflate as PakoDeflate, inflateRaw} from "https://unpkg.com/pako@2.1.0/dist/pako.esm.mjs"
import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.7.4/esm/browser.js"

const data = new TextEncoder().encode(Array.from({ length: 1000 }, (_, i) => i).join(","))

// Compress with fflate, collecting the raw output chunks
const fflateChunks = []
const fflateDeflate = new FflateDeflate(chunk => {
  fflateChunks.push(chunk)
})
fflateDeflate.push(data)
fflateDeflate.push(data)
fflateDeflate.push(data, true)
// Concatenate fflate's output chunks into a single Uint8Array
const fflateResult = new Uint8Array(fflateChunks.reduce((total, chunk) => total + chunk.byteLength, 0))
{
  let offset = 0;
  for (const chunk of fflateChunks) {
    fflateResult.set(chunk, offset);
    offset += chunk.byteLength;
  }
}

// Compress the same data with pako (raw deflate, no zlib wrapper)
const pakoDeflate = new PakoDeflate({ raw: true })
pakoDeflate.push(data)
pakoDeflate.push(data)
pakoDeflate.push(data, true)
const pakoResult = pakoDeflate.result

// sanity check: both outputs decompress to data of the same length
if (inflateRaw(pakoResult).byteLength !== inflateRaw(fflateResult).byteLength) {
  throw new Error("Results don't match")
}

console.log("Pako result length:  ", pakoResult.byteLength)
console.log("Fflate result length:", fflateResult.byteLength)

// Pako result length:   1916
// Fflate result length: 5354

The problem

fflate produces a much bigger output than pako. Unlike pako, fflate does not share the deflate state across the whole stream, so previously pushed chunks aren't taken into account when compressing a new chunk.
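
For comparison (not part of the original report): compressing the same three copies of the data in a single call gives fflate the full input as context, so the output should come out close to pako's, which isolates the per-push state loss as the cause.

import {deflateSync} from "https://unpkg.com/fflate@0.7.4/esm/browser.js"

// Concatenate the three pushes into one buffer and compress it in one shot.
const whole = new Uint8Array(data.byteLength * 3)
whole.set(data, 0)
whole.set(data, data.byteLength)
whole.set(data, data.byteLength * 2)
console.log("fflate single-shot length:", deflateSync(whole).byteLength)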

101arrowz commented 1 year ago

This design was chosen so that every pushed chunk corresponds to one or more chunks in the deflate stream. But as you mentioned, it's inefficient when many small chunks are pushed. It's still relatively effective for chunks of around 1MB in size (e.g. the ones returned from File.prototype.stream), but upon reconsideration this use case is probably important to support as well.
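
As a sketch of that large-chunk use case (not from the thread), chunks read from File.prototype.stream can be pushed straight into the streaming Deflate, reusing the FflateDeflate import from the reproduction above; `file` is an assumed File instance, e.g. from an <input type="file">.

async function deflateFile(file) {
  const out = []
  const deflate = new FflateDeflate((chunk) => out.push(chunk))
  const reader = file.stream().getReader()
  for (;;) {
    const { done, value } = await reader.read()
    if (done) {
      // Finalize with an empty last chunk (assumed here to be accepted by Deflate.push).
      deflate.push(new Uint8Array(0), true)
      break
    }
    deflate.push(value) // browsers typically deliver fairly large chunks here
  }
  return out // array of raw DEFLATE output chunks
}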

This might be possible to resolve by preserving a 32kB lookback buffer - I'll see how difficult it is to implement.
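
Conceptually (this is not fflate's actual code), the idea is to keep the trailing 32 KiB of input already seen, so that back-references from a new push can reach into earlier pushes; `compressBlock` below is a hypothetical helper standing in for the real compressor.

const WINDOW_SIZE = 32768 // DEFLATE's maximum back-reference distance
let history = new Uint8Array(0)

function pushWithHistory(chunk, compressBlock) {
  // Compress `chunk` while `history` is visible as preceding data,
  // so matches may point up to 32 KiB back into previous pushes.
  const combined = new Uint8Array(history.length + chunk.length)
  combined.set(history, 0)
  combined.set(chunk, history.length)
  const output = compressBlock(combined, history.length) // hypothetical helper
  // Keep only the trailing window as the dictionary for the next push.
  history = combined.slice(Math.max(0, combined.length - WINDOW_SIZE))
  return output
}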

101arrowz commented 1 year ago

I've successfully implemented this and will push it out in a release sometime soon.

101arrowz commented 1 year ago

Fixed in v0.8.0.
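
To verify (assuming the release is published to unpkg under the same path), point the import in the reproduction above at the new version; the fflate output length should then drop to roughly pako's.

import {Deflate as FflateDeflate} from "https://unpkg.com/fflate@0.8.0/esm/browser.js"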