Closed: ThaUnknown closed this issue 1 year ago
Actually it would probably be better to make a polyfill for `CompressionStream` and `DecompressionStream` backed by `fflate` than to add support for it here. I can try to work on that in the coming days.
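For reference, a minimal sketch of what the decompression side of such a polyfill could look like, assuming `fflate`'s streaming `Gunzip` class and handling gzip only; the `fflateGunzipStream` helper name is hypothetical, not from any library:

```ts
import { Gunzip } from "npm:fflate";

// Hypothetical helper: a gzip-only TransformStream backed by fflate's
// streaming Gunzip, shaped like `new DecompressionStream("gzip")`.
function fflateGunzipStream(): TransformStream<Uint8Array, Uint8Array> {
  const inflator = new Gunzip();
  return new TransformStream({
    start(controller) {
      // Forward each decompressed chunk downstream as it is produced.
      inflator.ondata = (chunk, _final) => controller.enqueue(chunk);
    },
    transform(chunk) {
      inflator.push(chunk);
    },
    flush() {
      // Signal end-of-stream so fflate can emit any remaining output.
      inflator.push(new Uint8Array(0), true);
    },
  });
}

// usage: response.body!.pipeThrough(fflateGunzipStream())
```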
This exposes a better API for files than those streams, so I think just accelerating the lib with them would be better, but why not both?
Actually, after profiling the performance, it tends to be faster just to use `fflate` in most cases due to streaming overheads. However, if anyone wants a Compression Streams API polyfill using `fflate`, I've made one available at https://github.com/101arrowz/compression-streams-polyfill.
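For completeness, a hedged usage sketch of a polyfill like that. This assumes the package is published on npm as `compression-streams-polyfill` and registers global `CompressionStream`/`DecompressionStream` when imported for its side effects (check the repo's README for the actual entry point); on runtimes with native support the import is effectively a no-op:

```ts
// Assumption: importing the package for side effects installs the globals
// when they are missing; see the repo README for the actual entry points.
import "npm:compression-streams-polyfill";

const input = new TextEncoder().encode("hello world ".repeat(1000));

// gzip round trip through the (possibly polyfilled) streams API
const gzipped = new Uint8Array(
  await new Response(
    new Blob([input]).stream().pipeThrough(new CompressionStream("gzip")),
  ).arrayBuffer(),
);
const output = new Uint8Array(
  await new Response(
    new Blob([gzipped]).stream().pipeThrough(new DecompressionStream("gzip")),
  ).arrayBuffer(),
);
console.log(output.length === input.length); // true
```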
Just having a look at `DecompressionStream` in Deno.
```ts
import { assert } from "https://deno.land/std@0.210.0/assert/mod.ts";
import * as fflate from "npm:fflate";
import * as pako from "npm:pako";

// Decompress a gzip ReadableStream via the native DecompressionStream,
// expecting the output to arrive as a single chunk.
async function decode_stream(stream: ReadableStream) {
  const reader = stream
    .pipeThrough(new DecompressionStream("gzip"))
    .getReader();
  let bytes: Uint8Array;
  {
    const result = await reader.read();
    assert(!result.done, "ReadableStream must have data");
    bytes = result.value;
  }
  {
    const result = await reader.read();
    assert(result.done, "ReadableStream must have no more data");
  }
  return bytes;
}

const base = new URL(
  "https://raw.githubusercontent.com/zarr-developers/zarr_implementations/5dc998ac72/examples/zarr.zr/gzip/.zarray",
);
const BYTES = await fetch(new URL("0.0.0", base))
  .then((r) => r.arrayBuffer())
  .then((b) => new Uint8Array(b));
const REFERENCE = fflate.gunzipSync(BYTES);

Deno.bench("decode_stream", async () => {
  // Wrap the bytes in a one-chunk ReadableStream and decompress natively.
  const stream = new ReadableStream({
    start(controller) {
      controller.enqueue(BYTES);
      controller.close();
    },
  });
  const result = await decode_stream(stream);
  assert(result.length === REFERENCE.length);
});

Deno.bench("fflate.gunzip", () => {
  const result = fflate.gunzipSync(BYTES);
  assert(result.length === REFERENCE.length);
});

Deno.bench("pako.inflate", () => {
  const result = pako.inflate(BYTES);
  assert(result.length === REFERENCE.length);
});
```
```
> deno bench -A gzip.ts
cpu: Apple M3 Max
runtime: deno 1.39.1 (aarch64-apple-darwin)

file:///Users/manzt/demos/gzip.ts
benchmark          time (avg)        iter/s      (min … max)            p75        p99        p995
---------------------------------------------------------------------------------------------------
decode_stream       40.66 µs/iter  24,591.2   (36.96 µs … 1.37 ms)     41 µs      62.92 µs   88.58 µs
fflate.gunzip       67.92 µs/iter  14,723.0   (63.83 µs … 1.68 ms)     66.62 µs   82.67 µs   88.25 µs
pako.inflate       112.22 µs/iter   8,910.9   (102.83 µs … 1.4 ms)    109.54 µs  139.58 µs  182.42 µs

summary
  decode_stream
    1.67x faster than fflate.gunzip
    2.76x faster than pako.inflate
```
I don't think that's a fair comparison, as you're wrapping fflate in a ReadableStream; realistically you wouldn't use readable streams for this.
I'm not sure I understand. `fflate` is decompressing `BYTES` directly; there is no readable stream in the `fflate` code path. Only the non-fflate bench (`decode_stream`) creates a readable stream.
Might be worth looking into Deno.bench.
oh sorry, misread
**What can't you do right now?**

Take advantage of native decompression streams.

**An optimal solution**

Detect support for `DecompressionStream` and use it; fall back to the default implementation if not (a rough sketch follows below).

**(How) is this done by other libraries?**

To my knowledge only zip-go supports `DecompressionStream` and doesn't fall back to anything.

`DecompressionStream` only supports deflate and gzip, but it might still be worth taking advantage of.
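A rough sketch of what that detection-plus-fallback could look like for gzip, using `fflate`'s async `gunzip` as the fallback; the `gunzipBytes` helper name is hypothetical:

```ts
import { gunzip } from "npm:fflate";

// Hypothetical helper: prefer the native DecompressionStream when it
// exists, otherwise fall back to fflate's async gunzip.
async function gunzipBytes(bytes: Uint8Array): Promise<Uint8Array> {
  if (typeof DecompressionStream !== "undefined") {
    // Native path: pipe the bytes through a gzip DecompressionStream.
    const stream = new Blob([bytes])
      .stream()
      .pipeThrough(new DecompressionStream("gzip"));
    return new Uint8Array(await new Response(stream).arrayBuffer());
  }
  // Fallback path: fflate's callback-based async gunzip.
  return new Promise<Uint8Array>((resolve, reject) =>
    gunzip(bytes, (err, data) => (err ? reject(err) : resolve(data)))
  );
}
```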