OneIdentity / zstd-js

MIT License

ZstdSimple is not compatible with Zstandard #13

Open pmm-motif opened 11 months ago

pmm-motif commented 11 months ago

I am unable to decompress valid Zstandard data using ZstdSimple. The same content decompresses just fine using ZstdStream. See the sample code below for a (pretty minimal) example. Unfortunately, ZstdStream is prohibitively slow (two orders of magnitude slower than ZstdSimple), which hurts with larger files.

Output compressed by ZstdSimple can be decompressed quickly by ZstdSimple, but it fails when decompressed with the Zstandard CLI (it seems trailing zeroes are appended and need to be removed; a memory alignment issue?).

// I'm using WASM here, but the same applies to ASM version
import { ZstdCodec, ZstdInit } from '@oneidentity/zstd-js/wasm';

/**
 * This is the output of `zstd -3 -o foo.zstd < foo` where foo contains
 * 'abcdefgh\n'
 *
 * (zstd coming from Zstandard CLI v1.5.5)
 */
const zstdData = new Uint8Array([
  40, 181, 47, 253, 4, 88, 73, 0, 0, 97, 98, 99, 100, 101, 102, 103, 104, 10,
  145, 37, 104, 134,
]);

function checkContents(decompressedBuf: Uint8Array) {
  const text = new TextDecoder('utf-8').decode(decompressedBuf);
  if (text !== 'abcdefgh\n') {
    throw new Error('Text does not match');
  }
  console.log('text matches:', text);
}

async function example() {
  // Await initialization so that any failure rejects the promise returned by example()
  const { ZstdStream, ZstdSimple }: ZstdCodec = await ZstdInit();

  const streamBuf = ZstdStream.decompress(zstdData);
  checkContents(streamBuf);

  // This fails with: ZSTD_ERROR: Error (generic),  error code: -1
  const simpleBuf = ZstdSimple.decompress(zstdData);
  checkContents(simpleBuf);
}

SoGCuicui commented 5 months ago

I also have this issue. I'm using the zstd-v1.5.6-win64 exe (from https://github.com/facebook/zstd/releases, of course) and always get 18 extra zero bytes. I can remove them programmatically, but that's definitely not ideal.
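
For illustration, a minimal sketch of what such a programmatic removal could look like. The helper name `stripTrailingPadding` is hypothetical and not part of zstd-js, and the count of 18 is simply the number of extra bytes reported in this thread. The helper only trims when all 18 trailing bytes are zero.

```typescript
// Hypothetical helper, not part of zstd-js.
// Assumption: the padding is exactly 18 zero bytes, as reported in this thread.
const EXTRA_BYTES = 18;

function stripTrailingPadding(
  buf: Uint8Array,
  padding: number = EXTRA_BYTES,
): Uint8Array {
  // Too short to contain the padding at all: return unchanged.
  if (buf.byteLength < padding) {
    return buf;
  }
  // Inspect only the last `padding` bytes (subarray is a view, no copy).
  const tail = buf.subarray(buf.byteLength - padding);
  const allZero = tail.every((byte) => byte === 0);
  // Trim only when every trailing byte is zero; otherwise leave the data alone.
  return allZero ? buf.subarray(0, buf.byteLength - padding) : buf;
}
```

Note that this would also truncate a payload that legitimately ends in 18 zero bytes, which is one reason stripping is not ideal as a long-term fix.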

mgorven commented 4 months ago

I think this has to do with zstdFrameHeaderSizeMax, which is 18: https://github.com/OneIdentity/zstd-js/blob/main/src/components/common/zstd-simple/zstd-simple-dec.ts#L4 When compressing, this is added to payload.byteLength and used as the srcSize provided to ZSTD_compress: https://github.com/OneIdentity/zstd-js/blob/main/src/components/common/zstd-simple/zstd-simple.ts#L24. This doesn't make sense; it means that ZSTD_compress is over-reading the source buffer, which explains the additional null bytes. When decompressing, the result is truncated by this amount: https://github.com/OneIdentity/zstd-js/blob/main/src/components/common/zstd-simple/zstd-simple-dec.ts#L28
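
To make the over-read concrete, here is a small simulation that does not call zstd-js at all. The function `simulateSourceRead` and the 1024-byte zero-filled "heap" are assumptions made for illustration; they only model what a compressor sees when it is told to read more bytes than the payload contains.

```typescript
// Assumption: mirrors the constant value named in this comment (18).
const zstdFrameHeaderSizeMax = 18;

// Hypothetical stand-in for the WASM heap: a zero-filled buffer that the
// payload is copied into before the compressor reads `srcSize` bytes from it.
function simulateSourceRead(payload: Uint8Array, srcSize: number): Uint8Array {
  const heap = new Uint8Array(1024); // zero-initialised for this sketch
  heap.set(payload, 0);
  // The compressor consumes exactly srcSize bytes starting at the payload.
  return heap.slice(0, srcSize);
}

// 'abcdefgh\n' as bytes, taken from the sample in this issue.
const payload = new Uint8Array([97, 98, 99, 100, 101, 102, 103, 104, 10]);

const correct = simulateSourceRead(payload, payload.byteLength);
const overRead = simulateSourceRead(
  payload,
  payload.byteLength + zstdFrameHeaderSizeMax,
);

console.log(correct.byteLength); // 9
console.log(overRead.byteLength); // 27: the payload plus 18 zero bytes
```

In this model the 18 extra bytes become part of the compressed content, so any standard decoder faithfully reproduces them, which is consistent with the trailing zeroes reported above.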

SoGCuicui commented 4 months ago

Thank you so much! Adding zstd.zstdFrameHeaderSizeMax = 0; before zstd.compress(data); solved this issue.

fuweichin commented 4 months ago

When decompressing zstd content with Chrome 123, those 18 extra bytes cause a syntax error in .js files.

SyntaxError: Invalid or unexpected token

Also note the npm package is now 64MB uncompressed. Nobody cares?

RPGillespie6 commented 1 month ago

If you have a zstd CLI compressed blob, you can work around this on the JS side with:

ZstdSimple.zstdFrameHeaderSizeMax = 0;
const decompressedSimpleData = ZstdSimple.decompress(compressed);
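
If the same page also needs to read data produced by ZstdSimple itself, it may be safer to scope the override rather than leave it set globally. The following is a sketch only: `SimpleCodecLike` and `decompressCliFrame` are hypothetical names, and the interface assumes nothing beyond the two members used in this thread (`zstdFrameHeaderSizeMax` and `decompress`).

```typescript
// Hypothetical structural type: only the two members used in this thread.
interface SimpleCodecLike {
  zstdFrameHeaderSizeMax: number;
  decompress(data: Uint8Array): Uint8Array;
}

// Hypothetical helper: temporarily sets the offset to 0 for one call,
// then restores the previous value even if decompress throws.
function decompressCliFrame(
  codec: SimpleCodecLike,
  compressed: Uint8Array,
): Uint8Array {
  const previous = codec.zstdFrameHeaderSizeMax;
  codec.zstdFrameHeaderSizeMax = 0;
  try {
    return codec.decompress(compressed);
  } finally {
    codec.zstdFrameHeaderSizeMax = previous;
  }
}
```

Usage would then be `decompressCliFrame(ZstdSimple, compressed)`, assuming ZstdSimple exposes the property as a writable member, as the snippet above implies.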