ZJONSSON / node-unzipper

node.js cross-platform unzip using streams

Files extracted from zip are corrupt #271

Closed by webwake 5 months ago

webwake commented 1 year ago

The extracted files are getting corrupted. This started happening after upgrading from Node 18 to Node.js 19.8.1 on Windows 10 (I haven't tested whether the issue is broader).

Below is a code sample to reproduce the issue. I have included the decompress library to demonstrate a non-corrupt extraction of the same file.

const fs = require('fs');
const unzipper = require('unzipper');
const { https } = require('follow-redirects');
const crypto = require("crypto");

const decompress = require('decompress');

async function downloadBinariesFromRelease() {
    await new Promise((resolve, reject) => {
        fs.mkdirSync('unzipper', { recursive: true });
        const file = fs.createWriteStream('binaries.zip');
        console.log('Downloading Neutralinojs binaries..');
        https.get("https://github.com/neutralinojs/neutralinojs/releases/download/v4.10.0/neutralinojs-v4.10.0.zip", function (response) {
            response.pipe(file);
            // Resolve only once the zip is fully flushed to disk,
            // not merely when the HTTP response has ended.
            file.on('finish', resolve);
            file.on('error', reject);
        }).on('error', reject);
    });
}

function fileHash(filename, algorithm = 'md5') {
    return new Promise((resolve, reject) => {
        // Available algorithms depend on the platform's OpenSSL build,
        // e.g. 'sha1', 'md5', 'sha256', 'sha512'.
        const shasum = crypto.createHash(algorithm);
        const s = fs.createReadStream(filename);
        s.on('data', (data) => shasum.update(data));
        // Stream errors arrive as events, so a try/catch would not see them.
        s.on('error', () => reject(new Error('calc fail')));
        // Produce the digest once the whole file has been read.
        s.on('end', () => resolve(shasum.digest('hex')));
    });
}

async function extractWithUnzipperLibrary() {
    console.log('Extracting using "unzipper" library')
    // .promise() already returns a promise, so no extra wrapper is needed.
    await fs.createReadStream('binaries.zip')
        .pipe(unzipper.Extract({ path: './unzipper' }))
        .promise();
}

async function extractWithDecompressLibrary() {
    console.log('Extracting using "decompress" library')
    await decompress('binaries.zip', 'decompress');
}

async function main() {
    await downloadBinariesFromRelease();
    await extractWithUnzipperLibrary();
    await extractWithDecompressLibrary();

    console.log(`correct hash:    3bbb562a59a454534f0ced6c801ccdb7`);
    console.log(`unzipper hash:   ${await fileHash("unzipper/neutralino-win_x64.exe", "md5")}`);
    console.log(`decompress hash: ${await fileHash("decompress/neutralino-win_x64.exe", "md5")}`);
}

main();

mvolfik commented 1 year ago

I have another code sample. Unzipping works when you send the zip file as one large chunk, but not when you slice it like this (I discovered this by piping the file from https download, so I suppose these are some transmission window sizes).

Code:

const fs = require("node:fs");
const unzipper = require("unzipper");

const data = fs.readFileSync("a.zip");
const extractor = unzipper.Extract({ path: "./test" });

extractor.write(Uint8Array.prototype.slice.call(data, 0, 1378));
extractor.write(Uint8Array.prototype.slice.call(data, 1378, 1378 * 2));
extractor.write(Uint8Array.prototype.slice.call(data, 1378 * 2, 1378 * 3));
extractor.write(Uint8Array.prototype.slice.call(data, 1378 * 3));

The zipfile: a.zip

As a result, the package.json in output is corrupted like this:

Corrupted file:

```
 of an Apify actor.", "engines": { "node": ">=16.0.0" }, "dependencies": { "apify": "^3.0.0", "crawlee": "^3.0.0" }, "devDependencies": { "@apify/eslint-config-ts": "^0.2.3", "@apify/tsconfig": "^0.1.0", "@typescript-eslint/eslint-plugin": "^5.55.0", "@typescript-eslint/parser": "^5.55.0", "eslint": "^8.36.0", "ts-node": "^10.9.1", "typescript": "^4.9.5" }, "scripts": { "start": "npm run start:dev", "start:prod": "node dist/main.js", "start:dev": "ts-node-esm -T src/main.ts", "build": "tsc", "lint": "eslint ./src --ext .ts", "lint:fix": "eslint ./src --ext .ts --fix", "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1" }, "author": "It's not you it's me", "license": "ISC" } { "name": "crawlee-cheerio-typescript", "version": "0.0.1", "type": "module", "description": "This is a boilerplate
```

Node: 18.16.0, unzipper: 0.10.11
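
For anyone who wants to try other chunk sizes, the manual slicing above can be generalized. This is only a minimal sketch: `writeInChunks` is a hypothetical helper name, not part of unzipper, and it works on any Node.js Writable. Unlike the repro above, it also waits for `'drain'` when the stream signals backpressure and calls `end()` after the last chunk.

```javascript
const { Writable } = require("node:stream");
const { once } = require("node:events");

// Hypothetical helper: write `data` (a Buffer) to `writable` in slices of
// `chunkSize` bytes, then end the stream.
async function writeInChunks(writable, data, chunkSize) {
  for (let offset = 0; offset < data.length; offset += chunkSize) {
    // subarray() returns a view onto `data`; nothing is copied.
    const chunk = data.subarray(offset, offset + chunkSize);
    // write() returns false when the internal buffer is full;
    // wait for 'drain' before sending more.
    if (!writable.write(chunk)) {
      await once(writable, "drain");
    }
  }
  writable.end();
}

// Assumed usage with the repro above (requires unzipper to be installed):
//   const extractor = unzipper.Extract({ path: "./test" });
//   await writeInChunks(extractor, fs.readFileSync("a.zip"), 1378);
//   await extractor.promise();
```

The helper does not wait for extraction to complete. With `unzipper.Extract`, the `.promise()` call used elsewhere in this thread is the way to wait for that.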

matej-marcisovsky commented 1 year ago

Downgrading to 18.14.0 fixed issues for me.

didaquis commented 1 year ago

Same situation for me after updating from Node 16 to Node 18.16.0. If I update from Node 16 to Node 18.15.0 instead, there are no errors!

This is the Node changelog for Node 18.16. https://github.com/nodejs/node/releases/tag/v18.16.0

@ZJONSSON can you help me to identify the cause, please?

didaquis commented 1 year ago

Hi @mvolfik and @webwake.

Can either of you confirm what I say in my previous comment? In my case using Node 18.15.0 no error occurs. However, the files are corrupted if Node 18.16.0 is used.

didaquis commented 1 year ago

May be related to https://github.com/ZJONSSON/node-unzipper/issues/269

Walther commented 1 year ago

Unfortunately, this seems to be reproducible using Node 18.16.0 even with unzipper version 0.10.14, which was released after merging https://github.com/ZJONSSON/node-unzipper/pull/274

d0v3riz commented 1 year ago

Is there an update to fix version 18.16 ?

sounisi5011 commented 1 year ago

> Is there an update to fix version 18.16 ?

It probably does not exist. You may want to migrate to another package. A list of candidates I have researched is written here: https://github.com/go-task/go-npm/issues/7#issuecomment-1568567117

thorsent commented 1 year ago

The problem appears to be that unzipper is putting the first block of code in the wrong place.

This is a WinMerge comparison of a correctly unzipped file (left) and the result of unzipper on the latest Node (right). It appears that the very first block of the file has been written to disk later than it should have been. The resulting file has the same size but is corrupt.

[image: WinMerge comparison]

ericman314 commented 1 year ago

> The problem appears to be that unzipper is putting the first block of code in the wrong place.

Same thing is happening on Node 20.7.0, using unzipper version 0.10.14. In my case it's usually blocks of 2^14 bytes that are being swapped, but it also happens with files much smaller than this.
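
One way to check the block-swap observation on your own files is to compare a known-good extraction against the unzipper output block by block. The sketch below is illustrative only: `differingBlocks` is a hypothetical helper, and the default block size of 2^14 bytes is taken from the observation above, not from anything in unzipper.

```javascript
// Hypothetical helper: return the indices of the fixed-size blocks in which
// two Buffers differ. Both arguments must be Node.js Buffers.
function differingBlocks(expected, actual, blockSize = 2 ** 14) {
  const longest = Math.max(expected.length, actual.length);
  const blockCount = Math.ceil(longest / blockSize);
  const differing = [];
  for (let i = 0; i < blockCount; i++) {
    const start = i * blockSize;
    // subarray() clamps to the buffer length, so a short final block
    // (or a missing one) is handled without special cases.
    const a = expected.subarray(start, start + blockSize);
    const b = actual.subarray(start, start + blockSize);
    if (!a.equals(b)) differing.push(i);
  }
  return differing;
}

// Assumed usage with the paths from the first code sample in this thread:
//   const fs = require("node:fs");
//   console.log(differingBlocks(
//     fs.readFileSync("decompress/neutralino-win_x64.exe"),
//     fs.readFileSync("unzipper/neutralino-win_x64.exe")));
```

An empty result means the files are identical. A result listing only a few block indices for files of equal size would be consistent with blocks being written out of order.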

Would love to see this fixed, as Node 16 has reached end-of-life. There are alternatives, but I really like unzipper's API best, especially the ability to stream archived data without loading it into a buffer first. Very cool!

pixartist commented 1 year ago

Very simple application here, all data corrupted by this library

sovcik commented 11 months ago

Same issue here.

wfairclough commented 11 months ago

Also the same issue. If I hadn't found this page, I would have wasted many more hours debugging.

Lulalaby commented 11 months ago

CR v20.10.0

DaCao commented 11 months ago

same issue here. all data corrupted by this library. Wasted days on this....

please fix it.

kenotron commented 9 months ago

https://www.npmjs.com/package/yauzl#no-streaming-unzip-api - wondering if that's the reason?

samerkassem82 commented 9 months ago

So is this library dead? Same issue almost a year later.

dy-dx commented 8 months ago

The issue has been fixed in the following node.js versions:

But for anyone unable to upgrade, you'll need to switch to a different unzipper library until this gets addressed: https://github.com/ZJONSSON/node-unzipper/issues/261
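
For those who cannot upgrade or switch libraries, one possible stopgap is to buffer each entry and write it with `fs.promises.writeFile`, which avoids write streams altogether. This is an untested sketch: whether it sidesteps the corruption on affected Node versions is an assumption, and `extractBuffered` is a hypothetical helper name. The `directory` argument is assumed to have the shape returned by unzipper's `Open.file()` (a `files` array whose entries expose `path`, `type` and `buffer()`).

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: extract every entry of `directory` into `destDir`,
// holding one entry in memory at a time.
async function extractBuffered(directory, destDir) {
  const root = path.resolve(destDir);
  for (const entry of directory.files) {
    const target = path.resolve(root, entry.path);
    // Refuse entries that would land outside destDir (e.g. "../evil").
    if (target !== root && !target.startsWith(root + path.sep)) {
      throw new Error(`Entry outside destination: ${entry.path}`);
    }
    if (entry.type === "Directory") {
      await fs.promises.mkdir(target, { recursive: true });
      continue;
    }
    await fs.promises.mkdir(path.dirname(target), { recursive: true });
    // The whole entry is read into a Buffer and written in a single call.
    await fs.promises.writeFile(target, await entry.buffer());
  }
}

// Assumed usage (requires unzipper to be installed):
//   const unzipper = require("unzipper");
//   await extractBuffered(await unzipper.Open.file("binaries.zip"), "./unzipper");
```

The trade-off is that each entry must fit in memory, which gives up the streaming behaviour praised earlier in this thread.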

ZJONSSON commented 5 months ago

Thanks @dy-dx for identifying the underlying issue as a bug with fs.createWriteStream in nodejs. This has now been fixed in node per https://github.com/ZJONSSON/node-unzipper/issues/271#issuecomment-2021223739

As far as extract goes, we have moved from the unmaintained fstream to fs-extra in a newly published version (v0.12.1).