cthackers / adm-zip

A Javascript implementation of zip for nodejs. Allows user to create or extract zip files both in memory or to/from disk
MIT License

Invalid or unsupported zip format. No END header found #268

Open jerrygreen opened 5 years ago

jerrygreen commented 5 years ago

Error:

Jerrys-MacBook-Pro:client jerrygreen$ node zip.js 

/Users/jerrygreen/my_project/node_modules/adm-zip/zipFile.js:66
            throw Utils.Errors.INVALID_FORMAT;
            ^
Invalid or unsupported zip format. No END header found

My (simple) code:

const AdmZip = require('adm-zip')
const zip = new AdmZip('./my_file.zip')

I'm using macOS 10.14.1. Opening the file from Finder (with the default Archive Utility app) unzips it fine, no problems.

AzureDoom commented 5 years ago

Did you ever find a fix for this?

jerrygreen commented 5 years ago

@AzureZhen I've found that macOS uses a utility called ditto to zip and extract archives. It ships with macOS by default, so I simply used it for extraction, and it works perfectly.

ghost commented 5 years ago

@JerryGreen, I am facing the same issue. How did you fix it using ditto? Could you please attach sample code here?

Many thanks.

jerrygreen commented 5 years ago

@kangwen6663

ditto -xk /path/from /path/to

5saviahv commented 5 years ago

This error is thrown in cases where the file comment field exceeds the 65k maximum size. I have seen it with some external signing schemes.
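To make the "END header" concrete: in the ZIP format the End of Central Directory (EOCD) record sits at the end of the file, has a 22-byte fixed part, and is followed by a comment whose length is stored in a 16-bit field. A reader therefore usually only searches the last 22 + 65,535 bytes. The sketch below is a hedged illustration of that search, not adm-zip's actual code; the name `findEndHeader` and the constants are made up for this example.

```javascript
// Signature of the End of Central Directory (EOCD) record: bytes "PK\x05\x06".
const EOCD_SIGNATURE = 0x06054b50;
const EOCD_MIN_SIZE = 22;        // fixed part of the record
const MAX_COMMENT_SIZE = 0xffff; // the comment length is a 16-bit field

// Scans backwards from the end of the buffer for the EOCD signature.
// Returns the record's offset, or -1 if none is found in the search window.
function findEndHeader(buffer) {
  const lowest = Math.max(0, buffer.length - EOCD_MIN_SIZE - MAX_COMMENT_SIZE);
  for (let i = buffer.length - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (buffer.readUInt32LE(i) === EOCD_SIGNATURE) return i;
  }
  return -1;
}
```

Running `findEndHeader(fs.readFileSync('./my_file.zip'))` on a suspect file and getting -1 would suggest the file is empty, truncated, not a zip at all, or has more trailing data after the record than the search window covers.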

csalmeida commented 3 years ago

Anyone still experiencing the issue? Would appreciate a solution that is not dependent on ditto if possible.

const zipPath = `./temp/file.zip`;
const zip = new AdmZip(zipPath);
zip.extractAllTo(`./temp/`, true);

The terminal output is:

Error: Invalid or unsupported zip format. No END header found
    at readMainHeader (/Users/username/Projects/example-project/node_modules/adm-zip/zipFile.js:107:10)
    at new module.exports (/Users/username/Projects/example-project/node_modules/adm-zip/zipFile.js:19:3)
    at new module.exports (/Users/username/Projects/example-project/node_modules/adm-zip/adm-zip.js:20:11)
    at module.exports._installWordpress (/Users/username/Projects/example-project/generators/app/index.js:102:17)
    at module.exports.writing (/Users/username/Projects/example-project/generators/app/index.js:45:12)
    at Object.<anonymous> (/Users/username/Projects/example-project/node_modules/yeoman-generator/lib/index.js:399:25)
    at /Users/username/Projects/example-project/node_modules/run-async/index.js:49:25
    at new Promise (<anonymous>)
    at /Users/username/Projects/example-project/node_modules/run-async/index.js:26:19
    at /Users/username/Projects/example-project/node_modules/yeoman-generator/lib/index.js:400:11

5saviahv commented 3 years ago

Are you able to extract or view your file with archive managers like 7-Zip, WinRAR, etc.?

csalmeida commented 3 years ago

Are you able to extract or view your file with archive managers like 7-Zip, WinRAR, etc.?

@5saviahv Yes, the file seems to be okay and extracts with the tools I've tried.

I also seem to only get this error sometimes, not all the time (using the same file) which makes it slightly more challenging to understand the cause.

Thanks!

5saviahv commented 3 years ago

Interesting. There may be many culprits, but from the way you describe it, it seems like a race condition (two or more processes want to access the file at the same time).

csalmeida commented 3 years ago

Interesting. There may be many culprits, but from the way you describe it, it seems like a race condition (two or more processes want to access the file at the same time).

  • Do you only read from the file, or do you also write to it?
  • Do you open the file multiple times?

const zipPath = `./temp/file.zip`;
const zip1 = new AdmZip(zipPath);
const zip2 = new AdmZip(zipPath);

  • Do you use async functions?
  • On which OS does it fail?

Thanks for getting back to me on this. A race condition could be the cause: after the file is extracted, the contents are copied, and then the original zip file and the extracted folder are removed.

Do you use async functions?

I am unsure but I have provided an example below.

It fails on which OS?

It fails when running the script on Node v15.2.1 or v14.15.1 running on macOS Big Sur 11.0.1 (20B29).

Here's the function I have:

_extractZip(projectName, fileZipName, copyPath=null) {
  const extractedFolder = `./${projectName}/${fileZipName.replace('.zip', '')}`;

  // Extracts contents of zip file.
  const extractPath = `./${projectName}/${fileZipName}`;
  const zip = new AdmZip(extractPath);
  zip.extractAllTo(`${projectName}/`, true);

  let extractError = null;

  // If a copy path is not provided files won't be moved.
  if (copyPath) {
    fse.copy(extractedFolder, copyPath, { overwrite: true }, err => {

      if (err)  {
        extractError = `
        Could not copy files to ./${copyPath}. \n
        ./${err}
        `
      } else {
        // Cleans up by removing extracted folder and zip.
        try {
          fs.rmdirSync(extractedFolder, { recursive: true });
        } catch (err) {
            extractError = `
            Could not remove extractedFolder. \n
            ./${err}
            `
        }

        // Remove zip file as it is no longer needed.
        try {
          fs.unlinkSync(extractPath);
        } catch (error) {
          extractError = `
          Could not remove ./${extractPath}. \n
          ./${error}
          `
        }
      }
    });
  } else {
    // Cleans up by removing extracted folder and zip.
    // Lets user know that program did not work as intended.
    try {
      fs.rmdirSync(extractedFolder, { recursive: true });
    } catch (err) {
        extractError = `
        Could not remove extractedFolder. \n
        ./${err}
        `
    }

    // Remove zip file as it is no longer needed.
    try {
      fs.unlinkSync(extractPath);
    } catch (error) {
      extractError = `
      Could not remove ./${extractPath}. \n
      ./${error}
      `
    }

    this.log(`${chalk.red('Error:')} Could not copy files (copyPath is not present).
    Zip file and extracted files were removed.`);
  }
}

Thanks again for looking into it. If this is a race condition is there a way to only access the file when it is done extracting?

5saviahv commented 3 years ago

Code seems ok. It should not give any trouble.

5saviahv commented 3 years ago

How big are your files? I mean, are any of them Zip64? Many archive managers switch to Zip64 if you use big files or have many files. adm-zip can read Zip64 files, but it has a higher chance of failing.
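One way to check whether an archive is Zip64 without an archive manager is to look at its end-of-archive records. The sketch below is a rough heuristic based on the ZIP format, not something adm-zip exposes: it treats an archive as Zip64 if a Zip64 locator record (signature `PK\x06\x07`, 20 bytes) sits directly before the End of Central Directory record, or if the entry count, size, or offset fields hold the all-ones sentinel values. The function name `looksLikeZip64` is hypothetical.

```javascript
const EOCD_SIGNATURE = 0x06054b50;          // "PK\x05\x06"
const ZIP64_LOCATOR_SIGNATURE = 0x07064b50; // "PK\x06\x07"
const ZIP64_LOCATOR_SIZE = 20;

// Heuristic: returns true if the buffer's end records suggest a Zip64 archive.
function looksLikeZip64(buffer) {
  // Locate the End of Central Directory record by scanning backwards.
  let eocd = -1;
  const lowest = Math.max(0, buffer.length - 22 - 0xffff);
  for (let i = buffer.length - 22; i >= lowest; i--) {
    if (buffer.readUInt32LE(i) === EOCD_SIGNATURE) { eocd = i; break; }
  }
  if (eocd < 0) return false; // no END header at all

  // A Zip64 locator, when present, immediately precedes the EOCD record.
  const hasLocator = eocd >= ZIP64_LOCATOR_SIZE &&
    buffer.readUInt32LE(eocd - ZIP64_LOCATOR_SIZE) === ZIP64_LOCATOR_SIGNATURE;

  // Sentinel values mean "the real value is in the Zip64 record".
  const totalEntries = buffer.readUInt16LE(eocd + 10);
  const centralDirSize = buffer.readUInt32LE(eocd + 12);
  const centralDirOffset = buffer.readUInt32LE(eocd + 16);

  return hasLocator || totalEntries === 0xffff ||
    centralDirSize === 0xffffffff || centralDirOffset === 0xffffffff;
}
```

For a file on disk this could be called as `looksLikeZip64(fs.readFileSync(zipPath))`, assuming the archive fits comfortably in memory.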

csalmeida commented 3 years ago

@5saviahv thank you, for some reason I cannot replicate the error lately. I am not sure whether it is Zip64 but one of the files I have used this function on is wordpress.zip.

Since this is so intermittent (on that same file) I am unsure what might be causing it, but I haven't been getting the error lately. I really appreciate your help. I will comment here again with more details if it returns. 🙏

iget-master commented 3 years ago

This happened to me once, but when retrying (same code, same file) it works. Weird.

nimmc commented 3 years ago

I have this same issue. I think I can somehow replicate it.

I download the file from my AWS S3 bucket, then use adm-zip to unzip it, then use it with cheerio.

The trick is that I need to leave my computer alone for about 5-10 minutes and then run my code; it will sometimes (around 40% of the time) give the error "Invalid or unsupported zip format. No END header found".

Otherwise it works fine. The file is an EPUB, always the same file, so the file itself is usable.

Here is my code.

// Imports inferred from usage below (aws-sdk v2 is an assumption).
const fs = require('fs');
const path = require('path');
const aws = require('aws-sdk');
const AdmZip = require('adm-zip');
const cheerio = require('cheerio');

getfile();

async function getfile() {
  try {
    aws.config.update({
      accessKeyId: accessKeyId,
      secretAccessKey: secretAccesskey,
      region: 'us-east-2'
    });

    var s3 = new aws.S3();

    var params = {
      Bucket: 'original',
      Key: 'file.epub'
    };
    let readStream = s3.getObject(params).createReadStream();
    let writeStream = fs.createWriteStream(path.join(__dirname, params.Key));
    readStream.pipe(writeStream);
    readStream.on('end', () => {
      console.log("this ends")
      console.log("paramkey = ", params.Key)
      writeStream.end();
      epubToText(params.Key);
    })
  } catch (error) {
    console.log("error = ", error)
  }
}

async function epubToText(path2) {
  try {
    console.log("path2 = ", path2) // always already exists
    console.log("111111")
    let zip = new AdmZip(path2); // it stops here. In console it only logged "111111" and not "22222"

    console.log("22222")

    let $ = cheerio.load(zip.readAsText('META-INF/container.xml'), { xmlMode: true, decodeEntities: false });
    console.log("$ = ", $)

    let contentOpfPath = $("container rootfiles rootfile").attr("full-path");
    console.log("contentOpfPath = ", contentOpfPath)

    let contentOpfFolder = contentOpfPath.split("/")
    console.log("contentOpfFolder = ", contentOpfFolder)
  } catch (err) {
    console.log(err);
  }
}

I can't let this happen in production though. The file must be processed and served to the customer. The file is 6.5 MB. I use Node 12.16.1 on Windows 7, and this also happens on my Mac running Big Sur.

krisrefs commented 2 years ago

I made a workaround for this, as I was getting the zip file from an external source and then saving it locally to extract.

A setTimeout solved my issue.

    const saveZIPFile = async () => {
        return new Promise((resolve) => {
            data.body.pipe(fs.createWriteStream(path.resolve(__dirname, `${project}.zip`)));

            data.body.on('end', () => {
                setTimeout(() => {
                    resolve();
                }, 1000);
            });
        });
    };

    await saveZIPFile();

    var zip = new AdmZip(path.resolve(__dirname, `${project}.zip`));

    // targetDir is a placeholder: the original snippet did not say where it extracts to.
    zip.extractAllTo(targetDir, true);

nhuethmayr commented 2 years ago

I had the same problem and it turned out to be an issue with how I download the file. I never waited for the download to complete before attempting to unzip it.

Solution: Properly await the download and only then start working with the ZIP file.
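A minimal sketch of "properly await the download", assuming the download is available as a Node.js readable stream: `pipeline` from `stream/promises` resolves only once the destination file stream has finished, so anything after the `await` sees the complete file. The helper names `saveStream` and `demo` and the temp file name are made up for this example, and the in-memory stream stands in for a real HTTP or S3 response.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Writes `source` to `destination` and resolves only after the write has finished.
async function saveStream(source, destination) {
  await pipeline(source, fs.createWriteStream(destination));
  return destination;
}

// Demo with an in-memory stream standing in for a real download.
async function demo() {
  const payload = Buffer.from('stand-in for the downloaded zip bytes');
  const target = path.join(os.tmpdir(), `adm-zip-demo-${process.pid}.bin`);

  await saveStream(Readable.from([payload]), target);

  // At this point the file is complete; this is where new AdmZip(target) would go.
  const size = fs.statSync(target).size;
  fs.unlinkSync(target);
  return size === payload.length;
}
```

Compared with a fixed `setTimeout`, this waits exactly as long as the write takes and also rejects if either stream errors.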

LuizAsFight commented 1 year ago

I had this problem, and it turned out the URL was returning HTTP status 302 (redirect) instead of 200 (success), so my zip file ended up with 0 bytes.

To fix that, I changed the code a bit:

const url =
      'http://blablabla.zip';

    const zipFile = './blablabla.zip';
    const zipFileStream = fs.createWriteStream(zipFile);

    function downloadFile(url, attempt = 1) {
      return new Promise((resolve, reject) => {
        https
          .get(url, (res) => {
            if (res.statusCode === 302 || res.statusCode === 301) {
              if (attempt > 5) {
                // prevent infinite loops if there's a redirect loop
                reject(new Error('Too many redirects'));
                return;
              }

              const newUrl = res.headers.location;
              console.log(`Redirecting to: ${newUrl}`);
              downloadFile(newUrl, attempt + 1).then(resolve, reject);
              return;
            }

            if (res.statusCode !== 200) {
              reject(new Error(`Unexpected status code: ${res.statusCode}`));
              return;
            }

            res.pipe(zipFileStream);

            zipFileStream.on('finish', () => {
              zipFileStream.close(resolve);
            });

            zipFileStream.on('error', (error) => {
              reject(error);
            });
          })
          .on('error', (error) => {
            reject(error);
          });
      });
    }

    await downloadFile(url);

    // eslint-disable-next-line no-console
    console.log('Download completed, extracting zip...');
    const zip = new admZip(zipFile); // eslint-disable-line new-cap
    zip.extractAllTo('./blablabla', true);
    // eslint-disable-next-line no-console
    console.log('zip extracted');
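
Since an empty or non-zip download produces the same "No END header found" error, a cheap sanity check before opening the file can give a clearer message. The sketch below is an assumption-laden guard, not part of adm-zip: the name `assertLooksLikeZip` is hypothetical, and it assumes the archive starts with the usual "PK" magic bytes, which is not true for self-extracting archives or zips with prepended data.

```javascript
const fs = require('fs');

// Throws a descriptive error if the file is too small or does not start with "PK".
function assertLooksLikeZip(filePath) {
  const { size } = fs.statSync(filePath);
  if (size < 22) {
    // 22 bytes is the smallest possible zip (an empty archive's end record).
    throw new Error(`${filePath} is only ${size} bytes; too small to be a zip archive`);
  }

  // Read just the first two bytes instead of loading the whole file.
  const head = Buffer.alloc(2);
  const fd = fs.openSync(filePath, 'r');
  try {
    fs.readSync(fd, head, 0, 2, 0);
  } finally {
    fs.closeSync(fd);
  }

  if (head.toString('latin1') !== 'PK') {
    throw new Error(`${filePath} does not start with the "PK" zip magic bytes`);
  }
}
```

Calling `assertLooksLikeZip(zipFile)` right before `new admZip(zipFile)` would turn a 0-byte file or a saved HTML redirect page into an explicit error.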

panoply commented 10 months ago

My issue here was that I passed .DS_Store when attempting to unzip. Ensure you're filtering out invalid file paths.
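
A small sketch of that kind of filtering, assuming the candidates come from a directory listing: it drops hidden files such as .DS_Store (and macOS `._` resource-fork files) and keeps only names ending in .zip. The function name `selectZipFiles` is made up for this example.

```javascript
const path = require('path');

// Keeps only non-hidden entries whose extension is .zip (case-insensitive).
function selectZipFiles(fileNames) {
  return fileNames.filter((name) => {
    const base = path.basename(name);
    return !base.startsWith('.') && path.extname(base).toLowerCase() === '.zip';
  });
}
```

It could be applied to a listing before unzipping, for example `selectZipFiles(fs.readdirSync(dir))`.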

LukeSavefrogs commented 7 months ago

I had the same problem downloading a file using axios and fs.createWriteStream.

I solved it by waiting for the writeStream's close event and then resolving the Promise:

import os from 'os'
import fs from 'fs';
import path from 'path';

import axios from 'axios';
import AdmZip from 'adm-zip';

async function download(url: string): Promise<string> {
    const outputFile = path.join(os.tmpdir(), 'archive.zip');
    const { data } = await axios.get(url, { responseType: 'stream' });

    // Pipe the data to a file
    const writeStream = fs.createWriteStream(outputFile);
    data.pipe(writeStream);

    // Return a promise and resolve when download finishes
    return new Promise((resolve, reject) => {
        data.on('error', () => {
            reject(`Failure while retrieving remote data (source: ${url})`);
        })

        writeStream.on('close', () => {
            resolve(outputFile);
        })
        writeStream.on('error', err => {
            reject(err);
        })
    })
}

async function extract(url: string, outputDir: string) {
    // 1. Download the zip file
    const file = await download(url);

    // 2. Extract the archive
    const zip = new AdmZip(file);
    zip.extractAllTo(outputDir, /*overwrite*/ true);

    return outputDir;
}

extract("https://example.com/archive.zip", path.join(os.tmpdir(), 'extracted')).catch((error) => {
    console.error(error);
});

gsaukov commented 3 months ago

In my case it was my own mistake with curl and the way I was passing the binary. I should have passed the binary as @/Users/gsaukov/Downloads/index.zip, but I was passing the path /Users/gsaukov/Downloads/index.zip as a string. The correct curl command (with @) is below:

curl -X PUT \
  -H 'Content-Type: application/zip' \
  -H 'accept: application/json' \
  --insecure \
  --data-binary @/Users/gsaukov/Downloads/index.zip \
  https://localhost:3000/artifact

Ericfreespirit commented 3 months ago

I found the problem for me:

Wrong ❌:

        const outputFilePath = "./ExtractTextInfoFromPDF.zip";
        console.log(`Saving asset at ${outputFilePath}`);

        const writeStream = fs.createWriteStream(outputFilePath);
        streamAsset.readStream.pipe(writeStream);

        let zip = new AdmZip(outputFilePath); // Wrong  ❌
        let jsondata = zip.readAsText('structuredData.json');
        let data = JSON.parse(jsondata);
        console.log("data", data);
        data.elements.forEach((element: any) => {
            if (element.Path.endsWith('/H1')) {
                console.log(element.Text);
            }
        });

Right ✅:

        const outputFilePath = "./ExtractTextInfoFromPDF.zip";
        let zip = new AdmZip(outputFilePath); // Right ✅ !!!!
        console.log(`Saving asset at ${outputFilePath}`);

        const writeStream = fs.createWriteStream(outputFilePath);
        streamAsset.readStream.pipe(writeStream);

        let jsondata = zip.readAsText('structuredData.json');
        let data = JSON.parse(jsondata);
        console.log("data", data);
        data.elements.forEach((element: any) => {
            if (element.Path.endsWith('/H1')) {
                console.log(element.Text);
            }
        });

I think we have to create the new AdmZip() before writing to the file.

Maybe that's why I was getting INVALID_FORMAT() in adm-zip/zipFile.js.