101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License

how can I replace a file in a zip file? #91

Closed: mankeheaven closed this issue 3 years ago

mankeheaven commented 3 years ago

In my case, I need to replace a file in a zip file. For example, my zip file a.zip contains two files, a.txt and b.txt. I only want to replace a.txt's content, or remove a.txt, without unzipping all the files into memory. I need to use as little memory as possible, because my zip file is big and memory is not cheap in my case. Currently I zip all the files in 4 s with 700 MB, and I have no way to replace several files using little memory and time. Can you help me with some ideas?

101arrowz commented 3 years ago

Unfortunately, it will be very difficult to do this quickly in any ZIP tool. When you add a file to or remove a file from a ZIP archive, or change the contents of any entry, the ZIP file's footer (the central directory, which records where every entry starts) must be rewritten and the size of the file changes, meaning the contents must be written into a new buffer.
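
To make the footer point concrete: a ZIP archive ends with an End of Central Directory (EOCD) record that stores the absolute byte offset of the central directory, and each central directory entry in turn stores the absolute offset of its file's data. The sketch below is plain JavaScript, not part of fflate's API, and the function name is hypothetical; it locates the EOCD record using the field layout from PKWARE's APPNOTE and ignores ZIP64 archives.

```javascript
// Sketch: locate the End of Central Directory (EOCD) record of a ZIP archive.
// Hypothetical helper, not an fflate API. Field layout follows PKWARE's
// APPNOTE.TXT; ZIP64 archives are NOT handled. `buf` must be a Uint8Array.
const EOCD_SIGNATURE = 0x06054b50; // bytes "PK\x05\x06" read as little-endian
const EOCD_MIN_SIZE = 22;          // fixed part of the record, without the comment

function findEndOfCentralDirectory(buf) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  // The record sits at the very end, followed only by an optional comment
  // of up to 65535 bytes, so scan backwards over at most that range.
  const lowest = Math.max(0, buf.byteLength - EOCD_MIN_SIZE - 0xffff);
  for (let i = buf.byteLength - EOCD_MIN_SIZE; i >= lowest; i--) {
    if (view.getUint32(i, true) !== EOCD_SIGNATURE) continue;
    const commentLength = view.getUint16(i + 20, true);
    // Reject false positives whose declared comment does not end exactly at EOF.
    if (i + EOCD_MIN_SIZE + commentLength !== buf.byteLength) continue;
    return {
      recordOffset: i,
      entryCount: view.getUint16(i + 10, true),       // total number of entries
      centralDirSize: view.getUint32(i + 12, true),   // in bytes
      centralDirOffset: view.getUint32(i + 16, true), // from the start of the file
    };
  }
  return null; // no EOCD found: not a (non-ZIP64) ZIP archive
}
```

Because `centralDirOffset` and every entry's data offset are absolute, changing the compressed size of one entry shifts all entries stored after it, so those offsets have to be recomputed and the footer written again.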

In theory, what you can avoid is decompressing files you don't need. Unfortunately, this is not an optimized aspect of fflate's API: in theory you can get the CRC32 checksum of the decompressed data in a ZIP file without actually decompressing it, but fflate does not expose this because the added logic would be incredibly complex due to intricacies in the ZIP format, even though it seems like it just involves reading a header. Therefore, you're forced to decompress.
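
For reference, the "in theory" part looks roughly like this: each central directory entry records the CRC-32 and sizes of the uncompressed data, so they can be listed without decompressing anything. The sketch below is plain JavaScript with a hypothetical helper name, not an fflate API; it assumes the caller already has the central directory offset and entry count from the End of Central Directory record at the end of the archive, and reads fixed field offsets from PKWARE's APPNOTE. It deliberately skips the intricacies: ZIP64 sizes, encryption, and non-UTF-8 names are not handled. One such intricacy is that a streaming reader like `Unzip` sees the local headers first, and when bit 3 of an entry's flags is set those headers hold zeros, with the real CRC arriving only in a data descriptor after the compressed data.

```javascript
// Sketch: read per-entry metadata (including the stored CRC-32) from a ZIP
// central directory without decompressing. Hypothetical helper, not an fflate
// API. ZIP64 placeholder values (0xFFFFFFFF) are NOT resolved.
const CDFH_SIGNATURE = 0x02014b50; // bytes "PK\x01\x02" read as little-endian

function readCentralDirectory(buf, centralDirOffset, entryCount) {
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const decoder = new TextDecoder(); // assumes UTF-8 (or plain ASCII) file names
  const entries = [];
  let p = centralDirOffset;
  for (let n = 0; n < entryCount; n++) {
    if (view.getUint32(p, true) !== CDFH_SIGNATURE) {
      throw new Error(`bad central directory header at byte ${p}`);
    }
    const nameLength = view.getUint16(p + 28, true);
    const extraLength = view.getUint16(p + 30, true);
    const commentLength = view.getUint16(p + 32, true);
    entries.push({
      name: decoder.decode(buf.subarray(p + 46, p + 46 + nameLength)),
      method: view.getUint16(p + 10, true),        // 0 = stored, 8 = deflate
      crc32: view.getUint32(p + 16, true),         // CRC of the uncompressed data
      compressedSize: view.getUint32(p + 20, true),
      uncompressedSize: view.getUint32(p + 24, true),
      localHeaderOffset: view.getUint32(p + 42, true),
    });
    // Fixed 46-byte header, then the variable-length name, extra field and comment.
    p += 46 + nameLength + extraLength + commentLength;
  }
  return entries;
}
```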

Even though it's a bit slow, you can do this with low memory usage:

```javascript
import { Zip, ZipPassThrough, Unzip, UnzipInflate, EncodeUTF8 } from 'fflate';

// The new archive; output chunks are emitted here as soon as they are ready
const zip = new Zip((err, data, final) => {
  // push data to your output stream, e.g. a file
});

const unzip = new Unzip(file => {
  if (file.name !== 'a.txt') {
    // Copy every other entry into the new archive.
    // ZipPassThrough stores it uncompressed; use ZipDeflate to recompress instead.
    const entry = new ZipPassThrough(file.name);
    file.ondata = (err, data, final) => {
      if (err) {
        // handle error
      } else {
        entry.push(data, final);
      }
    };
    zip.add(entry);
    // Start reading this entry: deflated data goes through UnzipInflate,
    // stored data through the built-in pass-through decompressor
    file.start();
  } else {
    // Never calling file.start() skips the old a.txt entirely.
    // To remove a.txt, stop here; to replace it, add a new entry:
    const entry = new ZipPassThrough('a.txt');
    const encoder = new EncodeUTF8((data, final) => entry.push(data, final));
    zip.add(entry);
    encoder.push('some data', false);
    encoder.push('\nend of data', true);
  }
});
unzip.register(UnzipInflate);

// Repeatedly push chunks of your ZIP archive to unzip with:
// unzip.push(chunk1, false);
// unzip.push(chunk2, false);
// unzip.push(chunkLast, true);
// After you're done:
// zip.end();
```

You mentioned that you had already zipped the files and wanted to modify the archive later. Could you just postpone zipping until the contents are finalized?

mankeheaven commented 3 years ago

Thank you for your help. My situation is this: I need to back up a zip every 2 minutes, while other things, like a map, are rendered in real time in Electron. When I load lots of resources into the map, the Electron renderer process uses a lot of memory, and I need to back up ${name}.bak.zip (which may be 1 GB) in the Electron main process, so the backup must use as little memory as possible; otherwise it will run out of memory. The resources in the zip also depend on the diff JSON, so it's a little complicated.

For now, I zip all the files every 2 minutes, like your demo, and it takes little memory:

```javascript
import * as fflate from 'fflate';

const zip = new fflate.Zip((err, dat, final) => {
  if (!err) {
    // output of the streams
    console.log(dat, final);
  }
});

// ZipPassThrough is like ZipDeflate with level 0, but allows for tree shaking
const nonStreamingFile = new fflate.ZipPassThrough('test.png');
zip.add(nonStreamingFile);
// If you have data already loaded, just .push(data, true)
// (pngData is a Uint8Array loaded elsewhere)
nonStreamingFile.push(pngData, true);

// You need to call .end() after finishing
// This ensures the ZIP is valid
zip.end();
```

Maybe this can't be solved with little time and memory by reusing the existing zip file. For now, I can only zip everything every 2 minutes with a small amount of memory.

Thank you very much.