ZJONSSON / node-unzipper

node.js cross-platform unzip using streams

Is there a way to pause reading entries in the unzipper stream? #297

Closed mykhailoklym94 closed 2 months ago

mykhailoklym94 commented 3 months ago

Hi @ZJONSSON, thanks for the great library!

I was wondering if there is a way to pause the processing of a zip for a period of time. I'll provide an example to make it clear.

Let's say I have a zip file with the following structure:

├── file-1.json
├── file-2.json
└── file-3.json

My goal is to process these files sequentially, here is the general algorithm that I want to achieve:

  1. Start zip processing
  2. Buffer file-1.json in memory
  3. Pause zip processing, i.e. don't load any more files into memory
  4. Application logic processes file-1.json; let's say it takes around 20-30 seconds
  5. Resume zip processing
  6. Repeat steps 2-5 for file-2.json and file-3.json.

Here is a contrived example in code:

const unzipper = require("unzipper");
const fs = require("node:fs");

console.log(process.pid)

const main = async () => {
  const readableStream = fs.createReadStream("./example.zip");
  const unzipperStream = unzipper.Parse();
  readableStream.pipe(unzipperStream);

  unzipperStream.on('entry', async entry => {
    const fileName = entry.path;

    if (fileName === "file-1.json") {
      // specific processing logic for file-1
      entry.on("finish", () => {
      // ! once the entry has been loaded into memory I want to pause the stream so I don't load other files into memory
        // ! this doesn't work, it's just an example of what I want to achieve.
        readableStream.pause();
      });

      const file1 = await entry.buffer();
      console.log("successfully buffered the file - now will start processing, it will take some time", file1);
      setTimeout(() => {
        // let's say it will take 20 seconds to process the file
        // now I want to start reading next file in zip
        readableStream.resume();
      }, 20000);

    } else if (fileName === "file-2.json") {
       // specific processing logic for file-2
      entry.on("finish", () => {
        // ! once entry has been loaded into memory I want to pause the stream so I don't load other files into memory
        // ! this doesn't work, it's just an example of what I want to achieve.
        readableStream.pause();
      });

      const file2 = await entry.buffer();

      console.log("successfully buffered the file - now will start processing, it will take some time", file2);
      setTimeout(() => {
        // let's say it will take 20 seconds to process the file
        // now I want to start reading next file in zip
        readableStream.resume();
      }, 20000);

    } else {
      entry.autodrain();
    }

  })

}

main();

Explanation of the code: what I am trying to achieve is to buffer a file into memory, then pause unzipper until I have finished processing that file. I tried to pause the incoming stream, but apparently it doesn't work: unzipper keeps loading data into memory.

ZJONSSON commented 2 months ago

The best way to do this is to first use the Open method to read the central directory of the zip file; then you can open individual entries at your leisure. If you want to do this with the .pipe method, you can simply stop consuming for a bit (the stream pauses on its own).
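For the first approach, here is a minimal sketch using Open.file (processFile is a hypothetical stand-in for the 20-30 second application logic):

const unzipper = require("unzipper");

// Hypothetical placeholder simulating slow application logic
const processFile = (path, data) =>
  new Promise(resolve => setTimeout(resolve, 20000));

const main = async () => {
  // Open.file reads only the central directory; no entry data is loaded yet
  const directory = await unzipper.Open.file("./example.zip");

  // Entries can now be opened one at a time, at your leisure
  for (const file of directory.files) {
    const content = await file.buffer(); // loads just this entry into memory
    await processFile(file.path, content);
  }
};

main();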

Here is a rudimentary example of how you can pause inside the on('entry') handler if you are still processing the previous file:

let isProcessing;

unzipperStream.on('entry', async entry => {
  // wait for the previous file to finish before consuming this entry
  await isProcessing;
  const data = await entry.buffer();
  // processFile stands in for your own async processing logic
  isProcessing = processFile(data);
  const results = await isProcessing;
  isProcessing = undefined;
});
ZJONSSON commented 2 months ago

This works, strangely enough, because .on('entry') fires sequentially as the zip file is read sequentially. So we should never have more than one isProcessing promise pending at any given time.
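Putting the two comments together, the original .pipe example could be rewritten along these lines (a sketch only; processFile again simulates the 20-30 second application logic):

const unzipper = require("unzipper");
const fs = require("node:fs");

// Hypothetical placeholder simulating the slow processing step
const processFile = (path, data) =>
  new Promise(resolve => setTimeout(() => {
    console.log("processed", path, data.length);
    resolve();
  }, 20000));

const main = async () => {
  const readableStream = fs.createReadStream("./example.zip");
  const unzipperStream = unzipper.Parse();
  readableStream.pipe(unzipperStream);

  let isProcessing;

  unzipperStream.on("entry", async entry => {
    // Entries are emitted sequentially, so awaiting here keeps the
    // stream from being consumed until the previous file is done
    await isProcessing;

    if (entry.path.endsWith(".json")) {
      const data = await entry.buffer();
      isProcessing = processFile(entry.path, data);
      await isProcessing;
      isProcessing = undefined;
    } else {
      entry.autodrain();
    }
  });
};

main();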