Implement multi-threaded decompression

pRizz commented 6 years ago

Feature request ^

addaleax commented 6 years ago

Hi! This should actually be supported already:

https://github.com/addaleax/lzma-native#multi-threaded-encoding

pRizz commented 6 years ago

I tried it with threads: 0, but noticed no performance increase. Also the doc says it’s only supported for encoding, which sounds like it’s not supported for decoding?

addaleax commented 6 years ago

> I tried it with threads: 0, but noticed no performance increase.

Can you share the example you tried it with? If you’re compressing a small-ish file, a lack of speedup is expected: multi-threading comes with a certain overhead, and liblzma only actually makes use of multiple threads for larger chunk sizes.
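To illustrate why small inputs see no benefit, here is a rough sketch of the block-splitting idea. It assumes a multi-threaded encoder that cuts the input into independent blocks and hands each block to at most one thread. The helper name `usableThreads` and the 24 MiB block size are assumptions for illustration only; the real block size depends on the preset and the library's settings.

```javascript
// Hypothetical helper (not part of lzma-native or liblzma): estimates how many
// threads a block-based multi-threaded encoder could keep busy.
// Assumption: the input is split into independent blocks of `blockBytes`,
// and each block is compressed by at most one thread.
function usableThreads(inputBytes, blockBytes, requestedThreads) {
  if (inputBytes <= 0) return 0;
  const blocks = Math.ceil(inputBytes / blockBytes);
  // There can never be more busy threads than there are blocks.
  return Math.min(blocks, requestedThreads);
}

const MiB = 1024 * 1024;
const blockBytes = 24 * MiB; // assumed block size, for illustration only

console.log(usableThreads(10 * MiB, blockBytes, 8));   // 1
console.log(usableThreads(100 * MiB, blockBytes, 8));  // 5
console.log(usableThreads(2600 * MiB, blockBytes, 8)); // 8
```

Under these assumptions, an input smaller than one block occupies a single thread no matter how many threads are requested.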

> Also the doc says it’s only supported for encoding, which sounds like it’s not supported for decoding?

Yes, that’s correct. Decoding is much faster than encoding anyway, though.

pRizz commented 6 years ago

At the moment I am mainly interested in decompression/decoding. A 2.6 GB file takes about 6 minutes to decompress on my laptop. Looking at my CPU usage, only one core is utilized, whereas I have 8 available. I would therefore expect decompression to take roughly 45 seconds (6 minutes / 8) if the work were spread perfectly across cores/threads.
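The estimate can be written out as a small sketch. The helper name `estimateSeconds` is hypothetical, and the optional `parallelFraction` parameter is an added assumption (Amdahl's law) to show how the best case degrades when part of the work cannot be parallelized; it is not a measurement of this file.

```javascript
// Hypothetical helper: best-case wall-clock time when a fraction of the work
// is spread evenly across `cores` (Amdahl's law).
// parallelFraction = 1 means the work parallelizes perfectly.
function estimateSeconds(serialSeconds, cores, parallelFraction = 1) {
  const serialPart = serialSeconds * (1 - parallelFraction);
  const parallelPart = (serialSeconds * parallelFraction) / cores;
  return serialPart + parallelPart;
}

// 6 minutes on one core, perfectly spread over 8 cores.
console.log(estimateSeconds(360, 8)); // 45

// Same job if only 75% of the work could run in parallel (assumed figure).
console.log(estimateSeconds(360, 8, 0.75)); // 123.75
```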

Here is some prototype code, taken from https://github.com/nano-wallet-company/nano-wallet-desktop:

const lzma = require('lzma-native');
const tar = require('tar-fs');
const tarStream = require('tar-stream');
const progressStream = require('progress-stream');
const Promise = require('bluebird');
const pump = Promise.promisify(require('pump'));
const fs = Promise.promisifyAll(require('graceful-fs'), {
  filter(name) {
    return ['stat', 'readFile'].includes(name);
  },
});

const createProgressStream = (length, onProgress) => {
  const progress = progressStream({ length, time: 250 });
  progress.on('progress', ({ percentage = 0 }) => onProgress(percentage / 100));
  return progress;
};

const extractAsset = async (savePath, extractDir, onProgress) => {
  const extract = tarStream.extract();
  const { size } = await fs.statAsync(savePath);
  // threads: 0 is passed in the hope of using all available cores.
  const dec = lzma.createDecompressor({ threads: 0 });
  return pump(
    fs.createReadStream(savePath),
    createProgressStream(size, onProgress),
    dec,
    tar.extract(extractDir, {
      fs,
      extract,
      fmode: 0o600,
      dmode: 0o700,
    }),
  );
};

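console.log('Extracting');

// Log each progress update (a fraction between 0 and 1) and report failures.
extractAsset('./data.tar.xz', '.', (fraction) => {
  console.log(fraction);
}).catch((err) => {
  console.error(err);
});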

Here, data.tar.xz comes from https://dkl5m4kebds7n.cloudfront.net/data.tar.xz; it is essentially a database file.

addaleax commented 6 years ago

I agree that this would be great to have, but the underlying library doesn’t seem to support this yet. 😕

pRizz commented 6 years ago

Got it, thanks.

addaleax / lzma-native

Implement multi-threaded decompression #64