Daninet / hash-wasm

Lightning fast hash functions using hand-tuned WebAssembly binaries
https://npmjs.com/package/hash-wasm

WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance #51

Closed pavel-vhive closed 10 months ago

pavel-vhive commented 11 months ago

Hi, I got this error: WebAssembly.instantiate(): Out of memory: Cannot allocate Wasm memory for new instance, while trying to calculate MD5 hashes of files in web workers in multiple tabs.

Here is my code snippet:

import { md5 } from 'hash-wasm';

onmessage = async (event) => {
  const files = event.data;
  console.log(`worker received ${files.length} files`);
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    const buffer = await file.arrayBuffer();
    const hash = await md5(new Uint8Array(buffer));
    postMessage([hash, file]);
  }
};

This works as expected when running a single instance of the application (single tab): it creates several web workers and calculates hashes in parallel without any issue. But when several instances of the same application are running (several tabs), I get that error. Please help me figure out how to solve or avoid it, thank you.

Daninet commented 11 months ago

How many web workers do you have? Could you provide a link to an example that I could try? Does it work with the createMD5() API?

pavel-vhive commented 11 months ago

Hi, thanks for the quick reply! The number of workers depends on the number of CPU cores. The code is as simple as what I provided; unfortunately it is taken from a large application and I can't share the whole code. I didn't try the createMD5() API, is it something different from the md5() API? What is the main cause of my issue? Is it related to parallel computation or to the fact that I run several tabs?

Daninet commented 11 months ago

Your provided code seems to be correct. md5() already has a mutex that should prevent creating too many wasm instances, so I need to reproduce the issue to understand it better. There is an example in the readme about createSHA1(); createMD5() works similarly.
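
For reference, a minimal sketch of that incremental API, mirroring the readme's createSHA1() example with MD5 swapped in (not taken verbatim from this thread):

import { createMD5 } from 'hash-wasm';

async function example() {
  // Create the hasher once, then reuse it across inputs.
  const hasher = await createMD5();
  hasher.init();                            // reset internal state
  hasher.update(new Uint8Array([1, 2, 3])); // feed data incrementally
  const hash = hasher.digest();             // hex string by default
  console.log(hash);
}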

pavel-vhive commented 11 months ago

Hi, I can send you a lean Angular project if you wish.

Daninet commented 11 months ago

That would be nice.

pavel-vhive commented 11 months ago

[shared a sample Angular project for reproduction]

Daninet commented 11 months ago

Thank you. Could you give me some instructions on how to reproduce it? I tried hashing hundreds of files with it but I see no errors.

pavel-vhive commented 11 months ago

Hi, the issue reproduces only if you open multiple browser tabs and try to upload hundreds of files in each tab.

Daninet commented 11 months ago

With the file.arrayBuffer() call you load whole files into RAM. Multiplied across multiple tabs, that can certainly eat up the RAM. What happens if you return the first byte of each file instead of the MD5 hash?

const hash =  (new Uint8Array(buffer))[0];
pavel-vhive commented 10 months ago

Hi Dani,

I agree that each iteration allocates memory for file.arrayBuffer(), multiplied by the number of workers and the number of tabs, but that is the maximum memory that might be allocated per iteration. In the nominal case that is 10 tabs × ~10 workers × the size of one file.arrayBuffer() result, which is not so big in my opinion. On the next iteration the old memory should be freed by the garbage collector; please correct me if I am wrong.

Daninet commented 10 months ago

Garbage collection in JavaScript is not predictable. Some engines might not free that ArrayBuffer, given that it's allocated inside a for loop without a separate function context. hash-wasm itself doesn't do a lot of memory allocations; it should allocate less than 1 MB of RAM in this case. I suggest using File.slice() or File.stream() so that large buffers are not kept in RAM unnecessarily. There is an example here: https://stackoverflow.com/questions/768268/how-to-calculate-md5-hash-of-a-file-using-javascript/63287199#63287199
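
A hedged sketch of the File.stream() approach mentioned above (the function name is ours; createMD5()/init()/update()/digest() are the real hash-wasm API):

import { createMD5 } from 'hash-wasm';

// Hashes a File without buffering it whole: only one chunk is in memory at a time.
async function md5FromStream(file: File): Promise<string> {
  const hasher = await createMD5();
  hasher.init();
  const reader = file.stream().getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    hasher.update(value); // value is a Uint8Array chunk
  }
  return hasher.digest(); // hex string by default
}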

pavel-vhive commented 10 months ago

Hi Dani, do you think I should slice files in order to calculate MD5 if each of them is about 10 MB?

Daninet commented 10 months ago

Slicing might help with garbage collection. You can also try to extract the content of the for loop into a separate function.
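
A hypothetical illustration of that refactor (the helper name is ours): giving each file its own function scope lets the engine release the buffer sooner.

import { md5 } from 'hash-wasm';

// Each call creates a fresh scope, so `buffer` becomes collectible
// as soon as the call returns, instead of lingering in the loop's scope.
async function hashOneFile(file: File): Promise<string> {
  const buffer = await file.arrayBuffer();
  return md5(new Uint8Array(buffer));
}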

pavel-vhive commented 10 months ago

Thanks a lot, I will try and update you!

pavel-vhive commented 10 months ago

Hi Dani,

import { createMD5 } from 'hash-wasm';
import { IHasher } from "hash-wasm/dist/lib/WASMInterface";

onmessage = async (event) => {
  const files = event.data;
  console.log(`worker received ${files.length} files`);
  for (let i = 0; i < files.length; i++) {
    const file = files[i];
    const start = Date.now();
    const hash = await hashFile(file);
    const end = Date.now();
    const duration = end - start;
    console.log(`took ${duration} ms`);
    postMessage([hash, file]);
  }
};

const chunkSize = 2 * 1024 * 1024; // read files in 2 MB slices
const fileReader = new FileReader();
let hasher: IHasher;

// Reads one Blob slice into a Uint8Array through the shared FileReader.
function readChunk(chunk: Blob) {
  return new Promise<Uint8Array>((resolve, reject) => {
    fileReader.onload = (e) => {
      resolve(new Uint8Array(e.target?.result as ArrayBuffer));
    };
    fileReader.onerror = () => reject(fileReader.error);
    fileReader.readAsArrayBuffer(chunk);
  });
}

const hashFile = async (file: Blob) => {
  // Reuse a single wasm instance per worker; init() only resets its state.
  if (hasher) {
    hasher.init();
  } else {
    hasher = await createMD5();
  }

  const chunkNumber = Math.floor(file.size / chunkSize);

  for (let i = 0; i <= chunkNumber; i++) {
    const chunk = file.slice(
      chunkSize * i,
      Math.min(chunkSize * (i + 1), file.size)
    );
    hasher.update(await readChunk(chunk));
  }

  return hasher.digest();
};

This is the code for chunked hashing. Can you tell me whether createMD5() has a mutex that prevents creating too many wasm instances, or does this code handle it?

  if (hasher) {
    hasher.init();
  } else {
    hasher = await createMD5();
  }
Daninet commented 10 months ago

createMD5() always creates a new instance and allocates new memory for it. Your if statement at the end looks correct: init() reuses the existing instance.
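
To make that concrete, a small sketch (function names are ours) contrasting the two patterns:

import { createMD5 } from 'hash-wasm';

// Anti-pattern: allocates a new wasm instance (and its memory) on every call.
async function md5Fresh(data: Uint8Array): Promise<string> {
  const hasher = await createMD5();
  hasher.update(data);
  return hasher.digest();
}

// Preferred: one instance per worker, reset with init() between inputs.
const hasherPromise = createMD5();
async function md5Reused(data: Uint8Array): Promise<string> {
  const hasher = await hasherPromise;
  hasher.init(); // resets internal state without a new allocation
  hasher.update(data);
  return hasher.digest();
}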

pavel-vhive commented 10 months ago

Hi Dani,

It seems the issue is resolved, thank you very much!