Closed aleksey-hoffman closed 4 years ago
As far as I can see, there is a memory leak at the setInterval() call: it prevents the GC from cleaning up the variables in the context. If I comment that out, or if I add clearInterval(this.interval); to the .on('end') handler, I don't see those out-of-memory errors anymore.
In my experience, the average use case doesn't need web workers or worker threads. Usually the bottleneck is the disk I/O: the average consumer SSD cannot keep up with the rate of calculating hashes.
On their own, most algorithms in hash-wasm (including xxhash) shouldn't use more than 1-2 MB of RAM per instance. Most browsers allow up to 2-4 GB of RAM to be allocated to WASM processes, so it shouldn't be a concern as long as the code doesn't have any memory leaks.
@Daninet thank you for figuring out what was causing the problem.
By the way, are you planning to add XXH3? It seems to be about 2x faster than XXH64, according to the benchmark here: https://github.com/Cyan4973/xxHash
I'm planning to add new algorithms, but I don't think that XXH3 would be significantly faster than XXH64 when running it in WASM. The performance of that algorithm comes from SSE2, which I cannot use from WASM until SIMD instructions are supported by browsers.
@Daninet ahh, I see, you're bundling the WASM directly into the JS so it can be used in browsers as well. I'm personally using it in an Electron project.
Have you ever considered compiling your sources into a native module, so it can be used in Node.js and Electron projects via N-API? Would that allow you to use SSE instructions?
The main goal of this library is to provide a portable but fast solution for calculating hashes. The main target is the browser.
There are other libraries on NPM which use N-API, NAN, or other ways to generate native bindings for Node.js / Electron. N-API allows using all kinds of custom instructions, but it is not portable: the source needs to be recompiled for each platform / CPU architecture where you want to run it.
Side note in case it wasn't known: SIMD support in WASM on V8 (Chrome / Node.js) can be enabled with the --experimental-wasm-simd flag. Since this project uses Emscripten: https://emscripten.org/docs/porting/simd.html
Yeah, I know. The SIMD instructions are currently in an origin trial, which means Chrome will probably enable them by default in a few months. https://www.chromestatus.com/feature/6533147810332672 The main question is Safari, which usually lags behind in supporting new features.
I also made some measurements, and with the current V8 version I see about a 10% improvement when using SIMD. That will probably improve as the JIT gets better at optimizing SIMD instructions, but we will have to wait for it. I was thinking about compiling two versions of each library, but I don't think it's worth doubling the bundle size for a 10% improvement that cannot be used anywhere without enabling flags. I also started writing a library that backports the SIMD instructions at runtime in browsers where they aren't supported: https://github.com/Daninet/wasm-backporter It works, but it makes performance worse in browsers where SIMD instructions are not supported.
I'm keeping an eye on the advancements regarding SIMD. I will enable it as soon as it delivers significantly better performance and has decent browser support.
Good day @Daninet. My project hashes multiple files in parallel using hash-wasm. There are about 20 components, and each one of them creates its own hash-wasm instance and hashes a specified file. The problem is, when it hashes too many files (usually ~20) at the same time, I get the following error:

The hashing is triggered by a JS scroll event (a component hashes its specified file when it enters the viewport). It takes about 20-30 parallel calls (depending on the file sizes) to fill up the WASM memory and cause the error. Sometimes, when I stop the hashing (stop scrolling the page) and then resume it after a few seconds, it throws the error immediately, as if the memory wasn't cleaned up at all and a single call is all it takes to cause the error. What I don't understand is why all the instances are interconnected and affect each other. And the weird thing is, even when it throws the error, it still finishes hashing the file properly.
Code
fileHasher.js
testFileHasher.js
The project is based on Electron (Node.js + Chromium), so initially I thought it was a Chromium bug, but then I ran it on Node.js in a terminal and got the same problem.
In this example I'm emulating multiple parallel jobs by hashing a single 20 MB file every 10 ms (in the real app, every component creates a new hash-wasm instance and hashes a different file of a different size). It usually takes ~80 ms to hash this particular image once, but since it's called every 10 ms, it doesn't have time to finish the job. As you can see in the console output below, it exceeds the memory limits at about 20 parallel calls (the same thing happens in the real app). Notice how the hashing time grows and how all the instances affect each other.
Console output
Questions
Surprisingly, this many parallel calls do not block the main JS thread even though the work is very computationally intensive, but as soon as I get this WASM error, it starts blocking the main thread and the UI starts freezing.
I tried creating a separate web worker for each computation, but that takes up a lot of RAM and crashes the app at 20+ parallel computations.

1. Is there a way to get hash-wasm to wait until all the other instances are done, so it doesn't exceed the WASM memory limits?
2. Is it possible to run every computation on its own hash-wasm instance so they don't share the same memory limit? Would it overwhelm and block the main JS thread if it were to run on multiple WASM instances?

Environment:
- OS: Win10 x64
- hash-wasm: v4.1.0
- Exec env: the same results everywhere:
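Regarding making the jobs wait for each other: hash-wasm itself doesn't queue calls (as far as this thread establishes), but the scroll-triggered jobs can be gated behind a small promise-based semaphore so that only a few hashers are live at once. A sketch of one possible approach (the limit of 4 and the hashFile helper are assumptions, not part of hash-wasm's API):

```javascript
// Minimal promise-based semaphore: at most `limit` tasks run concurrently;
// the rest wait in a FIFO queue.
class Semaphore {
  constructor(limit) {
    this.limit = limit;
    this.active = 0;
    this.queue = [];
  }

  async run(task) {
    if (this.active >= this.limit) {
      // Wait until a finishing task hands its slot to us.
      await new Promise((resolve) => this.queue.push(resolve));
    } else {
      this.active++;
    }
    try {
      return await task();
    } finally {
      const next = this.queue.shift();
      if (next) next();   // hand our slot directly to the next waiter
      else this.active--; // no one is waiting: release the slot
    }
  }
}

// Usage sketch: cap parallel hashing at 4 jobs.
// const sem = new Semaphore(4);
// const digests = await Promise.all(files.map((f) => sem.run(() => hashFile(f))));
```

Each component would call sem.run(...) instead of starting the hash directly, so excess scroll events queue up instead of exhausting the WASM memory.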