oskarflordal closed this issue 8 years ago
Attempting to run this code now on an Amazon cluster where I'd previously gotten the error this is supposed to fix. Not sure how long it'll take but I'll update when it's done.
var w2v = require('word2vec'); // assuming the word2vec npm module this issue is about
w2v.loadModel('../GoogleNews-vectors-negative300.bin', function (err, model) {
    if (err) throw err;
    console.log('model', model);
});
Thanks for the commit and the fix of the string length. I am thinking that maybe we can remove the slice operation when creating a new WordVec instance. Apparently, node Buffers are allocated in memory outside of the V8 heap, so if we avoid creating a shallow copy and instead just provide a new view on the underlying data, this might help. So instead of an ordinary array, we would simply store a typed-array view on the underlying buffer in the values field of the word vector.
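A minimal sketch of the view-versus-copy idea (the variable names here are illustrative assumptions, not the actual repo code): keep the model bytes in one buffer and give each word vector a Float32Array view into it, rather than one boxed JS number per float.

```javascript
const dims = 3;
const ab = new ArrayBuffer(dims * 4); // backing store for the raw bytes
const buf = Buffer.from(ab);          // Buffer view, for writeFloatLE/readFloatLE
buf.writeFloatLE(0.5, 0);
buf.writeFloatLE(-1.25, 4);
buf.writeFloatLE(2.0, 8);

// Copying approach: allocates a plain JS array with one number per float,
// for every word in the model.
const asPlainArray = [];
for (let i = 0; i < dims; i++) asPlainArray.push(buf.readFloatLE(i * 4));

// View approach: shares the same underlying memory, so no per-word
// allocation for the data itself. (Assumes a little-endian platform,
// matching writeFloatLE above.)
const values = new Float32Array(ab, 0, dims);
```

With a real model the whole file would sit in one large Buffer and each word's values field would be a Float32Array view at that word's byte offset.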
I made some little changes to the code to facilitate this and merged it into the master branch. Would be great if someone could have a look.
Oh excellent. I'll use the current master branch and give it a shot now.
$ node --max-old-space-size=4096 index.js > out.txt
FATAL ERROR: CALL_AND_RETRY_2 Allocation failed - process out of memory
Aborted (core dumped)
Even with the optimization it's still hitting 4GB of memory usage and dumping.
Thanks for trying out the updated code! You were right from the start, and it seems that we might not be able to get this working without a major rewrite of the code, which could utilize either multiple node processes or native C++ code via an add-on. I am a bit at my wit's end, but will let all of you know in case I come up with something in the future.
This might magically get fixed by the upcoming Node "4.0" release, which you can read about here:
https://medium.com/node-js-javascript/4-0-is-the-new-1-0-386597a3436d
(io.js is a fork of node with lots of improvements that is getting folded back into the trunk)
This avoids the crash on gnews.bin, but unfortunately I haven't been able to confirm that it works, since I run out of memory.