Closed — Almars12345 closed this issue 7 years ago
You'll need to increase the limit to about 40 GB of RAM to run that. Are you saying you're getting this error with the unmodified example code?
@saudet Yes, I'm getting this error with the above code. Are you sure I need 40 GB? My laptop has 8 GB of RAM; I think 40 GB is too much.
@saudet
What about Word2VecSentimentRNN? Do you have any problems executing that one?
I'm not sure why 40 GB would be needed there. It's a pure OOM during the Word2Vec Google model load, which itself uses around 4 GB for the floats alone, plus the strings.
@saudet I executed the same code without loading the model, and it works. However, loading the model gives me the error. As for Word2VecSentimentRNN, it works. I believe the problem is with loading the model; it consumes too much memory. I increased the heap size with -Xmx6G, but it still fails. I also ran the same program on a machine with 16 GB of RAM, and the same issue appears.
Here's where google model loading happens: https://github.com/deeplearning4j/deeplearning4j/blob/6c11cd24ed47c37d535ec38ca7a7afbcb1b50891/deeplearning4j-nlp-parent/deeplearning4j-nlp/src/main/java/org/deeplearning4j/models/embeddings/loader/WordVectorSerializer.java#L224-L224
TL;DR: syn0 is created first, sized to match the model's dimensionality; in the case of the Google model, that's 3M x 300. After it's created, the vectors are read one by one and inserted into syn0.
I don't see any real "extra" memory use here.
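As a back-of-the-envelope check (my own arithmetic, not something from the DL4J code), the syn0 matrix for the Google model alone needs about 3.6 GB (3.35 GiB) of float storage:

```java
public class Syn0MemoryEstimate {
    public static void main(String[] args) {
        long vocabSize = 3_000_000L;   // Google News model: ~3M vocabulary entries
        int dims = 300;                // 300-dimensional vectors
        int bytesPerFloat = 4;         // float32

        long syn0Bytes = vocabSize * dims * bytesPerFloat;
        double syn0GiB = syn0Bytes / (1024.0 * 1024.0 * 1024.0);

        System.out.printf("syn0 alone: %.2f GiB%n", syn0GiB);
        // About 3.35 GiB before counting the vocabulary strings and
        // parsing buffers, so a 1 GB (or even 6 GB) heap can plausibly
        // run out during the load.
    }
}
```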
Btw, as a quick workaround you could use Word2Vec models with smaller dimensionality.
@raver119 So, what do I have to do to fix it? And how can I decrease the dimensionality?
I'm not sure, at this point, that there's anything to be fixed. I've pointed you to the code, so you can see for yourself.
As for dimensionality: just download another pre-trained model, or train your own. There are plenty of other models available for download on the web. The Google model isn't the only one :)
E.g. here: https://github.com/3Top/word2vec-api
@raver119 I'm wondering if I can load a GloVe model, such as the Common Crawl or Twitter ones, using the deeplearning4j library? Is there any sample I can have a look at? Thank you.
Yes, just use the same methods. There are two widely used formats out there, and we support them both: one is the binary model (like the Google model) and the other is CSV/text. So you'll definitely be fine using the WordVectorSerializer utility methods.
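For what it's worth, here's a minimal sketch of both loading paths (method names are from the 0.x-era WordVectorSerializer API referenced earlier in this thread; the file names are placeholders and exact signatures may differ between versions):

```java
import java.io.File;

import org.deeplearning4j.models.embeddings.loader.WordVectorSerializer;
import org.deeplearning4j.models.embeddings.wordvectors.WordVectors;
import org.deeplearning4j.models.word2vec.Word2Vec;

public class LoadVectorsSketch {
    public static void main(String[] args) throws Exception {
        // Binary word2vec format, e.g. the Google News model
        Word2Vec google = WordVectorSerializer.loadGoogleModel(
                new File("GoogleNews-vectors-negative300.bin.gz"), true);

        // Plain-text format: one "word v1 v2 ... vN" line per word,
        // which is also the layout GloVe files ship in
        WordVectors glove = WordVectorSerializer.loadTxtVectors(
                new File("glove.twitter.27B.200d.txt"));

        // Both expose the same WordVectors interface afterwards
        System.out.println(google.similarity("day", "night"));
        System.out.println(glove.similarity("day", "night"));
    }
}
```

This only runs with the deeplearning4j-nlp dependency on the classpath and the model files downloaded locally.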
@raver119 Hi again. Is it possible to run the Word2Vec code in a plain Java project, i.e. without Maven? And what libraries would be needed to do that?
We don't really support setups without a build system. Sure, you're free to go that way, but you'll be on your own there, e.g. resolving the Maven dependency tree by hand, or something like that.
I am facing the same "Unable to allocate memory" issue with the Google News model. I then tried the CBOW model (words.cbow.s200.w2v.bin.gz): it reads the first word in words.cbow.s200.w2v.bin and exits with an error message like "unable to read the file format".
I have 8 GB of RAM, IntelliJ as my IDE, and an Xmx value of 1024M.
Can you please tell me how to solve this error?
Yeah, you need a bigger machine and a larger heap to run the Google model. There's nothing for us to do here.
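On the heap point that keeps coming up in this thread: -Xmx is a JVM launch flag, so setting it in one launcher (e.g. an IDE run configuration) does not affect others. Illustrative examples only; the sizes, jar name, and main class are placeholders:

```shell
# Plain java launch: give the JVM an 8 GB max heap
java -Xmx8g -cp myapp.jar com.example.LoadModel

# Maven's exec:java runs in the Maven JVM itself, so size that JVM instead
MAVEN_OPTS="-Xmx8g" mvn compile exec:java -Dexec.mainClass=com.example.LoadModel
```

In NetBeans or IntelliJ, the equivalent is adding -Xmx8g to the VM options field of the run configuration, not to the program arguments.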
When I load the Google model, I get an "out of memory" error. I tried modifying the heap size from the run configuration, but it's still not working. BTW, I'm using NetBeans to run the code below.
Here is the error:
Word2Vec vec1 = WordVectorSerializer.loadGoogleModel(gModel, true);
Finally, the POM:
Could you please help me solve this problem?