eabdullin / Word2Vec.Net

Implementation of Word2Vec for the .NET Framework
Other
126 stars, 41 forks

OutOfMemoryException when declaring large managed arrays #2

Open johnearnshaw opened 8 years ago

johnearnshaw commented 8 years ago

I'm eager to see this project working :) So... I was playing with the dev branch and I'm unable to load the pre-trained GoogleNews-vectors-negative300.bin (300-dimensional vectors for 3 million words and phrases) due to the memory allocation restriction in managed arrays.

Have you considered using System.IO.MemoryMappedFiles.MemoryMappedFile to overcome this rather than declaring large arrays?
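Here is a minimal sketch of what that could look like. It assumes the vectors have first been converted to a hypothetical raw file layout (row-major float32 in machine byte order, `dimensions` floats per word, no header and no words in between), which is not the original word2vec `.bin` layout. `MappedVectorStore` is an illustrative name, not part of Word2Vec.Net. Only the vector you ask for is copied into a small managed array; the rest stays in the OS page cache.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Minimal sketch: serve word vectors from a memory-mapped file instead of one
// large managed float[]. Assumes a hypothetical raw layout: row-major float32
// in machine byte order, `dimensions` floats per word, no header or words.
public sealed class MappedVectorStore : IDisposable
{
    private readonly MemoryMappedFile _file;
    private readonly MemoryMappedViewAccessor _view;

    public int Dimensions { get; }
    public long Count { get; }

    public MappedVectorStore(string path, int dimensions)
    {
        if (dimensions <= 0) throw new ArgumentOutOfRangeException(nameof(dimensions));
        Dimensions = dimensions;

        // Number of complete rows in the file.
        long length = new FileInfo(path).Length;
        Count = length / ((long)dimensions * sizeof(float));

        // Capacity 0 = map the whole file; view size 0 = map to the end of the file.
        _file = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);
        _view = _file.CreateViewAccessor(0, 0, MemoryMappedFileAccess.Read);
    }

    // Copies the vector for row `index` out of the mapping into a small managed array.
    public float[] GetVector(long index)
    {
        if (index < 0 || index >= Count) throw new ArgumentOutOfRangeException(nameof(index));
        var vector = new float[Dimensions];
        long offset = index * Dimensions * sizeof(float);
        _view.ReadArray(offset, vector, 0, Dimensions);
        return vector;
    }

    public void Dispose()
    {
        _view.Dispose();
        _file.Dispose();
    }
}
```

Mapping a file of several GB this way still needs a 64-bit process, since the whole view has to fit in the address space.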

eabdullin commented 8 years ago

@johnearnshaw, hi. I tried loading my own trained 200-dimensional vectors for 1 million words, and that worked fine. So I think your computer doesn't have enough memory, because the algorithm uses managed arrays and allocates memory for the full required size:

Array.Resize(ref _vocab, _vocabMaxSize);
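For a rough sense of scale, here is a back-of-envelope estimate of how large one contiguous float32 matrix would be for each model. It assumes the vectors are held in a single managed `float[]` (an assumption about the loader, not confirmed in this thread), and `EmbeddingSize` is a hypothetical helper.

```csharp
using System;

// Back-of-envelope sizing, assuming the whole embedding matrix lives in one
// contiguous float[] (4 bytes per float). EmbeddingSize is a hypothetical helper.
public static class EmbeddingSize
{
    // The .NET Framework's default per-object limit (2 GB).
    public const long DefaultMaxObjectBytes = 2L * 1024 * 1024 * 1024;

    // Bytes needed for words x dimensions float32 values.
    public static long MatrixBytes(long words, long dimensions) =>
        words * dimensions * sizeof(float);

    // True if the matrix would fit in a single array under the default limit.
    public static bool FitsInOneArray(long words, long dimensions) =>
        MatrixBytes(words, dimensions) < DefaultMaxObjectBytes;
}
```

Under that assumption, 1M × 200 comes to 800,000,000 bytes (about 0.75 GiB), which fits, while 3M × 300 comes to 3,600,000,000 bytes (about 3.35 GiB), which exceeds the default 2 GB per-object limit of the .NET Framework however much RAM the machine has.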
tomachristian commented 8 years ago

Well, it still doesn't work for GoogleNews-vectors-negative300.bin.

GuntaButya commented 7 years ago

2 GB is the maximum size of a single object in .NET (even for x64 assemblies), so you will need to split the dataset or use a smaller one. See: http://stackoverflow.com/questions/982051/net-max-memory-use-2gb-even-for-x64-assemblies
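One way to "split" the data without shrinking it is to spread the vectors over several smaller arrays, so no single object comes near the limit. This is a minimal sketch, assuming the loader can be changed to write rows into such a container; `ChunkedVectors` and `rowsPerChunk` are hypothetical names, not part of Word2Vec.Net.

```csharp
using System;

// Minimal sketch: store the embedding matrix as several smaller float[] chunks
// instead of one huge array, so no single object hits the 2 GB limit.
// ChunkedVectors and rowsPerChunk are hypothetical names, not Word2Vec.Net API.
public sealed class ChunkedVectors
{
    private readonly float[][] _chunks;
    private readonly int _rowsPerChunk;

    public long Count { get; }
    public int Dimensions { get; }

    public ChunkedVectors(long count, int dimensions, int rowsPerChunk = 100_000)
    {
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        if (dimensions <= 0) throw new ArgumentOutOfRangeException(nameof(dimensions));
        // Each chunk must itself stay below the per-object limit.
        if (rowsPerChunk <= 0 || (long)rowsPerChunk * dimensions * sizeof(float) > int.MaxValue)
            throw new ArgumentOutOfRangeException(nameof(rowsPerChunk));

        Count = count;
        Dimensions = dimensions;
        _rowsPerChunk = rowsPerChunk;

        int chunkCount = checked((int)((count + rowsPerChunk - 1) / rowsPerChunk));
        _chunks = new float[chunkCount][];
        for (int c = 0; c < chunkCount; c++)
        {
            // The last chunk only holds the remaining rows.
            long rows = Math.Min(rowsPerChunk, count - (long)c * rowsPerChunk);
            _chunks[c] = new float[rows * dimensions];
        }
    }

    // Finds the chunk that holds `row` and the start offset of that row inside it.
    private (float[] Chunk, int Offset) Locate(long row)
    {
        if (row < 0 || row >= Count) throw new ArgumentOutOfRangeException(nameof(row));
        int chunk = (int)(row / _rowsPerChunk);
        int offset = (int)(row % _rowsPerChunk) * Dimensions;
        return (_chunks[chunk], offset);
    }

    // Copies a full vector into the given row.
    public void Set(long row, float[] vector)
    {
        if (vector == null || vector.Length != Dimensions)
            throw new ArgumentException("Vector length must equal Dimensions.", nameof(vector));
        var (chunk, offset) = Locate(row);
        Array.Copy(vector, 0, chunk, offset, Dimensions);
    }

    // Reads one component of one row.
    public float Get(long row, int dimension)
    {
        if (dimension < 0 || dimension >= Dimensions) throw new ArgumentOutOfRangeException(nameof(dimension));
        var (chunk, offset) = Locate(row);
        return chunk[offset + dimension];
    }
}
```

With the default of 100,000 rows per chunk, the 3M × 300 GoogleNews model would become 30 arrays of 120,000,000 bytes each. The total is still about 3.6 GB, so this only helps in a 64-bit process with enough RAM.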

// Load your DataSet first, then measure (needs: using System.Globalization;)
long totalMem = GC.GetTotalMemory(false);
Console.WriteLine(string.Format(CultureInfo.InvariantCulture, "Corpus Memory Use: {0:0.0} G-bytes", totalMem / 1024.0 / 1024.0 / 1024.0));