bmschmidt / wordVectors

An R package for creating and exploring word2vec and other word embedding models

Fatal Error on 1 MB file #42

Open rychardguedes opened 6 years ago

rychardguedes commented 6 years ago

Congratulations, the package is great and thanks for developing it.

I've tested it with some standard datasets and it works great (including the 50 MB cookbooks corpus). However, when using it with a personal 1 MB dataset written in Brazilian Portuguese, R crashes every single time. I've already removed punctuation and excess whitespace, and tried 1/2/4/8 threads, 100/200/500 vectors, and with/without removing stopwords, but got no better result. Do you have any idea what the reason for this crash might be?
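For reference, here is roughly how I'm calling the package (a minimal sketch using the package's `prep_word2vec()`/`train_word2vec()` entry points; the file paths and parameter values are placeholders, not my exact setup):

```r
library(wordVectors)

# Preprocess the raw corpus into a single cleaned training file.
prep_word2vec("corpus.txt", "corpus_prepped.txt", lowercase = TRUE)

# Train; crashes happen here regardless of threads/vectors settings.
model <- train_word2vec("corpus_prepped.txt", "corpus_vectors.bin",
                        vectors = 100, threads = 2, window = 12)
```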

help

rychardguedes commented 6 years ago

More information and tests:

I tried with accents, like "é ó ú â ã", and it worked fine for small files (5 KB). I also tried small pieces of my personal dataset, and it worked with samples from 1% to 6% of it. I also generated lorem ipsum files from 5 KB up to 700 KB, and those worked. However, when I tried around 1 MB, it crashed again.
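Since lorem ipsum (pure ASCII) works at similar sizes but the accented corpus does not, one guess is a malformed multi-byte sequence somewhere in the file that the underlying C word2vec code chokes on. A hedged diagnostic sketch using base R (the `"corpus.txt"` path is a placeholder, and the Latin-1 assumption is just a common case for Portuguese text):

```r
# Read the corpus and flag any lines that are not valid UTF-8.
lines <- readLines("corpus.txt", encoding = "UTF-8", warn = FALSE)
bad <- which(!validUTF8(lines))

if (length(bad) > 0) {
  # Assume the bad lines are Latin-1 and re-encode the whole file.
  fixed <- iconv(lines, from = "latin1", to = "UTF-8")
  writeLines(fixed, "corpus_utf8.txt", useBytes = TRUE)
}
```

If `bad` is non-empty, those line numbers point at the text that may be triggering the crash.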

mdilmanian commented 5 years ago

I'm having the same issue -- RStudio aborts even when training very reasonably sized files. The same problem occurs with the rword2vec package as well. Path/file length is unlikely to be the issue (my path length is around 80-90 characters).
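Since the crash takes down the whole R session, one way to narrow it down without losing work is to split the corpus into growing prefixes and train on each in a fresh R session; the first prefix that crashes brackets the offending region. A sketch (the `"corpus.txt"` path is a placeholder):

```r
# Write successively larger prefixes of the corpus to separate files.
lines <- readLines("corpus.txt", warn = FALSE)
for (frac in c(0.25, 0.50, 0.75, 1.00)) {
  n <- ceiling(length(lines) * frac)
  writeLines(lines[seq_len(n)], sprintf("corpus_%03d.txt", frac * 100))
}
```

Then train on `corpus_025.txt`, `corpus_050.txt`, etc., each via a separate `Rscript` invocation so a crash only kills that subprocess.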

Appreciate any suggestions!