dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

Issue when running for a large corpus of documents #311

Closed Elham-yaz closed 4 years ago

Elham-yaz commented 4 years ago

When I run the code for a large number of documents in a for loop, it only works for the first 1400 documents and then stops, giving me the error:

vocabulary has no elements. Empty iterator? Error in intI(j, n = d[2], dn[[2]], give.dn = FALSE) : index larger than maximal 0

I tried generating new iterators as you have in https://github.com/dselivanov/text2vec/issues/65, but still getting the error. Any help would be greatly appreciated! Thanks.

dselivanov commented 4 years ago

Not sure how I can help without a reproducible example - please provide one.

#65 is not relevant - that behaviour was changed a long time ago.

dselivanov commented 4 years ago

Are you sure your vocabulary has words? When you write doc_set_1 = movie_review[i,] you take a single document. And when you do prune_vocabulary(doc_proportion_max = 0.1, term_count_min = 5) you limit the vocabulary to words which occur at least 5 times.

It doesn't look like your example makes sense.
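To make the diagnosis concrete: a minimal sketch of the pattern the maintainer is implying, assuming the original loop built and pruned a vocabulary from each single document (the exact loop body was not posted). Pruning a one-document vocabulary with term_count_min = 5 typically removes every term, which produces the "vocabulary has no elements" error; building the vocabulary and vectorizer once over the full corpus avoids it.

```r
library(text2vec)
data("movie_review")  # sample dataset shipped with text2vec

# Build the vocabulary ONCE over the whole corpus, not per document.
it_all = itoken(movie_review$review,
                preprocessor = tolower,
                tokenizer = word_tokenizer,
                progressbar = FALSE)
vocab = create_vocabulary(it_all)
vocab = prune_vocabulary(vocab,
                         term_count_min = 5,
                         doc_proportion_max = 0.1)
vectorizer = vocab_vectorizer(vocab)

# Per-document DTMs can then reuse the shared vectorizer. A fresh
# iterator is created for each document because iterators are consumed.
for (i in 1:3) {
  it_doc = itoken(movie_review$review[i],
                  preprocessor = tolower,
                  tokenizer = word_tokenizer,
                  progressbar = FALSE)
  dtm_i = create_dtm(it_doc, vectorizer)
}
```

With this structure every per-document DTM has the same (non-empty) column space, so downstream distance or topic-model code sees consistent dimensions.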

Elham-yaz commented 4 years ago

I see... thanks a lot!