TommyJones / textmineR

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly formatted input and give similarly formatted output. Also provides functionality for analyzing and diagnosing topic models.

Parallel execution fails on Windows when TmParallelApply is called from inside a function #21

Closed tlutz1 closed 8 years ago

tlutz1 commented 8 years ago

I think it has something to do with the environment that TmParallelApply searches. If my workspace contains an object with the right name, the error does not occur. Examples below:

Fails, because I don't have anything in my workspace named stopword_vec

rm(list=ls())
data(nih_sample)
dtm <- CreateDtm(nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2))

Fails for the same reason

rm(list=ls())
data(nih_sample)
dtm <- CreateDtm(nih_sample$ABSTRACT_TEXT,
                 stopword_vec = c("blah"),
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2))

Does not fail, even though this is not the stopword_vec passed to the function

rm(list=ls())
data(nih_sample)
stopword_vec <- "blah"
dtm <- CreateDtm(nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2))

It looks like the source might be parallel::clusterExport
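For anyone who wants to check this without textmineR, here is a minimal sketch of the suspected mechanism. parallel::clusterExport looks up variable names in its envir argument, which defaults to .GlobalEnv, so a variable that exists only inside a calling function is not found unless that function's environment is passed explicitly. The function export_local and its use_local_env flag are hypothetical names made up for this illustration; they are not part of textmineR. The sketch assumes a fresh session with no global object named stopword_vec.

```r
library(parallel)

# Hypothetical stand-in for a function that exports one of its own
# local variables to the workers (not textmineR code).
export_local <- function(use_local_env) {
  stopword_vec <- c("blah")      # exists only inside this function
  cl <- makeCluster(2)           # PSOCK cluster, the type available on Windows
  on.exit(stopCluster(cl))       # always shut the workers down, even on error

  # clusterExport() resolves names in `envir`; its default is .GlobalEnv
  envir <- if (use_local_env) environment() else .GlobalEnv
  clusterExport(cl, "stopword_vec", envir = envir)

  # Read the exported value back from the first worker
  clusterEvalQ(cl, stopword_vec)[[1]]
}

# Default lookup in .GlobalEnv: errors when no global `stopword_vec` exists
failed <- inherits(try(export_local(FALSE), silent = TRUE), "try-error")

# Lookup in the function's own environment: the local value is exported
local_value <- export_local(TRUE)
```

If a global stopword_vec happens to exist, the first call succeeds but exports the global value rather than the local one, which would match the third example above.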

TommyJones commented 8 years ago

Eek. Ok. Looking into this. (Replying to tlutz1's report of Fri, Apr 29, 2016.)

TommyJones commented 8 years ago

Think I got it. I added an option to declare a default search environment to TmParallelApply. Testing now. Will close the issue if it passes all tests.
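A rough sketch of what such an option could look like, under stated assumptions: parallel_apply_sketch and its arguments are hypothetical and written only for illustration, not copied from TmParallelApply. The idea is to default the search environment to the caller's frame and pass it through to clusterExport, so variables local to the calling function can be exported.

```r
library(parallel)

# Hypothetical helper, NOT textmineR's actual TmParallelApply.
# `envir` is the environment in which the names in `export` are looked up.
parallel_apply_sketch <- function(X, FUN, export = NULL,
                                  envir = parent.frame(), cpus = 2) {
  force(envir)                   # resolve the caller's frame up front
  cl <- makeCluster(cpus)
  on.exit(stopCluster(cl))       # always shut the workers down
  if (!is.null(export)) {
    clusterExport(cl, varlist = export, envir = envir)
  }
  parLapply(cl, X, FUN)
}

# Worker function that relies on `stopword_vec` being present on the worker
count_in_doc <- function(d) sum(strsplit(d, " ")[[1]] %in% stopword_vec)

# Calling from inside a function: `stopword_vec` is local to count_stopwords
count_stopwords <- function(docs) {
  stopword_vec <- c("the", "a")
  parallel_apply_sketch(docs, count_in_doc, export = "stopword_vec")
}

counts <- count_stopwords(c("the cat", "a dog and the bird"))
```

Defaulting to parent.frame() means a call from inside another function picks up that function's variables, while a call from the top level still resolves names in the global environment.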

TommyJones commented 8 years ago

Tested. Please open a new issue if it crops up again elsewhere.