dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

error message when using itoken_parallel() to fit Collocations model #293

Closed. woodysung closed this issue 4 years ago.

woodysung commented 5 years ago

When using itoken_parallel() to fit a Collocations model, the following error is raised: Error in { : task 1 failed - "external pointer is not valid"

Sys.setlocale("LC_ALL", "English")
library(text2vec)
library(stopwords)

# register a 4-worker PSOCK cluster as the foreach backend
library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)

model = Collocations$new(collocation_count_min = 50)
txt = readLines("_text8/text8")
it = itoken_parallel(txt)
model$fit(it, n_iter = 3)
model$collocation_stat

# stop the cluster
stopCluster(cl)
# print the session info
sessionInfo()

Attachment: ERROR_.zip

Output:

> model = Collocations$new(collocation_count_min = 50)
> txt = readLines("_text8/text8")
> it = itoken_parallel(txt)
> model$fit(it, n_iter = 3)
INFO [2018-12-25 22:11:24] iteration 1 - found 5300 collocations
Error in { : task 1 failed - "external pointer is not valid"
> # stop the cluster
> stopCluster(cl)
> 
> 
> # print the session info
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] doParallel_1.0.14 iterators_1.0.10  foreach_1.4.4     stopwords_0.9.0   text2vec_0.5.1   

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.19         mlapi_0.1.0          knitr_1.20           lattice_0.20-35     
 [5] R6_2.3.0             tools_3.5.1          grid_3.5.1           data.table_1.11.8   
 [9] lambda.r_1.2.3       futile.logger_1.4.3  htmltools_0.3.6      yaml_2.2.0          
[13] RcppParallel_4.4.1   digest_0.6.18        rprojroot_1.3-2      Matrix_1.2-14       
[17] formatR_1.5          futile.options_1.0.1 codetools_0.2-15     rsconnect_0.8.8     
[21] evaluate_0.12        rmarkdown_1.10       compiler_3.5.1       backports_1.1.2     

dselivanov commented 5 years ago

Thanks for reporting. Will investigate.

dselivanov commented 4 years ago

Windows support for itoken_parallel() was dropped; please use the single-process version, itoken(), instead. Hence this issue is no longer relevant.
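A minimal sketch of the single-process workaround: build the iterator with itoken() instead of itoken_parallel() and skip the doParallel cluster setup entirely. To keep the sketch self-contained it assumes the movie_review dataset bundled with text2vec rather than the text8 file from the report; the lowercasing preprocessor and word_tokenizer are illustrative choices, not part of the original repro.

```r
library(text2vec)

# Assumption: use the bundled movie_review data instead of text8
data("movie_review")

# Single-process iterator over tokens (no foreach/doParallel backend needed)
it = itoken(movie_review$review,
            preprocessor = tolower,
            tokenizer = word_tokenizer,
            progressbar = FALSE)

# Same model settings as in the report
model = Collocations$new(collocation_count_min = 50)
model$fit(it, n_iter = 3)

# Inspect the learned collocation statistics
head(model$collocation_stat)
```

With the report's data, the equivalent change is simply it = itoken(txt) in place of it = itoken_parallel(txt).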