DavidArenburg closed this issue.
Thanks for reporting. Unfortunately I will not fix this: all high-level parallel computing will be dropped on Windows in the next release. Please consider using the serial version; on Windows it is not much slower than the parallel one.
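Using the serial version, as suggested above, amounts to replacing the list of per-chunk iterators with a single `itoken()` iterator. The following is a minimal sketch, assuming text2vec 0.5.x (the version reported in this issue) and the `movie_review` dataset shipped with the package. It is an illustration under those assumptions, not code taken from the package docs.

```r
# Serial (single-process) variant of the parallel docs example.
# Assumes text2vec 0.5.x; no doParallel / registerDoParallel() setup is needed.
library(text2vec)
data("movie_review")

# One serial iterator over the whole corpus instead of a list of iterators.
it = itoken(movie_review$review,
            preprocessor = tolower,
            tokenizer = word_tokenizer,
            ids = movie_review$id,
            progressbar = FALSE)

v = create_vocabulary(it)
vectorizer = vocab_vectorizer(v)

# The same iterator is passed again to build the term co-occurrence matrix.
tcm = create_tcm(it, vectorizer,
                 skip_grams_window = 3L,
                 skip_grams_window_context = "symmetric")
```

Because `create_tcm()` receives a single `itoken` object rather than a plain list, it dispatches to a method that exists, so the "no applicable method ... class \"list\"" error reported below does not arise on this path.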
On Sun, 3 Feb 2019 at 18:27, David Arenburg <notifications@github.com> wrote:
Reproducible example from the docs
library(text2vec)
data("movie_review")
# set to number of cores on your machine
N_WORKERS = 4
if(require(doParallel)) registerDoParallel(N_WORKERS)
splits = split_into(movie_review$review, N_WORKERS)
jobs = lapply(splits, itoken, tolower, word_tokenizer)
v = create_vocabulary(jobs)
# Warning message:
# 'create_vocabulary.list' is deprecated.
# Use 'create_vocabulary.itoken_parallel()' instead.
# See help("Deprecated")
vectorizer = vocab_vectorizer(v)
jobs = lapply(splits, itoken, tolower, word_tokenizer)
tcm = create_tcm(jobs, vectorizer, skip_grams_window = 3L, skip_grams_window_context = "symmetric")
# Error in UseMethod("create_tcm") :
#   no applicable method for 'create_tcm' applied to an object of class "list"
It looks like jobs is supposed to be something other than a list, but I can't find how to create it otherwise.
sessionInfo()
# R version 3.5.1 (2018-07-02)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)
#
# Matrix products: default
#
# locale:
# [1] LC_COLLATE=English_Israel.1252 LC_CTYPE=English_Israel.1252 LC_MONETARY=English_Israel.1252 LC_NUMERIC=C
# [5] LC_TIME=English_Israel.1252
#
# attached base packages:
# [1] parallel stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] text2vec_0.5.1 doParallel_1.0.14 iterators_1.0.10 foreach_1.4.4
#
# loaded via a namespace (and not attached):
#  [1] Rcpp_1.0.0 lattice_0.20-35 codetools_0.2-15 digest_0.6.18 grid_3.5.1 R6_2.3.0 futile.options_1.0.1
#  [8] formatR_1.5 RcppParallel_4.4.2 data.table_1.11.8 futile.logger_1.4.3 Matrix_1.2-14 lambda.r_1.2.3 tools_3.5.1
# [15] mlapi_0.1.0 compiler_3.5.1
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dselivanov/text2vec/issues/296, or mute the thread https://github.com/notifications/unsubscribe-auth/AE4u3XbhPAqDgiPS_nr2qWNYBRHlus0mks5vJvHAgaJpZM4agIdj .
OK, that's fine. I had glove$fit_transform crashing RStudio, so I thought I'd need to parallelise, but eventually setting n_chunks to a higher value solved the issue.
Thanks for the package btw, you are doing a great job. Any plans to add word2vec too, or have you left that to the wordVectors package?
GloVe and word2vec usually give very similar results, so I don't see much value in working on it.
On Sun, 3 Feb 2019 at 19:24, David Arenburg <notifications@github.com> wrote:
Closed #296 https://github.com/dselivanov/text2vec/issues/296.
I faced a similar issue on Ubuntu. Do you have any suggestions?
Regards, Reza
@RezaSadeghiWSU please provide a reproducible example, otherwise I can't help.
The following code works on my Ubuntu machine with text2vec 0.5.1:
library(text2vec, lib.loc = "~/temp/")
data("movie_review")
# set to number of cores on your machine
N_WORKERS = 4
if(require(doParallel)) registerDoParallel(N_WORKERS)
jobs = itoken_parallel(movie_review$review, tolower, word_tokenizer, n_chunks = N_WORKERS, ids = movie_review$id)
v = create_vocabulary(jobs)
vectorizer = vocab_vectorizer(v)
tcm = create_tcm(jobs, vectorizer, skip_grams_window = 3L, skip_grams_window_context = "symmetric")