Closed: leungi closed this issue 6 years ago.
Hi. Thanks for reporting. Could you provide a fully reproducible example with the movie_review dataset? Also, I need to know your sessionInfo().
Thanks for the prompt response Dmitriy.
> sessionInfo()
R version 3.3.3 (2017-03-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252
attached base packages: [1] parallel stats graphics grDevices utils datasets methods base
other attached packages: [1] doParallel_1.0.10 iterators_1.0.8 foreach_1.4.3 dplyr_0.7.2 purrr_0.2.2.2 readr_1.1.1 tidyr_0.6.3 tibble_1.3.4 ggplot2_2.2.1.9000 [10] tidyverse_1.1.1 data.table_1.10.4 text2vec_0.5.0
loaded via a namespace (and not attached): [1] reshape2_1.4.2 splines_3.3.3 haven_1.0.0 lattice_0.20-34 colorspace_1.3-2 stats4_3.3.3 mgcv_1.8-17 rlang_0.1.1 [9] ModelMetrics_1.1.0 nloptr_1.0.4 foreign_0.8-67 glue_1.1.1 readxl_1.0.0 lambda.r_1.1.9 modelr_0.1.1 bindrcpp_0.2 [17] plyr_1.8.4 bindr_0.1 stringr_1.2.0 MatrixModels_0.4-1 cellranger_1.1.0 munsell_0.4.3 gtable_0.2.0 futile.logger_1.4.3 [25] rvest_0.3.2 codetools_0.2-15 psych_1.7.5 forcats_0.2.0 SparseM_1.76 caret_6.0-73 quantreg_5.29 pbkrtest_0.4-7 [33] broom_0.4.2 Rcpp_0.12.12 scales_0.4.1.9002 RcppParallel_4.3.20 jsonlite_1.5 lme4_1.1-12 mnormt_1.5-5 hms_0.3 [41] digest_0.6.12 stringi_1.1.5 grid_3.3.3 tools_3.3.3 magrittr_1.5 lazyeval_0.2.0 futile.options_1.0.0 car_2.1-4 [49] pkgconfig_2.0.1 MASS_7.3-45 Matrix_1.2-8 xml2_1.1.1 lubridate_1.6.0 assertthat_0.2.0 minqa_1.2.4 httr_1.2.1 [57] R6_2.2.2 compiler_3.3.3 nnet_7.3-12 nlme_3.1-131
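library(text2vec); library(doParallel)
N_WORKERS = 4  # assumed worker count; N_WORKERS is not defined in the original snippet
registerDoParallel(N_WORKERS)  # register a parallel backend for itoken_parallel
data("movie_review")
it = itoken_parallel(movie_review$review[1:100], n_chunks = N_WORKERS)
system.time(dtm <- create_dtm(it, hash_vectorizer(2**16), type = 'dgTMatrix'))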
user system elapsed
0.04 0.11 2.30
dtm
100 x 65536 sparse Matrix of class "dgTMatrix"
Error in if (msg.if.not.empty && is.list(dn) && length(dn) >= 2 && is.character(cn <- dn[[2]]) && :
missing value where TRUE/FALSE needed
Ivan
From: Dmitriy Selivanov [mailto:notifications@github.com] Sent: Thursday, September 07, 2017 8:22 AM To: dselivanov/text2vec text2vec@noreply.github.com Cc: Leung, Ivan Ivan_Leung@oxy.com; Author author@noreply.github.com Subject: [EXTERNAL] Re: [dselivanov/text2vec] Parallelization Issue (#205)
Thanks for reporting. I'm very sorry that it took so long to fix. The issue was not related to parallel processing: there was a minor mistake where character(0) was used instead of NULL for the empty column names in the dtm. This caused an error during printing but did not affect anything else.
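For anyone still on an affected version, one possible workaround is to turn the empty column names back into NULL before printing. The sketch below assumes, as described above, that `dimnames(dtm)` holds `character(0)` for the missing column names; `normalize_dimnames` is a hypothetical helper, not part of text2vec.

```r
# Hypothetical helper (not part of text2vec): replace zero-length
# dimnames components, such as character(0), with NULL so that print
# methods expecting "no names" to be NULL do not fail on them.
normalize_dimnames <- function(dn) {
  if (is.null(dn)) return(NULL)
  # lapply keeps NULL results as list elements, so the dimnames list
  # keeps one entry for rows and one for columns
  lapply(dn, function(x) if (length(x) == 0L) NULL else x)
}

# Example on a plain dimnames-style list: the row names are kept and
# the empty column names become NULL
dn <- normalize_dimnames(list(c("doc_1", "doc_2"), character(0)))
str(dn)

# Assumed usage on an affected dtm (requires the Matrix package):
# dimnames(dtm) <- normalize_dimnames(dimnames(dtm))
```

The helper only touches names, so the matrix contents are unchanged; upgrading text2vec to a release that includes the fix is the cleaner option.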
Hi,
First off, I'm glad to have found this package! Kudos on the focus on speed and on making the functions easy to use.
I tried parallelizing create_dtm and got an output with an accompanying error message:
Error in if (msg.if.not.empty && is.list(dn) && length(dn) >= 2 && is.character(cn <- dn[[2]]) && : missing value where TRUE/FALSE needed
The dtm is mostly empty, and I initially suspected it had to do with my data. However, I got similar output running the example case:
> data("movie_review")
I look forward to your insights.
Ivan