dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

"caught illegal operation" from create_vocabulary() #302

Closed: genec1 closed this issue 5 years ago

genec1 commented 5 years ago

I'm getting a fatal crash when running create_vocabulary(). The crash occurs when I run the example code from the Vectorization page with the movie_review dataset:

> vocab <- create_vocabulary(train_it)
|                                              
|================                  |  10%
 *** caught illegal operation ***
address 0x1038ae234, cause 'illegal opcode'

Traceback:
 1: cpp_vocabulary_insert_document_batch(ptr, x)
 2: vocabulary_insert_document_batch_generic(vocab_ptr, tokens$tokens)
 3: eval(xpr, envir = envir)
 4: eval(xpr, envir = envir)
 5: doTryCatch(return(expr), name, parentenv, handler)
 6: tryCatchOne(expr, names, parentenv, handlers[[1L]])
 7: tryCatchList(expr, classes, parentenv, handlers)
 8: tryCatch(eval(xpr, envir = envir), error = function(e) e)
 9: doTryCatch(return(expr), name, parentenv, handler)
10: tryCatchOne(expr, names, parentenv, handlers[[1L]])
11: tryCatchList(expr, classes, parentenv, handlers)
12: tryCatch({    repeat {        args <- nextElem(it)        if (obj$verbose) {            cat(sprintf("evaluation # %d:\n", i))            print(args)        }        for (a in names(args)) assign(a, args[[a]], pos = envir,             inherits = FALSE)        r <- tryCatch(eval(xpr, envir = envir), error = function(e) e)        if (obj$verbose) {            cat("result of evaluating expression:\n")            print(r)        }        tryCatch(accumulator(list(r), i), error = function(e) {            cat("error calling combine function:\n")            print(e)            NULL        })        i <- i + 1    }}, error = function(e) {    if (!identical(conditionMessage(e), "StopIteration"))         stop(simpleError(conditionMessage(e), expr))})
13: e$fun(obj, substitute(ex), parent.frame(), e$data)
14: foreach(tokens = it) %do% {    vocabulary_insert_document_batch_generic(vocab_ptr, tokens$tokens)}
15: create_vocabulary.itoken(train_it)
16: create_vocabulary(train_it)
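
For anyone trying to reproduce this, the failing call can be sketched roughly as follows. This is a minimal sketch, assuming the setup from the Vectorization page (the bundled movie_review data, tolower preprocessing, word_tokenizer). The wrapper name build_vocab is hypothetical and is only there to keep the steps together.

```r
# Minimal reproduction sketch; build_vocab() is a hypothetical wrapper name.
# Assumes text2vec is installed and ships the movie_review dataset.
build_vocab <- function(progressbar = TRUE) {
  # Load the example data into this function's environment
  utils::data("movie_review", package = "text2vec", envir = environment())

  # Iterator over lower-cased, word-tokenized reviews
  train_it <- text2vec::itoken(
    movie_review$review,
    preprocessor = tolower,
    tokenizer = text2vec::word_tokenizer,
    ids = movie_review$id,
    progressbar = progressbar
  )

  # This is the call that crashed in the traceback above
  text2vec::create_vocabulary(train_it)
}
```

On an affected install, vocab <- build_vocab() would be expected to hit the same crash; on a working install it returns the vocabulary table.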

This is my first time using the text2vec package, so it is a fresh install. Details:

> R.version
               _                           
platform       x86_64-apple-darwin15.6.0   
arch           x86_64                      
os             darwin15.6.0                
system         x86_64, darwin15.6.0        
status                                     
major          3                           
minor          5.2                         
year           2018                        
month          12                          
day            20                          
svn rev        75870                       
language       R                           
version.string R version 3.5.2 (2018-12-20)
nickname       Eggshell Igloo              

> sessionInfo()
R version 3.5.2 (2018-12-20)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Mojave 10.14.5

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] text2vec_0.5.1.5  magrittr_1.5      broom_0.5.2       forcats_0.4.0     stringr_1.4.0     dplyr_0.8.0.1     purrr_0.3.2       readr_1.3.1       tidyr_0.8.3      
[10] tibble_2.1.1      ggplot2_3.1.1     tidyverse_1.2.1   data.table_1.12.2

loaded via a namespace (and not attached):
 [1] mlapi_0.1.0          tidyselect_0.2.5     xfun_0.6             haven_2.1.0          lattice_0.20-38      colorspace_1.4-1     generics_0.0.2       htmltools_0.3.6     
 [9] yaml_2.2.0           base64enc_0.1-3      rlang_0.3.4          pillar_1.3.1         glue_1.3.1           withr_2.1.2          lambda.r_1.2.3       modelr_0.1.4        
[17] readxl_1.3.1         foreach_1.4.4        plyr_1.8.4           futile.logger_1.4.3  munsell_0.5.0        gtable_0.3.0         cellranger_1.1.0     rvest_0.3.3         
[25] codetools_0.2-16     evaluate_0.13        knitr_1.22           Rcpp_1.0.1           formatR_1.6          scales_1.0.0         backports_1.1.4      RcppParallel_4.4.3  
[33] jsonlite_1.6         hms_0.4.2            digest_0.6.19        stringi_1.4.3        grid_3.5.2           cli_1.1.0            tools_3.5.2          lazyeval_0.2.2      
[41] futile.options_1.0.1 crayon_1.3.4         pkgconfig_2.0.2      Matrix_1.2-17        xml2_1.2.0           lubridate_1.7.4      iterators_1.0.10     assertthat_0.2.1    
[49] rmarkdown_1.12       httr_1.4.0           rstudioapi_0.10      R6_2.4.0             nlme_3.1-139         compiler_3.5.2      

dselivanov commented 5 years ago

Thanks for the report. This looks strange: I use pretty much the same environment on my laptop. I noticed you are using the dev version 0.5.1.5. Is it from the master branch or the 0.6 branch?

genec1 commented 5 years ago

I just ran devtools::install_github('dselivanov/text2vec') and that's what I got.

genec1 commented 5 years ago

I reinstalled from CRAN and the crash went away. Easy fix!

dselivanov commented 5 years ago

Could you please install from the 0.6 branch and see whether the error appears?
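
The install sources mentioned in this thread can be sketched as one helper. This is only a sketch: install_text2vec_from is a hypothetical name, it assumes the branch is literally named 0.6, and it assumes devtools is already installed for the GitHub variants.

```r
# Hypothetical helper; nothing is installed until the function is called.
install_text2vec_from <- function(source = c("cran", "master", "0.6")) {
  source <- match.arg(source)
  switch(source,
    # Released version from CRAN (the one that did not crash)
    cran = install.packages("text2vec"),
    # GitHub install without a ref, as run earlier in this thread
    master = devtools::install_github("dselivanov/text2vec"),
    # Assumption: the development branch is named "0.6"
    "0.6" = devtools::install_github("dselivanov/text2vec", ref = "0.6")
  )
}
```

After install_text2vec_from("0.6") and a restart of R, packageVersion("text2vec") shows which build is actually loaded before re-running create_vocabulary().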

Fri, 24 May 2019, 21:54 genec1 notifications@github.com:

I reinstalled from CRAN and the crash went away. Easy fix!
