dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

GloVe example not working #323

Closed brshallo closed 4 years ago

brshallo commented 4 years ago

I'm trying to work through the GloVe example but getting an error here:

glove = GlobalVectors$new(word_vectors_size = 50, x_max = 10)
#> Error in .subset2(public_bind_env, "initialize")(...): unused argument (word_vectors_size = 50)

devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.6.3 (2020-02-29)
#>  os       Windows 10 x64              
#>  system   x86_64, mingw32             
#>  ui       RTerm                       
#>  language (EN)                        
#>  collate  English_United States.1252  
#>  ctype    English_United States.1252  
#>  tz       America/New_York            
#>  date     2020-04-06                  
#> 
#> - Packages -------------------------------------------------------------------
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.6.3)
#>  backports     1.1.6   2020-04-05 [1] CRAN (R 3.6.3)
#>  callr         3.4.3   2020-03-28 [1] CRAN (R 3.6.3)
#>  cli           2.0.2   2020-02-28 [1] CRAN (R 3.6.3)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 3.6.3)
#>  data.table    1.12.8  2019-12-09 [1] CRAN (R 3.6.3)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 3.6.3)
#>  devtools      2.2.2   2020-02-17 [1] CRAN (R 3.6.3)
#>  digest        0.6.25  2020-02-23 [1] CRAN (R 3.6.3)
#>  ellipsis      0.3.0   2019-09-20 [1] CRAN (R 3.6.3)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 3.6.3)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 3.6.3)
#>  float         0.2-3   2019-05-31 [1] CRAN (R 3.6.0)
#>  fs            1.4.1   2020-04-04 [1] CRAN (R 3.6.3)
#>  glue          1.4.0   2020-04-03 [1] CRAN (R 3.6.3)
#>  highr         0.8     2019-03-20 [1] CRAN (R 3.6.3)
#>  htmltools     0.4.0   2019-10-04 [1] CRAN (R 3.6.3)
#>  knitr         1.28    2020-02-06 [1] CRAN (R 3.6.3)
#>  lattice       0.20-38 2018-11-04 [2] CRAN (R 3.6.3)
#>  lgr           0.3.4   2020-03-20 [1] CRAN (R 3.6.3)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 3.6.3)
#>  Matrix        1.2-18  2019-11-27 [2] CRAN (R 3.6.3)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 3.6.3)
#>  mlapi         0.1.0   2017-12-17 [1] CRAN (R 3.6.3)
#>  pkgbuild      1.0.6   2019-10-09 [1] CRAN (R 3.6.3)
#>  pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.6.3)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 3.6.3)
#>  processx      3.4.2   2020-02-09 [1] CRAN (R 3.6.3)
#>  ps            1.3.2   2020-02-13 [1] CRAN (R 3.6.3)
#>  R6            2.4.1   2019-11-12 [1] CRAN (R 3.6.3)
#>  Rcpp          1.0.4   2020-03-17 [1] CRAN (R 3.6.3)
#>  remotes       2.1.1   2020-02-15 [1] CRAN (R 3.6.3)
#>  renv          0.9.3   2020-02-10 [1] CRAN (R 3.6.3)
#>  RhpcBLASctl   0.20-17 2020-01-17 [1] CRAN (R 3.6.2)
#>  rlang         0.4.5   2020-03-01 [1] CRAN (R 3.6.3)
#>  rmarkdown     2.1     2020-01-20 [1] CRAN (R 3.6.3)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.6.3)
#>  rsparse       0.4.0   2020-04-01 [1] CRAN (R 3.6.3)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.6.3)
#>  stringi       1.4.6   2020-02-17 [1] CRAN (R 3.6.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 3.6.3)
#>  testthat      2.3.2   2020-03-02 [1] CRAN (R 3.6.3)
#>  text2vec    * 0.6     2020-02-18 [1] CRAN (R 3.6.3)
#>  usethis       1.5.1   2019-07-04 [1] CRAN (R 3.6.3)
#>  withr         2.1.2   2018-03-15 [1] CRAN (R 3.6.3)
#>  xfun          0.12    2020-01-13 [1] CRAN (R 3.6.3)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 3.6.2)

dselivanov commented 4 years ago

Thanks for reporting - I will try to update the vignette. In the meantime, see the example at ?rsparse::GloVe
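
Until the vignette is updated, one way to see which arguments the installed constructor accepts is to inspect the R6 generator directly. This is a minimal sketch, not part of the original reply; it assumes only that `GlobalVectors` is an R6 class generator, as the `$new()` call in the report implies. The variable name `arg_names` is illustrative.

```r
library(text2vec)

# `public_methods` is a standard field of an R6 class generator;
# `initialize` is the method that `$new()` forwards its arguments to.
init_fn = GlobalVectors$public_methods$initialize

# Names of the arguments the installed version accepts.
arg_names = names(formals(init_fn))
print(arg_names)
```

If `word_vectors_size` is missing from the printed names, the installed version no longer accepts it, which matches the "unused argument" error above.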

nikhil-bery commented 4 years ago

Hi, I'm facing a similar issue.

> library(text2vec)
> 
> # read in training corpus data
> jobs = read.csv("jobposts/data job posts.csv")
> 
> # pre-processing
> tokens = word_tokenizer(tolower(jobs$Title))
> v = create_vocabulary(itoken(tokens))
> v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.5)
> it = itoken(tokens)
> vectorizer = vocab_vectorizer(v)
> 
> # create a document-term matrix and a term co-occurrence matrix
> dtm = create_dtm(it, vectorizer)
> tcm = create_tcm(it, vectorizer, skip_grams_window = 5)
> 
> # create a model & train our word vectors
> glove = GlobalVectors$new(word_vectors_size = 50, vocabulary = v, x_max = 10)
Error in .subset2(public_bind_env, "initialize")(...) : 
  unused arguments (word_vectors_size = 50, vocabulary = v)
> wv = glove$fit_transform(tcm, n_iter = 10)
Error: object 'glove' not found
> # get average of main and context vectors as proposed in GloVe paper
> wv = wv + t(glove$components)
Error: object 'wv' not found
> # create a relaxed word mover's distance model (based on cosine, rather than Euclidean distance)
> rwmd_model = RWMD$new(wv)
Error in stopifnot(is.matrix(embeddings)) : 
  argument "embeddings" is missing, with no default

Has this been resolved yet?

dselivanov commented 4 years ago

@brshallo @nikhil-bery fixed now - see article here http://text2vec.org/glove.html
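
For reference, here is a minimal sketch of the GloVe steps from this thread rewritten for text2vec 0.6, where the embedding size is passed as `rank` and the constructor no longer takes `word_vectors_size` or `vocabulary`. The toy `titles` vector is made up and stands in for the job-post CSV used above; `rank = 10` and the other settings are illustrative, not recommendations.

```r
library(text2vec)

# Hypothetical toy corpus standing in for jobs$Title.
titles = c("senior data analyst", "data engineer", "junior data analyst",
           "software engineer", "senior software engineer", "data scientist")

# Tokenize and build the vocabulary and term co-occurrence matrix.
tokens = word_tokenizer(tolower(titles))
it = itoken(tokens, progressbar = FALSE)
v = create_vocabulary(it)
vectorizer = vocab_vectorizer(v)
tcm = create_tcm(it, vectorizer, skip_grams_window = 5)

# `rank` replaces `word_vectors_size`; no `vocabulary` argument is passed.
glove = GlobalVectors$new(rank = 10, x_max = 10)
wv_main = glove$fit_transform(tcm, n_iter = 10)

# Sum of main and context vectors, as in the snippet quoted above.
wv_context = glove$components
word_vectors = wv_main + t(wv_context)
dim(word_vectors)
```

The RWMD step from the earlier comment is left out here because its constructor arguments are not shown anywhere in this thread.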