Open jooyoungseo opened 5 years ago
I was experiencing the same issue today when setting k=0 as described in the vignette on page 12-13 for finding a rough estimate of the number of topics.
Would be really cool if you could look into that and maybe fix it!
Best, Flo
me too, but I remember that it was not an issue in the past...
I'm having this issue, too! Would love it if it can be resolved.
same problem here.
Me too.
The problem is that we all use a quanteda dfm as input. This conflicts with how the default value of the function parameter `N` is set: this simply takes the length of the `documents` parameter. This works if you use the STM internal documents and vocabulary approach but for a `quanteda::dfm()` this does not work.
In my case I had a fixed number of documents, so I simply guesstimated from example code what the approximate size of `N` would have been and entered that number manually into `searchK()`.
Can you briefly explain how you did the estimation? How did you estimate the approximate size of N?
I think I looked up in the code how the parameter is set by default and verified the outcome with an example data set in stm format. That gave me sufficient clues to extrapolate to my own use case.
HTH!
Hey guys, I followed @paullemmens's instructions (thanks a lot) and set searchK's parameter N to `floor(0.1 * nrow(meta))`, and it works. The default N is `floor(0.1 * length(documents))`, where `length(documents)` is meant to be the number of documents you have. In our case, that number is the number of rows of our `meta`, or you can simply plug in your number of documents directly. Good luck!
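To make the mismatch concrete, here is a minimal base-R sketch. It is an illustration only: `docs_list`, `doc_feature`, and `default_n` are hypothetical names, and a plain `matrix` is used as a stand-in for a dfm on the assumption that `length()` on a dfm likewise counts cells rather than documents. It does not call stm or quanteda.

```r
# Hypothetical stm-style input: a list with one element per document
# (each element a 2-row matrix of vocabulary indices and counts).
docs_list <- replicate(50, matrix(c(1L, 1L), nrow = 2), simplify = FALSE)

# Stand-in for a document-feature matrix: 50 documents x 200 features.
# Assumption: length() on a dfm behaves like length() on this matrix.
doc_feature <- matrix(0L, nrow = 50, ncol = 200)

# Mirrors the default described in this thread: floor(0.1 * length(documents))
default_n <- function(documents) floor(0.1 * length(documents))

n_from_list   <- default_n(docs_list)             # based on 50 documents
n_from_matrix <- default_n(doc_feature)           # based on 50 * 200 cells
n_manual      <- floor(0.1 * nrow(doc_feature))   # the workaround: count rows

print(c(list = n_from_list, matrix = n_from_matrix, manual = n_manual))
```

With the list input the default comes out at 5, while the matrix input gives 1000, which is more than the 50 documents available. Computing `N` from the row count gives 5 again, which is the value you would then pass as `N` to `searchK()`.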
Please check whether users can pass quanteda's dfm object to the `searchK()` function. Apparently, there seems to be an issue:

Created on 2019-05-17 by the reprex package (v0.3.0.9000)
Session info
``` r
devtools::session_info()
#> - Session info ----------------------------------------------------------
#>  setting  value
#>  version  R version 3.6.0 (2019-04-26)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/New_York
#>  date     2019-05-17
#>
#> - Packages --------------------------------------------------------------
#>  ! package      * version    date       lib source
#>    assertthat     0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>    backports      1.1.4      2019-04-10 [1] CRAN (R 3.6.0)
#>    callr          3.2.0      2019-03-15 [1] CRAN (R 3.6.0)
#>    cli            1.1.0      2019-03-19 [1] CRAN (R 3.6.0)
#>    colorspace     1.4-1      2019-03-18 [1] CRAN (R 3.6.0)
#>    crayon         1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>    data.table     1.12.3     2019-05-15 [1] Github (Rdatatable/data.table@93f50f7)
#>    desc           1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>    devtools       2.0.2.9000 2019-05-13 [1] Github (r-lib/devtools@92d32cb)
#>    digest         0.6.18     2018-10-10 [1] CRAN (R 3.6.0)
#>    dplyr          0.8.0.9014 2019-05-06 [1] Github (hadley/dplyr@9c6f59e)
#>    evaluate       0.13       2019-02-12 [1] CRAN (R 3.6.0)
#>    fastmatch      1.1-0      2017-01-28 [1] CRAN (R 3.6.0)
#>    fs             1.3.1      2019-05-06 [1] CRAN (R 3.6.0)
#>    ggplot2        3.1.1.9000 2019-05-17 [1] Github (hadley/ggplot2@1f6f0cb)
#>    glue           1.3.1.9000 2019-05-01 [1] Github (tidyverse/glue@ea0edcb)
#>    gtable         0.3.0      2019-03-25 [1] CRAN (R 3.6.0)
#>    highr          0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>    htmltools      0.3.6      2017-04-28 [1] CRAN (R 3.6.0)
#>    ISOcodes       2019.04.22 2019-04-23 [1] CRAN (R 3.6.0)
#>    knitr          1.22.12    2019-05-17 [1] Github (yihui/knitr@f85bce0)
#>    lattice        0.20-38    2018-11-04 [1] CRAN (R 3.6.0)
#>    lazyeval       0.2.2      2019-03-15 [1] CRAN (R 3.6.0)
#>    lubridate      1.7.4.9000 2019-05-01 [1] Github (hadley/lubridate@99e2af3)
#>    magrittr       1.5        2014-11-22 [1] CRAN (R 3.6.0)
#>    Matrix         1.2-17     2019-03-22 [1] CRAN (R 3.6.0)
#>    memoise        1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>    munsell        0.5.0      2018-06-12 [1] CRAN (R 3.6.0)
#>    pillar         1.4.0      2019-05-11 [1] CRAN (R 3.6.0)
#>    pkgbuild       1.0.3      2019-03-20 [1] CRAN (R 3.6.0)
#>    pkgconfig      2.0.2      2018-08-16 [1] CRAN (R 3.6.0)
#>    pkgload        1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
#>    prettyunits    1.0.2      2015-07-13 [1] CRAN (R 3.6.0)
#>    processx       3.3.1      2019-05-08 [1] CRAN (R 3.6.0)
#>    ps             1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
#>    purrr          0.3.2.9000 2019-04-27 [1] Github (hadley/purrr@25d84f7)
#>    quanteda     * 1.4.5      2019-05-13 [1] Github (quanteda/quanteda@29d9fd3)
#>    R6             2.4.0      2019-02-14 [1] CRAN (R 3.6.0)
#>    Rcpp           1.0.1.3    2019-04-27 [1] Github (RcppCore/Rcpp@6062d56)
#>  D RcppParallel   4.4.2      2018-12-11 [1] CRAN (R 3.6.0)
#>    remotes        2.0.4.9000 2019-05-13 [1] Github (r-lib/remotes@ba2f034)
#>    rlang          0.3.4.9003 2019-05-01 [1] Github (r-lib/rlang@6a232c0)
#>    rmarkdown      1.12.8     2019-05-15 [1] Github (rstudio/rmarkdown@62ab411)
#>    rprojroot      1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>    scales         1.0.0      2018-08-09 [1] CRAN (R 3.6.0)
#>    sessioninfo    1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>    SnowballC      0.6.0      2019-01-15 [1] CRAN (R 3.6.0)
#>    spacyr         1.1        2019-05-13 [1] Github (quanteda/spacyr@4d1373d)
#>    stm          * 1.3.3      2019-04-27 [1] Github (bstewart/stm@525b00c)
#>    stopwords      0.9.0      2017-12-14 [1] CRAN (R 3.6.0)
#>    stringi        1.4.3      2019-03-12 [1] CRAN (R 3.6.0)
#>    stringr        1.4.0.9000 2019-05-15 [1] Github (hadley/stringr@0b90f91)
#>    testthat       2.1.1      2019-04-23 [1] CRAN (R 3.6.0)
#>    tibble         2.1.1      2019-03-16 [1] CRAN (R 3.6.0)
#>    tidyselect     0.2.5.9000 2019-04-27 [1] Github (tidyverse/tidyselect@19150c0)
#>    usethis        1.5.0.9000 2019-05-13 [1] Github (r-lib/usethis@dced164)
#>    vctrs          0.1.0.9003 2019-05-17 [1] Github (r-lib/vctrs@cd0e31e)
#>    withr          2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
#>    xfun           0.7        2019-05-14 [1] CRAN (R 3.6.0)
#>    yaml           2.2.0      2018-07-25 [1] CRAN (R 3.6.0)
#>    zeallot        0.1.0      2018-01-28 [1] CRAN (R 3.6.0)
#>
#> [1] C:/Program Files/R/R-3.6.0/library
#>
#>  D -- DLL MD5 mismatch, broken installation.
```