Open scherbakovdmitri opened 2 years ago
Maybe I don't get how it works, but I am following the example in the package text2vec:
    library(text2vec)
    data("movie_review")
    N = 500
    tokens = word_tokenizer(tolower(movie_review$review[1:N]))
    it = itoken(tokens, ids = movie_review$id[1:N])
    v = create_vocabulary(it)
    v = prune_vocabulary(v, term_count_min = 5, doc_proportion_max = 0.2)
    dtm = create_dtm(it, vocab_vectorizer(v))
    lda_model = LDA$new(n_topics = 10)
    doc_topic_distr = lda_model$fit_transform(dtm, n_iter = 20)
    # run LDAvis visualisation if needed (make sure LDAvis package installed)
    lda_model$plot()
Notice how the bars for the token "end" differ: one crosses the tick and the other does not.
This becomes more obvious when the corpus has few tokens; the width then changes considerably. Is there an explanation for this? Thanks!