-
Hi Tommy Jones,
I am approaching a topic modelling project based on scientific abstracts and have a question regarding the coherence measure you have thankfully implemented. Since I am not a comput…
-
According to the help, `textstat_collocations` will 'Identify and score collocations from a corpus, character, or tokens object'. The first argument is described as `x: a character, corpus, or tokens object…`
-
Hi, first I would like to thank you for open-sourcing this kind of project. Lately I have been assigned to an exploratory project and I had no clue nor references, and you acted as my hero!
When I was…
-
> A feature that is easy to implement and very useful, whatever the model (tf-idf, GloVe, LSA...): phrase collocation!
> The gensim code is quite simple and based on the formula used by Mikolov in his W2V pape…
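For reference, the bigram score gensim's `Phrases` applies (following Mikolov et al.'s word2vec paper) can be sketched in a few lines of R; the function name and the toy counts below are purely illustrative:

```r
# Sketch of the Mikolov/gensim phrase-scoring formula:
#   score(a, b) = (count(ab) - min_count) / (count(a) * count(b)) * vocab_size
# Bigrams scoring above a threshold are merged into a single token.
phrase_score <- function(count_ab, count_a, count_b, vocab_size, min_count = 5) {
  (count_ab - min_count) / (count_a * count_b) * vocab_size
}

# Toy example: "new york" seen 80 times, "new" 200 times, "york" 100 times,
# over a vocabulary of 10000 types.
phrase_score(80, 200, 100, 10000)
# 37.5
```

Subtracting `min_count` is what discounts rare, accidental co-occurrences; the `vocab_size` factor just rescales the score so a single threshold works across corpora.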
-
This would mean:
Running `tokens_segment(tokens(a_corpus))` would be roughly the same as `tokens(corpus_segment(a_corpus))`.
-
@kbenoit when I use `collocations` I get a zero-row data frame back. This is the example from the help file. I am using R 3.4.0 on a Windows machine (`sessionInfo` at bottom). Not sure of the root of the i…
-
For the current `text2vec_0.5.0`, I tried to run the example in `model_Collocations.R` and got this error:
```r
> data("movie_review")
> preprocessor = function(x) {
+ tolower(x) %>% gsub("[…
-
handle size = 2 and size = 3 with the same function
remove bigrams that are parts of trigrams, and trigrams that contain just a bigram
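The first half of that filtering step could be sketched in base R like this (a standalone illustration, not text2vec code; the naive substring test stands in for a proper token-wise containment check):

```r
# Illustrative filter: drop any size-2 collocation that appears inside
# some size-3 collocation, keeping only the longer phrase.
bigrams  <- c("new_york", "machine_learning", "data_set")
trigrams <- c("new_york_city", "support_vector_machine")

contained_in_trigram <- function(bg, tgs) {
  any(grepl(bg, tgs, fixed = TRUE))
}

keep <- !vapply(bigrams, contained_in_trigram, logical(1), tgs = trigrams)
bigrams[keep]
# "machine_learning" "data_set"
```

The symmetric direction (dropping trigrams that add nothing beyond a bigram) would compare scores rather than strings, so it is omitted here.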
-
The current situation is not very logical (the naming of the context method used to obtain co-occurrences).
-
If the pattern is `c("a b", "a b c")`, and the tokens are `"a", "b", "c"`, then the compounded version should be just `"a_b_c"`, not `"a_b_c", "a_b"`. Compounding should never increase the number of t…
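The expected longest-match-first behaviour can be sketched in base R (an illustrative reimplementation of the rule, not quanteda's actual algorithm):

```r
# Compound overlapping phrase patterns by always trying the longest
# pattern first, so a match is consumed exactly once and the token
# count can only shrink, never grow.
compound_longest <- function(toks, patterns) {
  # order patterns by word count, longest first
  pats <- patterns[order(lengths(strsplit(patterns, " ")), decreasing = TRUE)]
  out <- character(0)
  i <- 1
  while (i <= length(toks)) {
    matched <- FALSE
    for (p in pats) {
      pt <- strsplit(p, " ")[[1]]
      j <- i + length(pt) - 1
      if (j <= length(toks) && identical(toks[i:j], pt)) {
        out <- c(out, paste(pt, collapse = "_"))
        i <- j + 1
        matched <- TRUE
        break
      }
    }
    if (!matched) {
      out <- c(out, toks[i])
      i <- i + 1
    }
  }
  out
}

compound_longest(c("a", "b", "c"), c("a b", "a b c"))
# "a_b_c"
```

With the patterns `c("a b", "a b c")` and tokens `"a", "b", "c"`, the longer pattern wins and the shorter one is never matched inside the consumed span, which is exactly the invariant described above.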