bnosac / doc2vec

Distributed Representations of Sentences and Documents

Example plot #24

Open Cdk29 opened 1 year ago

Cdk29 commented 1 year ago

Hi.

I did not manage to reproduce the example graph in the README:

https://github.com/bnosac/doc2vec/blob/master/tools/example-viz.png

The code that precedes it is:

library(doc2vec)
library(word2vec)
library(uwot)
library(dbscan)
data(be_parliament_2020, package = "doc2vec")
x      <- data.frame(doc_id = be_parliament_2020$doc_id,
                     text   = be_parliament_2020$text_nl,
                     stringsAsFactors = FALSE)
x$text <- txt_clean_word2vec(x$text)
x      <- subset(x, txt_count_words(text) < 1000)

d2v    <- paragraph2vec(x, type = "PV-DBOW", dim = 50, 
                        lr = 0.05, iter = 10,
                        window = 15, hs = TRUE, negative = 0,
                        sample = 0.00001, min_count = 5, 
                        threads = 1)
model  <- top2vec(d2v, 
                  control.dbscan = list(minPts = 50), 
                  control.umap = list(n_neighbors = 15L, n_components = 3), umap = tumap, 
                  trace = TRUE)
info   <- summary(model, top_n = 7)
info$topwords

I tried several functions from textplot, without success.

Thanks for any hints.

jwijffels commented 1 year ago

I can't find that code anymore myself. If I recall correctly, I took the top words of the topics, took their embeddings (mapped to 2 dimensions), and then plotted them, showing only relationships between the terms that were part of the topic. Mapping to 2 dimensions with UMAP will probably get you most of the way there already.
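
A rough sketch of that approach, not the original plotting code (which was not recovered). The helper name `topic_terms_2d` is hypothetical, and it assumes that each element of `info$topwords` is a data.frame with a `term` column and that the word embeddings are available as a matrix with the terms as row names, e.g. from `as.matrix(d2v, which = "words")`.

```r
library(uwot)

# Hypothetical helper: project the embeddings of each topic's top words to 2D.
# emb      : numeric matrix of word embeddings, terms as rownames
# topwords : list with one data.frame per topic, each with a `term` column
#            (assumed structure of info$topwords)
topic_terms_2d <- function(emb, topwords, n_neighbors = 15L) {
  # Stack the top words of all topics, remembering which topic they came from
  terms <- lapply(seq_along(topwords), function(i) {
    data.frame(topic = i,
               term  = as.character(topwords[[i]]$term),
               stringsAsFactors = FALSE)
  })
  terms <- do.call(rbind, terms)
  # Keep only terms that have an embedding, and each term only once
  terms <- terms[terms$term %in% rownames(emb), , drop = FALSE]
  terms <- terms[!duplicated(terms$term), , drop = FALSE]
  # UMAP needs fewer neighbours than there are points
  n_neighbors <- min(n_neighbors, nrow(terms) - 1L)
  xy <- umap(emb[terms$term, , drop = FALSE],
             n_neighbors = n_neighbors, n_components = 2)
  terms$x <- xy[, 1]
  terms$y <- xy[, 2]
  terms
}

# Usage with the objects from the code above (d2v, info), left commented out
# because it needs the trained model:
# emb    <- as.matrix(d2v, which = "words")
# coords <- topic_terms_2d(emb, info$topwords)
# plot(coords$x, coords$y, type = "n", xlab = "UMAP 1", ylab = "UMAP 2")
# text(coords$x, coords$y, labels = coords$term, col = coords$topic)
```

The resulting data.frame has one row per term with its topic and 2D coordinates, so it can be passed to any plotting function. Drawing the links between terms of the same topic, as in the README image, would be an extra step on top of this.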