dselivanov / text2vec

Fast vectorization, topic modeling, distances and GloVe word embeddings in R.
http://text2vec.org

Question: More clarifications on some of the code segments #270

Closed sweetmals closed 6 years ago

sweetmals commented 6 years ago

Hi @manuelbickel

It would be great if you could clarify a few lines of the coherence.R implementation.

tcm = as.matrix(tcm[top_terms_tcm, top_terms_tcm])

By this point the TCM has already been filtered down to the top terms that occur in both the input x and the original TCM. Correct me if I am wrong.

I am not clear about what the following lines do within each topic:

topic_i_term_indices = match(x[, i], terms_tcm)
# remove NA indices - not all top terms for topic 'i' are necessarily included in tcm
topic_i_term_indices = topic_i_term_indices[!is.na(topic_i_term_indices)]

Isn’t this the same as what you already do by taking the intersect of top_terms_unique and terms_tcm and then reconstructing the TCM with the line ‘tcm = as.matrix(tcm[top_terms_tcm, top_terms_tcm])’?
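A minimal sketch of how the two steps differ, on made-up data. The intersect filters the TCM once for all topics together, while the match plus NA removal picks out, for a single topic, the positions of that topic's terms inside the filtered matrix. The term names and the toy vocabulary are hypothetical, and matching against the names of the filtered TCM (rather than the original terms_tcm) is an assumption of this sketch, not something the thread confirms.

```r
# Toy input: rows are term ranks, columns are topics (hypothetical values).
x = matrix(c("apple", "banana", "cherry",
             "banana", "durian", "eggplant"),
           nrow = 3, dimnames = list(NULL, c("topic1", "topic2")))

# Toy TCM vocabulary; "durian" is deliberately missing from it.
terms_tcm = c("eggplant", "apple", "banana", "cherry", "fig")

# Global step: top terms of ALL topics that also occur in the TCM.
top_terms_unique = unique(as.vector(x))
top_terms_tcm = intersect(top_terms_unique, terms_tcm)

# Per-topic step: positions of topic i's terms within the filtered vocabulary.
# Assumption: indices refer to the filtered TCM, whose dimnames are top_terms_tcm.
per_topic = lapply(seq_len(ncol(x)), function(i) {
  idx = match(x[, i], top_terms_tcm)
  idx[!is.na(idx)]  # "durian" yields NA for topic 2 and is dropped here
})

print(top_terms_tcm)
print(per_topic)
```

In this toy case the global filter keeps four terms, but topic 2 still contains a term without a TCM entry, so the per-topic NA removal is not redundant with the intersect.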

I am also finding it a bit hard to understand the computation of 'log(smooth + tcm[x,y]) - log(tcm[y,y])'. Would you be able to explain why the matrix is transposed, then divided by its diagonal, and why the log is then taken of the lower triangle only? It seems to me that the following five lines basically evaluate the log expression above, but I do not understand how:

d = diag(res)
res = t(res)
res = res / d
res = res[lower.tri(res)]
res = log(res)
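The five lines can be checked against the formula on a small example. The sketch below assumes an upper-triangular toy matrix with made-up counts and assumes that smooth is added to the off-diagonal counts before the five lines run; neither detail is confirmed in this thread. Because R recycles the vector d down the columns, res / d divides row i by d[i], so after the transpose the entry in row i, column j is tcm[j, i] / tcm[i, i].

```r
smooth = 1e-12

# Hypothetical upper-triangular co-occurrence counts for three terms;
# the diagonal holds each term's own count.
tcm = matrix(c(10, 4, 2,
                0, 8, 3,
                0, 0, 5),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("a", "b", "c"), c("a", "b", "c")))

res = tcm
# Assumed preprocessing step: smooth the off-diagonal counts.
res[upper.tri(res)] = res[upper.tri(res)] + smooth

# The five lines from the question.
d = diag(res)              # d[i] = tcm[i, i]
res = t(res)               # moves the upper-triangle counts into the lower triangle
res = res / d              # row i is divided by d[i]
res = res[lower.tri(res)]  # keeps each term pair exactly once
res = log(res)

# The same values written out pair by pair, with x before y.
expected = c()
for (x in 1:2) {
  for (y in (x + 1):3) {
    expected = c(expected, log(smooth + tcm[x, y]) - log(tcm[y, y]))
  }
}

print(all.equal(res, expected))
```

Under these assumptions the transpose is what lines up each count tcm[x, y] with the diagonal entry tcm[y, y] of its column term, and lower.tri drops the diagonal and the empty half of the matrix.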

Please bear with me for my lack of knowledge.

Thanks a lot.

dselivanov commented 6 years ago

Hi @sweetmals. The issue tracker is not the place for general questions. If you think something is a bug, please provide an example and explain why you think so.

If you still think there is a bug or issue with the coherence measurement, please state clearly what is wrong and how you think it should behave. You can see the actual code in R/coherence.R.

Also, please take a look at basic text formatting on GitHub: https://help.github.com/articles/basic-writing-and-formatting-syntax/

manuelbickel commented 6 years ago

Perhaps the question could be asked on Stack Overflow; I will answer it there.

On 19 June 2018 at 10:39:28 CEST, Dmitriy Selivanov notifications@github.com wrote:

Closed #270.


sweetmals commented 6 years ago

Hi @manuelbickel. That's alright, I've been asking you a lot of questions in the other thread too. Thanks a lot for your help.