TommyJones / textmineR

An aid for text mining in R, with a syntax that should be familiar to experienced R users. Provides a wrapper for several topic models that take similarly formatted input and give similarly formatted output. Also includes functionality for analyzing and diagnosing topic models.

CalcProbCoherence algorithm unclear #75

Closed: jeason15 closed this issue 5 years ago

jeason15 commented 5 years ago

Hi! Could you point me to documentation on the algorithm used to calculate the coherence scores for the generated topics? I am unclear which method is being used (UCI, UMass, etc.). Thanks!

TommyJones commented 5 years ago

Hi @jeason15! Thanks for your interest in textmineR. There's a pretty extensive discussion of this in a closed issue, #35. I've also described it in one of the vignettes: https://github.com/TommyJones/textmineR/blob/master/vignettes/c_topic_modeling.Rmd

The TL;DR is that this is a metric of my own making, but it is very similar to the "difference measure" described on page 2 here: https://pdfs.semanticscholar.org/03a0/62fdcd13c9287a2d4e1d6d057fd2e083281c.pdf
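
For a concrete picture of what a difference-measure-style score looks like, here is a minimal base-R sketch. It assumes the usual form of that measure: for each pair of a topic's top terms, with a ranked above b, take P(b | a) - P(b), estimate both probabilities from document frequencies, and average over all pairs. This illustrates the idea under those assumptions; it is not the code behind CalcProbCoherence. The function name `difference_coherence` and the toy `dtm` are hypothetical.

```r
# Sketch of a "difference measure"-style coherence for a single topic.
# Assumptions (not taken from the textmineR source): `dtm` is a dense
# documents-by-terms count matrix with column names, and `top_terms` holds
# at least two of a topic's top terms, ordered from most to least probable.
difference_coherence <- function(dtm, top_terms) {
  present <- dtm[, top_terms, drop = FALSE] > 0  # does each document contain each term?
  p_term <- colMeans(present)                    # P(b): share of documents containing b
  pairs <- combn(length(top_terms), 2)           # each column (i, j) has i < j, so a outranks b
  diffs <- apply(pairs, 2, function(ij) {
    a <- present[, ij[1]]
    b <- present[, ij[2]]
    if (!any(a)) return(0)                       # a never occurs: treat the pair as neutral
    mean(b[a]) - p_term[[ij[2]]]                 # P(b | a) - P(b)
  })
  mean(diffs)
}

# Toy corpus: 4 documents, 3 terms
dtm <- matrix(
  c(1, 1, 0,
    2, 1, 0,
    0, 0, 3,
    0, 1, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("doc", 1:4), c("cat", "dog", "tax"))
)

difference_coherence(dtm, c("cat", "dog"))  # 0.25: "dog" is more likely once "cat" is present
difference_coherence(dtm, c("cat", "tax"))  # -0.5: "tax" never co-occurs with "cat"
```

A score near zero means the top terms are roughly statistically independent across documents, which is how a measure of this kind penalizes topics made up of very common but unrelated words.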

Hope that helps! Closing now, but feel free to ask follow-up questions in the thread if need be.