Closed by lemuria-wchen 3 years ago
Thank you for looking into this, @bab2min!
@FDU-SDS: In the meantime, here is a small snippet for getting the coherence scores from a tomotopy model via gensim, if it helps:
import collections

import gensim


def get_coherence(
    model, coherence=None, topn=None, window_size=None, processes=None
):
    """
    Calculates the coherence score for a given Tomotopy model via Gensim's
    `CoherenceModel` pipeline.

    Parameters
    ----------
    model: tomotopy.LDAModel
        The Tomotopy model to get coherence scores for.
    coherence: str, optional
    topn: int, optional
    window_size: int, optional
    processes: int, optional
        All of these parameters are passed directly to
        `gensim.models.coherencemodel.CoherenceModel`, and the Gensim defaults will
        apply if they are omitted.

    Returns
    -------
    float
        The coherence score for the model.
    """
    # Forward only the options that were given; passing None explicitly
    # would override Gensim's defaults instead of applying them.
    options = {
        "coherence": coherence,
        "topn": topn,
        "window_size": window_size,
        "processes": processes,
    }
    options = {key: value for key, value in options.items() if value is not None}

    # Top words per topic. The fallback of 20 mirrors Gensim's default `topn`.
    top_n = 20 if topn is None else topn
    topics = []
    for k in range(model.k):
        word_probs = model.get_topic_words(k, top_n)
        topics.append([word for word, prob in word_probs])

    texts = []   # documents as lists of tokens
    corpus = []  # documents as bags of words: (token_id, count) pairs
    for doc in model.docs:
        words = [model.vocabs[token_id] for token_id in doc.words]
        texts.append(words)
        freqs = list(collections.Counter(doc.words).items())
        corpus.append(freqs)

    id2word = dict(enumerate(model.vocabs))
    dictionary = gensim.corpora.dictionary.Dictionary.from_corpus(
        corpus, id2word
    )

    cm = gensim.models.coherencemodel.CoherenceModel(
        topics=topics,
        texts=texts,
        corpus=corpus,
        dictionary=dictionary,
        **options,
    )
    return cm.get_coherence()
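
In case the conversion step in the snippet is unclear: each tomotopy document is turned into a token list (for `texts`) and into `(token_id, count)` pairs (for `corpus`), which is the bag-of-words format Gensim works with. Here is a tiny sketch of just that step, using only the standard library. The `vocabs` and `doc_words` values are made-up stand-ins for `model.vocabs` and `doc.words`, not real model output.

```python
import collections

# Hypothetical stand-ins for model.vocabs and doc.words.
vocabs = ["topic", "model", "coherence"]
doc_words = [0, 1, 1, 2, 1]

# Tokens as strings: one entry of `texts`.
words = [vocabs[token_id] for token_id in doc_words]

# Bag of words as (token_id, count) pairs: one entry of `corpus`.
# Sorted here only to make the result easy to read.
bow = sorted(collections.Counter(doc_words).items())

# Mapping from id to token, used to build the Gensim Dictionary.
id2word = dict(enumerate(vocabs))

print(words)
print(bow)
print(id2word)
```

Because the ids in `corpus` and the keys of `id2word` both come from the same vocabulary, the dictionary built by `Dictionary.from_corpus` should line up with the topic words returned by the model.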
The module tomotopy.utils.coherence was added in version 0.10.0.
Please see the example:
https://github.com/bab2min/tomotopy/blob/main/examples/coherence.py
Currently, tomotopy doesn't provide any function for topic coherence, so you may use gensim's coherencemodel or compute the score manually. I plan to add features similar to gensim's coherencemodel in the next update, so please use the above options until then.
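
For the "compute the score manually" option, a rough sketch of a UMass-style score for a single topic could look like the following. It is not tomotopy's or gensim's implementation: `umass_coherence` is a hypothetical helper, the smoothing value `eps=1.0` is an assumption, and the score is averaged over word pairs (some formulations sum instead).

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents, eps=1.0):
    """Mean UMass-style coherence of one topic (hypothetical helper).

    topic_words: top words of the topic, most probable first.
    documents:   iterable of tokenized documents (lists of words).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for doc in doc_sets if all(w in doc for w in words))

    scores = []
    # combinations() keeps input order, so `earlier` ranks above `later`.
    for earlier, later in combinations(topic_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # the conditioning word never occurs; skip the pair
        scores.append(math.log((doc_freq(earlier, later) + eps) / denom))
    return sum(scores) / len(scores) if scores else float("nan")


docs = [
    ["cat", "dog"],
    ["cat", "dog", "fish"],
    ["cat"],
    ["fish", "bird"],
]
coherent = umass_coherence(["cat", "dog"], docs)
incoherent = umass_coherence(["cat", "bird"], docs)
print(coherent, incoherent)
```

Words that tend to appear in the same documents score closer to zero, while words that never co-occur get a more negative score. Averaging this over all topics gives one number per model.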