I am working with BERTopic and trying to evaluate my topic models trained on Marathi (an Indic language) using some metrics. I found the evaluation code written by MaartenGr (the author of BERTopic), but unfortunately I was not able to install the dependencies of the setup he describes here (https://github.com/MaartenGr/BERTopic_evaluation/tree/main). The author recommends using OCTIS as it provides more metrics. I tried calculating topic diversity and the NPMI score. Topic diversity is calculated fine, but I keep getting issues while calculating the NPMI score.
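For context, the topic word lists shown below were pulled out of my trained BERTopic model, roughly along the lines of this simplified sketch (topic_model stands for my fitted BERTopic instance; topic -1 is BERTopic's outlier topic and is skipped):
# Simplified sketch, not the exact code: collect the top-10 words per topic
# from the fitted model (topic_model), skipping the -1 outlier topic.
topics_list = [
    [word for word, _ in topic_model.get_topic(topic_id)][:10]
    for topic_id in topic_model.get_topics().keys()
    if topic_id != -1
]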
Here is my code
from octis.evaluation_metrics.coherence_metrics import Coherence
from octis.evaluation_metrics.diversity_metrics import TopicDiversity

# This is what the sentence array looks like
sentence_array = ['तीन दिवस झाले, पण गाडी अजून सापडली नाही. पोलिसांचा कडक तपास सुरु आहे.', 'डाळी भारतीय थाळीमध्ये सामील असलेले मुख्य भोजन आहेत.']

# These are the topics (top 10 words per topic)
topics_list = [
    ['ठाकरे', 'एक', 'भारतीय', 'दिवस', 'शिंदे', 'सांगितले', 'दोन', 'माहिती', 'देण्यात', 'जात'],
    ['भारतीय', 'शिंदे', 'ठाकरे', 'मुख्यमंत्री', 'उद्धव', 'एक', 'पोलीस', 'धावा', 'दोन', 'सरकार'],
    ['देण्यात', 'फोन', 'डेटा', 'कॅमेरा', 'स्मार्टफोन', 'सादर', 'डिस्प्ले', 'सेन्सर', 'सपोर्ट', 'बॅटरी']
]

# Reference texts for the coherence metric
octis_texts = [sentence_array]
npmi = Coherence(texts=octis_texts, topk=10, measure='c_npmi')

# OCTIS expects the topics wrapped in a model-output dictionary
octis_output = {"topics": topics_list}

topic_diversity = TopicDiversity(topk=10)
topic_diversity_score = topic_diversity.score(octis_output)
print("Topic diversity: " + str(topic_diversity_score))

npmi_score = npmi.score(octis_output)
print("Coherence: " + str(npmi_score))
Error
This is the error I get.
Topic diversity: 0.8857142857142857
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-68-c000efdb667a> in <cell line: 5>()
3 print("Topic diversity: "+str(topic_diversity_score))
4
----> 5 npmi_score = npmi.score(octis_output)
6 print("Coherence: "+str(npmi_score))
3 frames
/usr/local/lib/python3.10/dist-packages/gensim/models/coherencemodel.py in _ensure_elements_are_ids(self, topic)
452 return np.array(ids_from_ids)
453 else:
--> 454 raise ValueError('unable to interpret topic as either a list of tokens or a list of ids')
455
456 def _update_accumulator(self, new_topics):
ValueError: unable to interpret topic as either a list of tokens or a list of ids
Can anyone point out what exactly is wrong here, and how I can evaluate BERTopic models trained on Indic languages?
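My current guess is that the texts passed to Coherence may need to be tokenized into word lists first (the traceback points into gensim's CoherenceModel, which as far as I understand expects a list of token lists rather than raw sentence strings), but I am not sure whether this is the right input format. Something along the lines of this sketch is what I have in mind:
# Hedged sketch: whitespace-tokenize each Marathi sentence so that every
# document becomes a list of tokens instead of a single raw string.
# I am not certain this is the input format OCTIS/gensim actually expects.
tokenized_texts = [sentence.split() for sentence in sentence_array]
npmi = Coherence(texts=tokenized_texts, topk=10, measure='c_npmi')
npmi_score = npmi.score(octis_output)
Is whitespace tokenization a reasonable choice for Marathi here, or is something else going wrong?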
Thanks.