dice-group / palmetto-py

Python interface for https://github.com/dice-group/Palmetto
Apache License 2.0

Mismatch between expected coherence and the one obtained #2

Closed by simon-roca 5 years ago

simon-roca commented 6 years ago

Hello there!

First of all, thank you for making this code available. I recently learned about topic coherence measures and am trying to apply them in my own experiments.

To test this tool, I entered the topics from "Exploring the Space of Topic Coherence Measures" (Table 8: The five Wikipedia topics with highest and lowest coherences using best performing measure CV).

However, I don't get the same results: some values are close to the published ones, others are not.

I'd really appreciate it if you could tell me whether this is a bug or whether I'm misunderstanding how to run this tool. As far as I know, I'm using the default endpoint and the same coherence measure (CV), but maybe I missed something.

Thank you!

My code and results:

from palmettopy.palmetto import Palmetto

# No arguments: use the default Palmetto endpoint.
palmetto = Palmetto()

# Topics from Table 8 of the paper, with the coherence reported there.
topics = [['company', 'sell', 'corporation', 'own', 'acquire',
           'purchase', 'buy', 'business', 'sale', 'owner'],  # expected 0.94
          ['age', 'population', 'household', 'female', 'family',
           'census', 'live', 'average', 'median', 'income'],  # expected 0.91
          ['know', 'call', 'name', 'several', 'refer', 'oth',
           'hunter', 'hunt', 'thompson', 'include'],  # expected 0.29
          ['mark', 'paul', 'heart', 'take', 'read', 'harrison',
           'follow', 'become', 'know', 'include']  # expected 0.27
          ]

for t in topics:
    print(t)
    # Pass the current topic's word list and request the CV measure.
    print(palmetto.get_coherence(t, coherence_type='cv'))

# RESULTS:
['company', 'sell', 'corporation', 'own', 'acquire', 'purchase', 'buy', 'business', 'sale', 'owner']
0.512070592512
['age', 'population', 'household', 'female', 'family', 'census', 'live', 'average', 'median', 'income']
0.751744750787
['know', 'call', 'name', 'several', 'refer', 'oth', 'hunter', 'hunt', 'thompson', 'include']
0.331680940678
['mark', 'paul', 'heart', 'take', 'read', 'harrison', 'follow', 'become', 'know', 'include']
0.320180859741
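
To make the mismatch easier to see, the gap between the expected and obtained values can be tabulated. This is only a rough sketch built from the numbers quoted in this comment: the names `results` and `differences` and the short topic labels are made up for illustration, and the snippet does not call the Palmetto service.

```python
# (label, expected value from Table 8, value returned by get_coherence above).
# The labels are just the first word of each topic, chosen for illustration.
results = [
    ("company", 0.94, 0.512070592512),
    ("age", 0.91, 0.751744750787),
    ("know", 0.29, 0.331680940678),
    ("mark", 0.27, 0.320180859741),
]

differences = []
for label, expected, obtained in results:
    diff = obtained - expected  # negative: obtained is lower than expected
    differences.append(diff)
    print("{:<8} expected={:.2f} obtained={:.3f} diff={:+.3f}".format(
        label, expected, obtained, diff))
```

With these numbers, the two high-coherence topics come out lower than expected (most of all the first one), while the two low-coherence topics come out slightly higher.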

earthquakesan commented 6 years ago

@RicardoUsbeck

RicardoUsbeck commented 6 years ago

@MichaelRoeder

MichaelRoeder commented 6 years ago

See https://github.com/dice-group/Palmetto/issues/13