Python interface for Palmetto
NOTE: If the Palmetto endpoint is down, please open an issue in the Palmetto repository.
First, install palmettopy from PyPI:
pip install palmettopy
Then you can import it in your application and get the coherence for a list of words as follows:
from palmettopy.palmetto import Palmetto
palmetto = Palmetto()
words = ["cake", "apple", "banana", "cherry", "chocolate"]
palmetto.get_coherence(words)
By default, this interface uses the "cp" coherence type. The coherence type can be set explicitly as follows:
from palmettopy.palmetto import Palmetto
palmetto = Palmetto()
words = ["cake", "apple", "banana", "cherry", "chocolate"]
palmetto.get_coherence(words, coherence_type="cp")
The available coherence types are "ca", "cp", "cv", "npmi", "uci", and "umass". (Please note that "cv" is not recommended anymore.)
The default endpoint is run by the AKSW research group [2]. If you want to run your own endpoint (e.g., "http://example.com/myownendpoint"), you can customize the interface as follows:
from palmettopy.palmetto import Palmetto
palmetto = Palmetto("http://example.com/myownendpoint/service/")
words = ["cake", "apple", "banana", "cherry", "chocolate"]
palmetto.get_coherence(words, coherence_type="cp")
You can also calculate a faster coherence estimate based on the document frequencies of the terms by using the get_coherence_fast method:
from palmettopy.palmetto import Palmetto
palmetto = Palmetto()
words = ["cake", "apple", "banana", "cherry", "chocolate"]
palmetto.get_coherence_fast(words)
To get document frequencies for the words you can use the following method:
from palmettopy.palmetto import Palmetto
palmetto = Palmetto()
words = ["cake", "apple", "banana", "cherry", "chocolate"]
palmetto.get_df_for_words(words)
pip install -r requirements.txt
pip install -e ./
make test
Implement coherence calculation for two words as follows:
Given two words w_i and w_j, let s_i and s_j be their sets, s_ij the intersection of the two sets, |s| the size of a set, and C the size of the corpus:
P(w_i,w_j)/(P(w_i)*P(w_j))
= (|s_ij| / C)/((|s_i| / C)*(|s_j| / C))
= (|s_ij| * C)/(|s_i|*|s_j|)
There is no need to calculate the logarithm: it is monotonically increasing, so it does not affect the ranking.
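The ratio above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used by get_coherence_fast: the sets are assumed to hold ids of documents containing each word, the toy data and the function name pair_ratio are made up for the example, and returning 0.0 for a word with an empty set is an arbitrary choice.

```python
import math


def pair_ratio(docs_i, docs_j, corpus_size):
    """Return (|s_ij| * C) / (|s_i| * |s_j|) for two sets of document ids."""
    if not docs_i or not docs_j:
        return 0.0  # assumption: treat the undefined ratio as 0
    joint = len(docs_i & docs_j)  # |s_ij|
    return joint * corpus_size / (len(docs_i) * len(docs_j))


# Toy data (hypothetical): ids of the documents that contain each word.
docs = {
    "cake": {1, 2, 3, 4},
    "chocolate": {2, 3, 4, 5},
    "banana": {4, 5, 6},
}
corpus_size = 10  # C

pairs = [("cake", "chocolate"), ("cake", "banana"), ("chocolate", "banana")]
ratios = {p: pair_ratio(docs[p[0]], docs[p[1]], corpus_size) for p in pairs}

# Ranking by the raw ratio and by its logarithm gives the same order,
# because log is monotonically increasing on positive values.
by_ratio = sorted(pairs, key=lambda p: ratios[p], reverse=True)
by_log = sorted(pairs, key=lambda p: math.log(ratios[p]), reverse=True)
```

Here every pair shares at least one document, so all ratios are positive and the logarithm is defined; a pair with an empty intersection would need separate handling before taking the logarithm.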
Ivan Ermilov: GitHub account
Michael Roeder: GitHub account
This interface is licensed under the Apache 2.0 license. For the Palmetto license, see the Palmetto GitHub repo.