Closed bt2901 closed 3 years ago
Locally, it works when I use Palmetto from the command line. (The command might differ from the one in the latest version on the master branch.)
someone@somewhere:~/workspace/Palmetto/palmetto$ time java -cp target/palmetto-exec.jar org.aksw.palmetto.Palmetto ~/data/wikipedia_bd uci temp-test.txt
2020-02-17 17:54:09,282 INFO [org.aksw.palmetto.Palmetto] - <Read 1 from file.>
0 0.78224 [one, would, get, go, say, think, make, time, know, like]
real 0m21.359s
user 0m46.997s
sys 0m1.951s
Typically, common words can cause some issues with UCI simply because the coherence is based on a sliding window. Hence, for calculating the coherence, the positions of the single words within the documents have to be retrieved and analyzed.
In contrast, UMass is only interested in whether words co-occur within documents. For this coherence, the single documents do not have to be opened and the positions do not matter, which leads to much better run times (see the Wiki or the publications for further details).
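To make the difference concrete, here is a minimal Python sketch of the two counting strategies. It is only an illustration and not how Palmetto implements them: the function names, the toy documents, and the window sizes are all made up for this example. The point is that the document-level count only needs a membership check per document, while the window-level count has to walk over the token positions of every document.

```python
def doc_cooccurrence(docs, w1, w2):
    """Count documents containing both words; positions are irrelevant
    (the kind of count a document-based coherence like UMass relies on)."""
    return sum(1 for doc in docs if w1 in doc and w2 in doc)


def window_cooccurrence(docs, w1, w2, window=10):
    """Count sliding windows containing both words; every document has to be
    scanned position by position (the kind of count UCI relies on)."""
    count = 0
    for doc in docs:
        # Assumption of this sketch: a document shorter than the window
        # is treated as a single window.
        n_windows = max(1, len(doc) - window + 1)
        for start in range(n_windows):
            span = doc[start:start + window]
            if w1 in span and w2 in span:
                count += 1
    return count


# Toy corpus (hypothetical), already tokenized.
docs = [
    "i think we make time to know what we like".split(),
    "time flies".split(),
]

print(doc_cooccurrence(docs, "think", "time"))
print(window_cooccurrence(docs, "think", "time", window=4))
```

With frequent words such as the ones in the failing word set, the number of positions to inspect grows quickly, which is why the window-based count is so much more expensive.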
I am not sure whether this really answers your question :thinking:
Just to be on the safe side, I restarted the service and it is now responding with the correct coherence. It looked like the service simply got stuck for more complex coherences like UCI while it was still able to calculate easier stuff like UMass.
Sorry for the inconvenience this issue may have caused.
For some reason, the following request fails:
The weird thing is that I get a valid answer if I replace the UCI coherence with a different kind of coherence (I tried UMass). UCI coherence of different word sets also appears to be working fine.
EDIT: by bisection, I found out that some words in the second half of the set (think%20make%20time%20know%20like) seem to be problematic.
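For anyone who wants to repeat the bisection, this is roughly the procedure as a Python sketch. The `fails` predicate is a placeholder: in practice it would send the word list to the service and report whether the request failed, but here it is a hard-coded stand-in so the example runs on its own. The stand-in's choice of "bad" words is invented and says nothing about which words actually triggered the problem.

```python
def bisect_failing(words, fails):
    """Narrow a failing word list down by repeatedly keeping a failing half.

    `fails` is a caller-supplied predicate (hypothetical here) that returns
    True when a request with the given words fails.
    """
    current = list(words)
    while len(current) > 1:
        mid = len(current) // 2
        left, right = current[:mid], current[mid:]
        if fails(left):
            current = left
        elif fails(right):
            current = right
        else:
            # Neither half fails alone: the failure needs words from both.
            break
    return current


words = "one would get go say think make time know like".split()

# Stand-in predicate for illustration only, not a real request.
def fake_fails(ws):
    return "know" in ws and "like" in ws

print(bisect_failing(words, fake_fails))
```

Note that this only finds one failing subset and stops as soon as the failure depends on words from both halves, so the result is a starting point for further testing rather than a definitive answer.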