dice-group / Palmetto

Palmetto is a quality measuring tool for topics
GNU Affero General Public License v3.0

502 Proxy Error on a specific request #36

Closed bt2901 closed 3 years ago

bt2901 commented 4 years ago

For some reason, the following request fails:

http://palmetto.aksw.org/palmetto-webapp/service/uci?words=one%20would%20get%20go%20say%20think%20make%20time%20know%20like

The weird thing is that I get a valid answer if I replace UCI coherence with a different kind of coherence (I tried UMass). UCI coherence of other word sets also appears to work fine.

EDIT: by bisection, I found out that some words in the second half of the set (think%20make%20time%20know%20like) seem to be problematic.
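
For anyone who wants to repeat the bisection, here is a minimal sketch in Python. It only assumes the URL pattern from the request above; `build_url`, `narrow_failing_words`, and the `fails` callback are hypothetical names chosen for illustration, and `fails` stands in for whatever check you use (for example, sending the request and looking for a 502 response).

```python
from urllib.parse import quote

# Endpoint taken from the failing request quoted in this issue.
BASE_URL = "http://palmetto.aksw.org/palmetto-webapp/service/uci"


def build_url(words, base_url=BASE_URL):
    """Build a request URL; the words are joined by spaces, encoded as %20."""
    return base_url + "?words=" + quote(" ".join(words))


def narrow_failing_words(words, fails):
    """Repeatedly halve the word list, keeping a half that still fails.

    `fails` is a hypothetical callback: it takes a list of words and
    returns True if the request for that list fails. The search stops
    when neither half fails on its own, so the result may still contain
    several words that only fail in combination.
    """
    current = list(words)
    while len(current) > 1:
        mid = len(current) // 2
        first, second = current[:mid], current[mid:]
        if fails(first):
            current = first
        elif fails(second):
            current = second
        else:
            break
    return current
```

Because a failure may depend on a combination of words rather than on a single word, the result is best read as a smaller failing set, not as a definitive culprit.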

MichaelRoeder commented 4 years ago

Locally, it works when I use Palmetto from the command line. (The command might differ from the one in the latest version on the master branch.)

someone@somewhere:~/workspace/Palmetto/palmetto$ time java -cp target/palmetto-exec.jar org.aksw.palmetto.Palmetto ~/data/wikipedia_bd uci temp-test.txt 
2020-02-17 17:54:09,282 INFO [org.aksw.palmetto.Palmetto] - <Read 1 from file.>
    0   0.78224 [one, would, get, go, say, think, make, time, know, like]

real    0m21.359s
user    0m46.997s
sys     0m1.951s

Common words typically cause trouble for UCI because this coherence is based on a sliding window. To calculate it, the positions of the individual words within the documents have to be retrieved and analyzed.

In contrast, UMass is only interested in whether words co-occur within documents, i.e., for this coherence the individual documents do not have to be opened and the positions do not matter, which leads to much better run times. (See the Wiki or the publications for further details.)
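
To make the contrast concrete, here is a toy sketch in Python. It is not Palmetto's actual implementation; the function names, the representation of documents as token lists, and the window size are assumptions made only for illustration. It shows why window-based counting has to walk through the word positions of every document, while document-level counting only needs a membership check per document.

```python
def doc_cooccurrence(docs, w1, w2):
    """Document-level counting (the kind of statistic UMass relies on):
    one membership check per document, word positions are irrelevant."""
    return sum(1 for doc in docs if w1 in doc and w2 in doc)


def window_cooccurrence(docs, w1, w2, window_size):
    """Sliding-window counting (the kind of statistic UCI relies on):
    every window of every document has to be inspected."""
    count = 0
    for doc in docs:
        # A document shorter than the window is treated as a single window.
        n_windows = max(len(doc) - window_size + 1, 1)
        for start in range(n_windows):
            window = doc[start:start + window_size]
            if w1 in window and w2 in window:
                count += 1
    return count
```

With very common words, almost every document contains them, so the window-based variant ends up scanning a large part of the corpus position by position. That fits the observation that UCI requests are the slow ones.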

I am not sure whether this really answers your question :thinking:

MichaelRoeder commented 4 years ago

Just to be on the safe side, I restarted the service and it is now responding with the correct coherence. It looked like the service simply got stuck for more complex coherences like UCI while it was still able to calculate easier stuff like UMass.

Sorry for any inconvenience this issue may have caused.