dice-group / AGDISTIS

AGDISTIS - Agnostic Named Entity Disambiguation
http://aksw.org/Projects/AGDISTIS.html
GNU Affero General Public License v3.0

Ubuntu Linux on an AWS EC2 instance works, but Mac does not #41

Closed. RicardoUsbeck closed this issue 7 years ago.

RicardoUsbeck commented 7 years ago

Steps to reproduce the problem:

Cloned https://github.com/AKSW/AGDISTIS (master branch).

Downloaded the English index files:
2014: http://titan.informatik.uni-leipzig.de/rusbeck/agdistis/en/indexdbpedia_en_2014.7z
2016: http://titan.informatik.uni-leipzig.de/dmoussallem/dbpedia_index/en/indexdbpedia_en_2016.zip

Set the config file src/main/resources/config/agdistis.properties so that the 'index' property points to the directory holding the extracted files. (We tried first with the 2014 index and then with the 2016 index.)
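To rule out a misconfigured index path before debugging further, a quick check like the following could confirm that the 'index' property in agdistis.properties names an existing, non-empty directory. This is a minimal sketch: the helper names `read_properties` and `index_dir_ok` are hypothetical, the parser handles only simple `key=value` lines (not the full Java properties syntax), and a relative path is resolved against the current working directory, which may not match how AGDISTIS resolves it.

```python
from pathlib import Path

# Config location named in the issue; run from the repository root.
CONFIG = Path("src/main/resources/config/agdistis.properties")


def read_properties(path):
    """Minimal .properties reader for plain key=value lines.

    Simplification (assumption): ignores line continuations, escapes and
    ':' or whitespace separators, which full Java properties parsing supports.
    """
    props = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments ('#' or '!') and lines without '='.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def index_dir_ok(path=CONFIG):
    """Return True if the 'index' property names an existing, non-empty directory."""
    index = read_properties(path).get("index")
    if not index:
        return False
    index_path = Path(index)
    return index_path.is_dir() and any(index_path.iterdir())
```

Running `print(index_dir_ok())` from the repository root would print `False` if the property is missing or points at an empty or nonexistent directory.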

Started AGDISTIS with 'mvn tomcat:run'. A quick test returns a correct response for this request: curl --data-urlencode "text='The <entity>University of Leipzig</entity> in <entity>Barack Obama</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
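For scripting, the same request can be sent from Python. The sketch below assumes only what the curl command shows: a form-encoded POST to http://localhost:8080/AGDISTIS with a `text` field and `type=agdistis`. `build_request` and `disambiguate` are hypothetical helper names, and the response body is returned as raw text rather than parsed, since its format is not described here. Unlike the curl command, it sends the text without the literal single quotes that the shell passes through inside the double-quoted argument.

```python
import urllib.parse
import urllib.request

# Endpoint used in the curl command above.
ENDPOINT = "http://localhost:8080/AGDISTIS"


def build_request(text, endpoint=ENDPOINT):
    """Build a form-encoded POST equivalent to the curl example.

    'text' carries the document with entities wrapped in <entity>...</entity>;
    'type' is fixed to 'agdistis', as in the curl call. Values are UTF-8 encoded.
    """
    body = urllib.parse.urlencode({"text": text, "type": "agdistis"}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def disambiguate(text, endpoint=ENDPOINT, timeout=60):
    """Send one request and return the raw response body as a string."""
    with urllib.request.urlopen(build_request(text, endpoint), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With the server running, `print(disambiguate("The <entity>University of Leipzig</entity> in <entity>Barack Obama</entity>."))` would print the server's response for the same sentence.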

We tested 2510 documents, sending them to the local endpoint one by one with a simple Python script. The files are UTF-8 plain text with named entities enclosed by ... . An archive containing the first 129 documents is attached.

Some documents are processed correctly, but at some point the server starts responding with "Internal Server Error" (an HTML error page), and after a few more documents the endpoint stops responding altogether. The first failure does not always occur at the same document; sometimes the server fails on a document it processed successfully in a previous run (after a restart).
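The original test script is not included. A hypothetical batch runner along these lines would record, per document, whether the server answered normally, returned an HTTP error such as the "Internal Server Error" page, or stopped responding, which makes it easier to compare where the first failure occurs across runs. The names `post_document`, `run_batch`, and `first_failure` and the 60-second timeout are assumptions; `send` is a parameter so that the classification logic can be exercised without a live server.

```python
import socket
import urllib.error
import urllib.parse
import urllib.request
from pathlib import Path

ENDPOINT = "http://localhost:8080/AGDISTIS"  # local endpoint from the issue


def post_document(text, endpoint=ENDPOINT, timeout=60):
    """POST one document as a form field, like the curl example; return the body."""
    body = urllib.parse.urlencode({"text": text, "type": "agdistis"}).encode("utf-8")
    req = urllib.request.Request(endpoint, data=body, method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def run_batch(paths, send=post_document):
    """Send documents one by one (in sorted order) and classify each outcome.

    Returns a list of (filename, status), where status is 'ok',
    'http <code>' (e.g. 'http 500'), or 'no response'.
    """
    results = []
    for path in sorted(paths):
        text = Path(path).read_text(encoding="utf-8")
        try:
            send(text)
            status = "ok"
        except urllib.error.HTTPError as err:
            # The server answered, but with an error page (checked before
            # URLError because HTTPError is a subclass of it).
            status = f"http {err.code}"
        except (urllib.error.URLError, socket.timeout, ConnectionError):
            # No usable answer: connection refused, reset, or timed out.
            status = "no response"
        results.append((Path(path).name, status))
    return results


def first_failure(results):
    """Return the name of the first document whose status is not 'ok', or None."""
    return next((name for name, status in results if status != "ok"), None)
```

For example, after extracting test_docs.zip into a directory (the name `test_docs` is hypothetical), `results = run_batch(Path("test_docs").glob("*.txt"))` followed by `print(first_failure(results))` would show where a given run first broke.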

One document also crashes the server immediately, every time: adidas.001.d-6mTcK2mUsS8HKwz9Fyk2Z5ZDE.txt (see test_docs.zip).

We tried debugging the source code while processing only this file. The code always crashes at the same line (see the exception in the log), but each time while processing a different, seemingly random named entity in the text.

I'm using OS X 10.12.3 and Java 1.8.0_92. One more thing: I had to run "ulimit -n 10000" before starting the server, because without it the server always crashes with a "Too many open files" exception (different from the exceptions that occur otherwise).
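The "ulimit -n 10000" workaround raises the open-file limit for the shell and every process it starts. If the server is launched from a Python script instead, a minimal sketch using the Unix-only `resource` module could do the same before starting `mvn tomcat:run`, since child processes inherit the limit. The helper names are hypothetical. This version raises only the soft limit (capped by the hard limit), whereas bash's `ulimit -n` without `-S` or `-H` sets both; it works around the symptom rather than explaining why so many files are open.

```python
import resource
import subprocess

TARGET = 10000  # value used with `ulimit -n 10000` in the issue


def raise_nofile_limit(target=TARGET):
    """Raise this process's soft open-file limit to `target`, capped by the hard limit.

    Child processes inherit the new limit. Never lowers an existing higher limit.
    Returns the soft limit now in effect.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft == resource.RLIM_INFINITY or soft >= target:
        return soft  # already high enough
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return target


def start_server():
    """Start AGDISTIS with the command from the issue, under the raised limit."""
    raise_nofile_limit()
    return subprocess.Popen(["mvn", "tomcat:run"])
```

Calling `start_server()` from the repository root would launch Maven as a child process that inherits the raised limit.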

test_docs.zip

DiegoMoussallem commented 7 years ago

@RicardoUsbeck, please ask them to try again.

DiegoMoussallem commented 7 years ago

Fixed.