Steps to replicate the problem:
Cloned https://github.com/AKSW/AGDISTIS (master branch).
Downloaded the English index files:
2014: http://titan.informatik.uni-leipzig.de/rusbeck/agdistis/en/indexdbpedia_en_2014.7z
2016: http://titan.informatik.uni-leipzig.de/dmoussallem/dbpedia_index/en/indexdbpedia_en_2016.zip
Set the 'index' property in src/main/resources/config/agdistis.properties to the directory that holds the extracted index files. (We tried both the 2014 and the 2016 index.)
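Before starting the server, it may be worth checking that the 'index' property really points at an existing directory. This is a minimal sketch, assuming the properties file uses plain key=value lines; it is a simplified reader (Java's Properties format also allows ':' separators, line continuations and escapes, which this ignores), and read_properties and check_index are hypothetical helpers, not part of AGDISTIS.

```python
import pathlib
from typing import Dict

# Location of the config file, relative to the root of the AGDISTIS checkout.
PROPERTIES = "src/main/resources/config/agdistis.properties"


def read_properties(path: str) -> Dict[str, str]:
    """Parse simple key=value lines; skip blank lines and '#' or '!' comments.

    Simplified: line continuations, escapes and ':' separators are not handled.
    """
    props: Dict[str, str] = {}
    # Java's Properties.load(InputStream) reads ISO-8859-1, so do the same here.
    text = pathlib.Path(path).read_text(encoding="iso-8859-1")
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # lines without '=' are ignored by this simplified reader
            props[key.strip()] = value.strip()
    return props


def check_index(properties_path: str = PROPERTIES) -> pathlib.Path:
    """Return the configured index directory, or raise if it is unusable."""
    index = read_properties(properties_path).get("index")
    if index is None:
        raise KeyError(f"no 'index' property in {properties_path}")
    index_dir = pathlib.Path(index).expanduser()
    if not index_dir.is_dir():
        raise FileNotFoundError(f"index directory not found: {index_dir}")
    return index_dir
```

Run from the root of the AGDISTIS checkout, check_index() returns the index directory or raises an error naming what is wrong.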
Started AGDISTIS with 'mvn tomcat:run'. A quick test with the following request returns a correct response:
curl --data-urlencode "text='The <entity>University of Leipzig</entity> in <entity>Barack Obama</entity>.'" -d type='agdistis' http://localhost:8080/AGDISTIS
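The same request can be sent from Python with only the standard library. This is a minimal sketch, assuming the endpoint URL and the two form fields from the curl command above and a UTF-8 response; build_body and disambiguate are hypothetical helper names. Note that in the curl command the single quotes around the text sit inside double quotes, so they are sent as part of the text value; the sketch does not add them.

```python
import urllib.parse
import urllib.request

# Endpoint from the curl command above (local 'mvn tomcat:run' instance).
AGDISTIS_URL = "http://localhost:8080/AGDISTIS"


def build_body(text: str) -> bytes:
    """Form-encode the same two fields the curl command sends: text and type."""
    return urllib.parse.urlencode({"text": text, "type": "agdistis"}).encode("utf-8")


def disambiguate(text: str, timeout: float = 60.0) -> str:
    """POST one annotated text to AGDISTIS and return the response body."""
    request = urllib.request.Request(
        AGDISTIS_URL,
        data=build_body(text),  # passing data makes this a POST request
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Assumes the server answers in UTF-8.
        return response.read().decode("utf-8")
```

With the server running, print(disambiguate("The <entity>University of Leipzig</entity> in <entity>Barack Obama</entity>.")) should print the same kind of response as the curl command.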
Tested 2510 documents, sending them to the local endpoint one at a time with a simple Python script. The files are UTF-8 plain text with named entities enclosed by ... . The attached archive contains the first 129 documents.
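A batch run like this can be sketched as follows. It is a hypothetical reconstruction, not the script referred to above: it assumes the documents are the *.txt files of one directory, sent in sorted file-name order with the same form fields as the curl command, and it records for each file whether the request succeeded ("ok"), returned an HTTP error such as the 500 "Internal Server Error" page ("http_500"), or got no response at all ("no_response"). post_text, classify and run_batch are illustrative names.

```python
import http.client
import pathlib
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Tuple

AGDISTIS_URL = "http://localhost:8080/AGDISTIS"


def post_text(text: str, timeout: float = 60.0) -> int:
    """Send one document the same way as the curl example; return the HTTP status."""
    body = urllib.parse.urlencode({"text": text, "type": "agdistis"}).encode("utf-8")
    request = urllib.request.Request(AGDISTIS_URL, data=body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        response.read()  # drain the body before the connection is closed
        return response.status


def classify(send: Callable[[str], int], text: str) -> str:
    """Run one request and map it to 'ok', 'http_<code>' or 'no_response'."""
    try:
        status = send(text)
    except urllib.error.HTTPError as err:  # e.g. the 500 "Internal Server Error" page
        return f"http_{err.code}"
    except (OSError, http.client.HTTPException):  # refused, reset or timed out
        return "no_response"
    return "ok" if status == 200 else f"http_{status}"


def run_batch(doc_dir: str, send: Callable[[str], int] = post_text) -> List[Tuple[str, str]]:
    """Send every *.txt file in doc_dir one at a time, in file-name order."""
    results: List[Tuple[str, str]] = []
    for path in sorted(pathlib.Path(doc_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        results.append((path.name, classify(send, text)))
    return results
```

For example, run_batch("test_docs"), assuming test_docs.zip was extracted into a test_docs directory, returns a list of (file name, outcome) pairs. The send parameter lets the request function be swapped, e.g. for a stub when testing the loop itself.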
Some documents are processed correctly, but at some point the server starts returning "Internal Server Error" (an HTML error page), and a few documents later the endpoint stops responding completely. The first failure does not always occur at the same document: sometimes, after a restart, the service fails on a document it processed successfully in a previous run.
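Because the first failing document changes between runs, comparing the outcome logs of two runs can make this visible. A small sketch, assuming each run is logged as a list of (file name, outcome) pairs with outcomes "ok", "http_<code>" or "no_response" (a hypothetical format): first_failure finds the first document that did not succeed, and ok_then_failed lists documents that succeeded in an earlier run but failed in a later one.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical log format: one (file name, outcome) pair per document, in the
# order sent; outcome is "ok", "http_<code>" or "no_response".
Result = Tuple[str, str]


def first_failure(results: Sequence[Result]) -> Optional[str]:
    """Name of the first document whose outcome was not 'ok', or None."""
    return next((name for name, outcome in results if outcome != "ok"), None)


def ok_then_failed(earlier: Sequence[Result], later: Sequence[Result]) -> List[str]:
    """Documents that succeeded in the earlier run but failed in the later one."""
    succeeded = {name for name, outcome in earlier if outcome == "ok"}
    return [name for name, outcome in later if outcome != "ok" and name in succeeded]
```

For instance, if b.txt succeeded in the first run and returned a 500 in the second, ok_then_failed(run1, run2) returns ["b.txt"].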
One document also crashes the server immediately, every time: adidas.001.d-6mTcK2mUsS8HKwz9Fyk2Z5ZDE.txt (see test_docs.zip).
We debugged the source code while processing this single file. It always crashes at the same line (see the exception in the log), but each time while processing a different, seemingly random named entity in the text.
I'm using OS X 10.12.3 and Java 1.8.0_92. One more thing: I had to run "ulimit -n 10000" before starting the server, because without it the server always crashes with a "Too many open files" exception (different from the exceptions described above).
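The "ulimit -n 10000" workaround can also be applied from a Python launcher, since child processes inherit the parent's resource limits. This is a minimal sketch using the standard resource and subprocess modules (Unix only); the repo_dir argument and the helper names are hypothetical. It raises the soft RLIMIT_NOFILE limit to 10000, capped at the hard limit and never lowering a higher current value, then starts 'mvn tomcat:run'.

```python
import resource
import subprocess

# Value used with "ulimit -n 10000" in this report.
TARGET_NOFILE = 10000


def soft_limit_for(target: int, soft: int, hard: int) -> int:
    """Return `target` capped at `hard`, but never below the current `soft` value."""
    inf = resource.RLIM_INFINITY
    if soft == inf:
        return soft  # already unlimited, nothing to raise
    capped = target if hard == inf else min(target, hard)
    return max(soft, capped)


def raise_nofile_limit(target: int = TARGET_NOFILE) -> None:
    """Raise this process's soft open-file limit; child processes inherit it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = soft_limit_for(target, soft, hard)
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))


def start_agdistis(repo_dir: str) -> subprocess.Popen:
    """Start 'mvn tomcat:run' in the AGDISTIS checkout with the raised limit."""
    raise_nofile_limit()
    return subprocess.Popen(["mvn", "tomcat:run"], cwd=repo_dir)
```

Raising the limit in the launcher itself, rather than in a preexec_fn, keeps the child setup simple; the trade-off is that the launcher process also runs with the higher limit.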
test_docs.zip