keepRunning2017 closed this issue 6 years ago.
Which "entity number"? Can you post the entire error message or give more details?
I think I've hit the same problem as keepRunning2017. Neither "de.tudarmstadt.ukp.wikipedia.datamachine-1.0.0-jar-with-dependencies.jar" nor "de.tudarmstadt.ukp.wikipedia.datamachine-1.1.0-jar-with-dependencies.jar" works on the latest English Wikipedia data, and I have no idea how to solve this. Here is my entire error message:

########################################################
"Date/Time","Total Memory","Free Memory","Message"
"2018.04.22 00:07:27","2058354688","1983119176","parse input dumps..."
"2018.04.22 00:07:27","2058354688","1983119176","Discussions are unavailable"
_"2018.04.22 00:09:45","1853358080","1830626552","org.xml.sax.SAXParseException; lineNumber: 61983422; columnNumber: 164; JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING"._
de.tudarmstadt.ukp.wikipedia.wikimachine.dump.xml.AbstractXmlDumpReader.readDump(AbstractXmlDumpReader.java:209)
de.tudarmstadt.ukp.wikipedia.datamachine.dump.xml.XML2Binary.
This is a known issue - see here: https://github.com/dkpro/dkpro-jwpl/issues/144
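Independent of any fix in JWPL itself, the `JAXP00010004` message comes from the JDK's built-in XML parser, whose accumulated-entity limit (default 50,000,000) can be raised or disabled with the standard `jdk.xml.totalEntitySizeLimit` system property, where `0` means "no limit". From the command line the `-D` flag has to come before `-jar`, e.g. `java -Djdk.xml.totalEntitySizeLimit=0 -jar <datamachine jar> <arguments>` (placeholders, not real paths). The Java sketch below shows the same setting applied programmatically before a SAX parser is created. The class name `EntityLimitWorkaround` and the tiny `SAMPLE` document are hypothetical, and it assumes the datamachine jar is using the JDK's default parser rather than a bundled one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of JWPL.
public class EntityLimitWorkaround {

    // Hypothetical sample: one internal entity of 10 characters,
    // referenced three times, so the parser reports 30 text characters.
    static final String SAMPLE =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE root [<!ENTITY e \"abcdefghij\">]>"
        + "<root>&e;&e;&e;</root>";

    /** Parses the XML with the JDK's default SAX parser and counts text characters. */
    static int countCharacters(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            int[] count = {0};
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    count[0] += length; // may be called several times per text node
                }
            });
            return count[0];
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Set before any parser is created; 0 disables the accumulated-entity-size limit.
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");
        System.out.println(countCharacters(SAMPLE));
    }
}
```

Passing the flag on the command line is probably the more practical route for a prebuilt jar, since it needs no code changes.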
When I used de.tudarmstadt.ukp.wikipedia.datamachine-1.0.0-jar-with-dependencies to parse Chinese Wikipedia dumps, it reported that the entity number exceeded 50000000. Can anyone help solve this problem?