Open HappyCoderMan opened 8 years ago
+1
I have looked for a workaround but have not had any success yet. I tried setting -DentityExpansionLimit on the command line with values of 2147480000 and 0. Both resulted in the same 50,000,000 limit error.
Example: java -DentityExpansionLimit=2147480000 -jar stream2es-test.jar ...
Instead of entityExpansionLimit, try using jdk.xml.totalEntitySizeLimit (works for me using Java 8), or just totalEntitySizeLimit if that doesn't work. The problem is that secure processing is enabled by default, and it limits the accumulated size of all entities to 50,000,000. The expansion limit controls the number of entity expansions instead, and you shouldn't need to adjust that when parsing a Wikipedia XML dump.
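For anyone who embeds the parser in their own code rather than passing -D flags, here is a minimal sketch of the same idea. It assumes the JDK's built-in StAX implementation and the jdk.xml.totalEntitySizeLimit property name mentioned above. The class name EntityLimitSketch and the countElements helper are made up for illustration and are not part of stream2es. To be safe, the property is set before the factory is created.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not stream2es code.
public class EntityLimitSketch {

    // Counts start elements in the given XML after lifting the total entity size limit.
    static int countElements(String xml) throws Exception {
        // Assumption: a value of 0 means "no limit" for the JDK's built-in parser.
        // Set it before the factory is created so the parser picks it up.
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");

        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamReader.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in for a dump page; a real dump contains a very large number of entity references.
        System.out.println(countElements("<page><title>A &amp; B</title></page>"));
    }
}
```

Setting the property on the command line, as in the commands in this thread, has the same effect without touching any code.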
Thank you very much for that suggestion. It appears to have worked. My Wikipedia index ran to 2.5X more documents than it did previously. (My run ran out of disk space and didn't complete, but that should be unrelated to this issue.)
nohup java -DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 -Xmx2g -jar stream2es wiki --target http://es2:9200/en-wiki --source /mirror/enwiki-latest-pages-articles.xml.bz2 --log debug &
https://jira.atlassian.com/browse/JRA-62752?workflowName=JIRA+Bug+Workflow+w+Kanban+v6+-+Restricted&stepId=1
Thank you! @ourdark
When processing a huge XML file, we can also set the value of each property to 0 or -1, which indicates no limit, e.g. -DentityExpansionLimit=0 -DtotalEntitySizeLimit=0 -Djdk.xml.totalEntitySizeLimit=0
Reference: https://docs.oracle.com/javase/tutorial/jaxp/limits/limits.html
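If it is unclear whether the -D flags actually reached the JVM, a small diagnostic like the one below can help. It is only a sketch: the class name XmlLimitReport is made up, it only reads the three system properties named in this thread, and it applies the "0 or less means no limit" rule from the reference above. It does not inspect what the parser itself ends up using.

```java
// Hypothetical diagnostic, not part of stream2es or the JDK.
public class XmlLimitReport {

    // Property names taken from the commands posted in this thread.
    static final String[] NAMES = {
        "entityExpansionLimit",
        "totalEntitySizeLimit",
        "jdk.xml.totalEntitySizeLimit"
    };

    // Turns a raw property value into a short human-readable description.
    static String describe(String raw) {
        if (raw == null) {
            return "not set (JDK default applies)";
        }
        try {
            long value = Long.parseLong(raw.trim());
            // Per the linked reference, a value of 0 or less means no limit.
            return value <= 0 ? "no limit" : "limit " + value;
        } catch (NumberFormatException e) {
            return "unparseable: " + raw;
        }
    }

    public static void main(String[] args) {
        for (String name : NAMES) {
            System.out.println(name + " = " + describe(System.getProperty(name)));
        }
    }
}
```

Run it with the same -D flags as the real job, for example java -Djdk.xml.totalEntitySizeLimit=0 XmlLimitReport, and check that each flag you passed is reported the way you expect.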
I am trying to index Wikipedia from a local bz2 copy into a local Elasticsearch. It ran correctly for a long time, but then failed with an exception like this: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING"
This is what I ran: java -jar stream2es wiki --target http://localhost:9200/testwiki --log debug create index http://localhost:9200/testwiki --source /home/testuser/enwiki-latest-pages-articles.xml.bz2