elastic / stream2es

Stream data into ES (Wikipedia, Twitter, stdin, or other ESes)

The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING" #65

Open HappyCoderMan opened 8 years ago

HappyCoderMan commented 8 years ago

I am trying to index Wikipedia from a local bz2 copy into a local Elasticsearch. It ran correctly for a long time, but then failed with this exception: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING"

This is what I ran: java -jar stream2es wiki --target http://localhost:9200/testwiki --log debug create index http://localhost:9200/testwiki --source /home/testuser/enwiki-latest-pages-articles.xml.bz2


sharonadar commented 8 years ago

+1

HappyCoderMan commented 8 years ago

I have looked for a workaround, but have not had any success yet. I tried setting -DentityExpansionLimit on the command line with values of 2147480000 and 0. Both resulted in the same 50,000,000 limit error.

Example: java -DentityExpansionLimit=2147480000 -jar stream2es-test.jar ...

aholstenson commented 8 years ago

Instead of entityExpansionLimit, try jdk.xml.totalEntitySizeLimit (works for me on Java 8), or just totalEntitySizeLimit if that doesn't work. The problem is that secure processing is enabled by default, and it caps the accumulated size of all entities at 50,000,000. entityExpansionLimit controls the number of entity expansions instead, and you shouldn't need to adjust it when parsing a Wikipedia XML dump.
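
If it is unclear whether a -D flag actually reached the JVM, a small check like the one below may help. It is only a sketch: the class name XmlLimitReport is made up for illustration, and it simply prints the system properties named in this thread. Note that -D options have to come before -jar on the command line, otherwise they are passed to the program as ordinary arguments.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of stream2es): reports which of the
// XML limit system properties discussed in this thread are set.
public class XmlLimitReport {
    static final String[] NAMES = {
        "jdk.xml.totalEntitySizeLimit",
        "totalEntitySizeLimit",
        "entityExpansionLimit"
    };

    // Returns each property name mapped to its value, or a marker if unset.
    static Map<String, String> currentLimits() {
        Map<String, String> out = new LinkedHashMap<>();
        for (String name : NAMES) {
            String value = System.getProperty(name);
            out.put(name, value == null ? "(not set)" : value);
        }
        return out;
    }

    public static void main(String[] args) {
        currentLimits().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Run it with the same -D options you intend to pass to stream2es, e.g. java -Djdk.xml.totalEntitySizeLimit=2147480000 XmlLimitReport, and compare the printed values.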

HappyCoderMan commented 8 years ago

Thank you very much for that suggestion. It appears to have worked. My Wikipedia index ran to 2.5X more documents than it did previously. (My run ran out of disk space and didn't complete, but that should be unrelated to this issue.)

ourdark commented 7 years ago

nohup java -DentityExpansionLimit=2147480000 -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 -Xmx2g -jar stream2es wiki --target http://es2:9200/en-wiki --source /mirror/enwiki-latest-pages-articles.xml.bz2 --log debug &

https://jira.atlassian.com/browse/JRA-62752?workflowName=JIRA+Bug+Workflow+w+Kanban+v6+-+Restricted&stepId=1

slvher commented 6 years ago

Thank you! @ourdark

When processing a huge XML file, we can also set these properties to 0 or -1, either of which means no limit, e.g. -DentityExpansionLimit=0 -DtotalEntitySizeLimit=0 -Djdk.xml.totalEntitySizeLimit=0

Reference: https://docs.oracle.com/javase/tutorial/jaxp/limits/limits.html
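
For code that creates its own parser rather than being launched with -D flags, the same no-limit setting can probably be applied from inside the program. The following is a minimal sketch under assumptions: the class name NoLimitParse and the sample XML are made up, and it assumes the JDK's built-in JAXP parser reads jdk.xml.totalEntitySizeLimit when the parser is created, so the property is set before that point.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example (not stream2es code): lifts the total entity size
// limit programmatically, then parses a small document with SAX.
public class NoLimitParse {
    static int countElements(String xml) throws Exception {
        // Same intent as -Djdk.xml.totalEntitySizeLimit=0 (0 = no limit).
        // Assumption: must be set before the parser is created.
        System.setProperty("jdk.xml.totalEntitySizeLimit", "0");

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++; // count every opening tag
                }
            });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<page><title>A &amp; B</title></page>";
        System.out.println(countElements(xml));
    }
}
```

Removing the limit gives up the protection that secure processing provides, so this is best kept to trusted input such as an official Wikipedia dump.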