dbpedia / extraction-framework

The software used to extract structured data from Wikipedia

The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING" #487

Open EllysahatNMI opened 8 years ago

EllysahatNMI commented 8 years ago

Hi! I followed the step-by-step instructions but encountered this error after running ../clean-install-run. I used enwiki-20161001-pages-articles.xml.bz2, and I know it's a huge file, but how do I get around this error? I tried putting -DentityExpansionLimit=2147480000 in clean-install-run like this: mvn -DentityExpansionLimit=2147480000 ... but I still get the same error. Please help me.

jimkont commented 8 years ago

I also noticed this problem at some point, but I thought it was a dump problem.

I see a few references around, like:

https://github.com/elastic/stream2es/issues/65
https://github.com/Wikidata/Wikidata-Toolkit/issues/243
https://github.com/Wikidata/Wikidata-Toolkit/pull/244

Can you test the following? Edit the run script, add the argument described in https://github.com/elastic/stream2es/issues/65#issuecomment-250917529, and see if it works.

EllysahatNMI commented 8 years ago

I already tried putting jdk.xml.totalEntitySizeLimit and totalEntitySizeLimit as indicated in elastic/stream2es#65 (comment), but I still get the same error. It stops when the import is at around 600,000+ pages.

I edited the clean-install-run script like this:

mvn -Djdk.xml.totalEntitySizeLimit=2147480000 $BATCH -f ../pom.xml clean && . ../install-run "$@"

I'm out of ideas. Please help.

jimkont commented 8 years ago

Can you edit the run script? clean-install-run calls install-run, which in turn calls run.

EllysahatNMI commented 8 years ago

Hi. Sorry, I'm a little confused about where I should put it in the run script.

This is how I did it: mvn $MAVEN_DEBUG $BATCH scala:run "-Dlauncher=$LAUNCHER" "-DaddArgs=$ADD_ARGS" "-Djdk.xml.totalEntitySizeLimit=2147480000"

Is it correct?

jimkont commented 8 years ago

This is the line: https://github.com/dbpedia/extraction-framework/blob/master/run#L45. Not sure if the jdk.xml package prefix is needed, so maybe add both: -Djdk.xml.totalEntitySizeLimit=2147480000 -DtotalEntitySizeLimit=2147480000

EllysahatNMI commented 8 years ago

I tried doing as you said, but the same error appears.

I edited https://github.com/dbpedia/extraction-framework/blob/master/run#L45 like this: mvn $MAVEN_DEBUG -DtotalEntitySizeLimit=2147480000 -Djdk.xml.totalEntitySizeLimit=2147480000 $BATCH scala:run "-Dlauncher=$LAUNCHER" "-DaddArgs=$ADD_ARGS"

jimkont commented 8 years ago

Sorry again, let's make a final check.

Try putting this here: https://github.com/dbpedia/extraction-framework/blob/master/dump/pom.xml#L64-L71. This is a JVM argument, so we should pass it either from the pom.xml or via MAVEN_OPTS; I'm not sure if passing it directly to Maven would work.
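
To illustrate why (a quick standalone sketch, not framework code; the property name is taken from the links above): the JDK's built-in parser reads jdk.xml.totalEntitySizeLimit from the system properties of the JVM that does the parsing, at parser-creation time. A -D flag on the mvn command line only reaches Maven's own JVM, so if scala:run forks a separate launcher process (which the <jvmArgs> config suggests), that process never sees it.

// Standalone sketch (assumptions: Java 8, the JDK's built-in JAXP/SAX parser).
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.helpers.DefaultHandler
import java.io.ByteArrayInputStream

object EntityLimitCheck {
  def main(args: Array[String]): Unit = {
    // Raise the 50,000,000-byte secure-processing default; 0 would lift the
    // cap entirely. Setting the system property here, before the parser is
    // created, is equivalent to passing the -D flag as a jvmArg of this JVM.
    System.setProperty("jdk.xml.totalEntitySizeLimit", "2147480000")

    // Build a document whose expanded entities total ~51,000,000 bytes, just
    // over the default limit, so it fails with JAXP00010004 unless the
    // property above is set in this JVM.
    val entity = "x" * 1000000                        // a 1 MB general entity
    val xml = s"""<?xml version="1.0"?><!DOCTYPE r [<!ENTITY e "$entity">]>""" +
      "<r>" + ("&e;" * 51) + "</r>"

    SAXParserFactory.newInstance().newSAXParser()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")), new DefaultHandler)
    println("parsed ~51 MB of expanded entities without hitting the limit")
  }
}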

EllysahatNMI commented 8 years ago

Still the same error when I put this:

    <jvmArg>-DtotalEntitySizeLimit=2147480000</jvmArg>
    <jvmArg>-Djdk.xml.totalEntitySizeLimit=2147480000</jvmArg>

EllysahatNMI commented 8 years ago

I put it in this line: https://github.com/dbpedia/extraction-framework/blob/master/dump/pom.xml#L39-L41

Here's the snippet:

<jvmArgs>
    <jvmArg>-server</jvmArg>
    <jvmArg>-DtotalEntitySizeLimit=2147480000</jvmArg>
    <jvmArg>-Djdk.xml.totalEntitySizeLimit=2147480000</jvmArg>
</jvmArgs>

And it worked! Thank you so much! :)

jimkont commented 8 years ago

Great! Can you do a final check to see which of the two arguments is the one needed? You can either tell us here or make a PR directly.

Thanks!

chile12 commented 8 years ago

Thanks guys! This problem also popped up when extracting abstracts.

clanstyles commented 8 years ago

I'm trying to use 'download.10000.properties' and then 'extraction.default.properties'.

I'm receiving the same error.

Message: JAXP00010004: The accumulated size of entities is "50,000,001" that exceeded the "50,000,000" limit set by "FEATURE_SECURE_PROCESSING".

I've tried the solutions above; I set extraction-framework/dump/pom.xml's jvmArgs:

<mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
<jvmArgs>
    <jvmArg>-server</jvmArg>
    <jvmArg>-DtotalEntitySizeLimit=0</jvmArg>
    <jvmArg>-Djdk.xml.totalEntitySizeLimit=0</jvmArg>
</jvmArgs>

No luck fixing it. I then tried modifying the ../run script:

if [[ $SLACK != false && $SlackUrl == https://hooks.slack.com/services* ]] ;
then
  #mvn $MAVEN_DEBUG $BATCH scala:run "-Dlauncher=SlackForwarder"  "-DaddArgs=$SlackUrl|$SlackRegexMap|$LogDir|$1|$SLACK" &
  sleep 5
  PID="$(ps ax | grep java | grep extraction.scripts.SlackForwarder | tail -1 | sed -n -E 's/([0-9]+).*/\1/p' | xargs)"
  echo $PID
  mvn $MAVEN_DEBUG -B scala:run "-Djdk.xml.totalEntitySizeLimit=0" "-Dlauncher=$LAUNCHER" "-DaddArgs=$ADD_ARGS" &> "/proc/$PID/fd/0"
else
  mvn $MAVEN_DEBUG $BATCH scala:run "-Djdk.xml.totalEntitySizeLimit=0" "-Dlauncher=$LAUNCHER" "-DaddArgs=$ADD_ARGS"
fi

You'll notice I added "-Djdk.xml.totalEntitySizeLimit=0". According to the docs, 0 is supposed to set it to unlimited. I also tried it with the limits you listed above; that didn't work either.

chile12 commented 8 years ago

Interesting. I just ran an import across 130 languages without a hitch. Here is the launcher setting I used:

                    <launcher>
                        <id>import</id>
                        <mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
                        <jvmArgs>
                            <jvmArg>-server</jvmArg>
                            <jvmArg>-DtotalEntitySizeLimit=2147480000</jvmArg>
                            <jvmArg>-Djdk.xml.totalEntitySizeLimit=2147480000</jvmArg>
                        </jvmArgs>
                        <args>
                            <!-- base folder of downloaded dumps -->
                            <arg>/data/extraction-data/2016-04</arg>
                            <!-- location of SQL file containing MediaWiki table definitions -->
                            <arg>/home/extractor/mediawikiTables.sql</arg>
                            <!-- JDBC URL of MySQL server. Import creates a new database for 
                                each wiki. -->
                            <arg>jdbc:mysql://localhost/?characterEncoding=UTF-8&amp;user=root</arg>
                            <!-- require-download-complete -->
                            <arg>true</arg> 
                            <!-- file name: pages-articles.xml{,.bz2,.gz} -->
                            <arg>pages-articles.xml.bz2</arg>
                            <!-- number of parallel imports; this number depends on the number of processors in use
                                and the type of hard disc (hhd/ssd) and how many parallel file reads it can support -->
                            <arg>16</arg>
                            <!-- languages and article count ranges, comma-separated, e.g. "de,en" 
                                or "@mappings" etc. -->
                            <arg>@downloaded</arg>
                        </args>
                    </launcher>
clanstyles commented 8 years ago

Was that Java 7 or 8? I was using Java 8, and I read that this limit was added in Java 8. I'm now testing with Java 7.

chile12 commented 8 years ago

Using Java 8 as well. Is this still causing problems for you?

clanstyles commented 8 years ago

@chile12 I'm still processing all of Wikipedia (2 days later), but it's working with no issues. I switched to Java 7.

EllysahatNMI commented 8 years ago

Hi guys. Sorry for the late update.

So I just confirmed that this argument is the correct one:

-Djdk.xml.totalEntitySizeLimit=2147480000

                    <launcher>
                        <id>import</id>
                        <mainClass>org.dbpedia.extraction.dump.sql.Import</mainClass>
                        <jvmArgs>
                            <jvmArg>-server</jvmArg>
                            <jvmArg>-Djdk.xml.totalEntitySizeLimit=2147480000</jvmArg>
                        </jvmArgs>
                        <args>
                            <!-- base folder of downloaded dumps -->
                            <arg>/data/extraction-data/2016-04</arg>
                            <!-- location of SQL file containing MediaWiki table definitions -->
                            <arg>/home/extractor/mediawikiTables.sql</arg>
                            <!-- JDBC URL of MySQL server. Import creates a new database for 
                                each wiki. -->
                            <arg>jdbc:mysql://localhost/?characterEncoding=UTF-8&amp;user=root</arg>
                            <!-- require-download-complete -->
                            <arg>true</arg> 
                            <!-- file name: pages-articles.xml{,.bz2,.gz} -->
                            <arg>pages-articles.xml.bz2</arg>
                            <!-- number of parallel imports; this number depends on the number of processors in use
                                and the type of hard disc (hhd/ssd) and how many parallel file reads it can support -->
                            <arg>16</arg>
                            <!-- languages and article count ranges, comma-separated, e.g. "de,en" 
                                or "@mappings" etc. -->
                            <arg>@downloaded</arg>
                        </args>
                    </launcher>
roland-c commented 7 years ago

I ran into this problem with the Wikidata extractor, with these settings:

<jvmArgs>
    <jvmArg>-server</jvmArg>
    <jvmArg>-DtotalEntitySizeLimit=2147480000</jvmArg>
    <jvmArg>-Djdk.xml.totalEntitySizeLimit=2147480000</jvmArg>
</jvmArgs>

It is possible to remove this limitation entirely by setting the value to 0; the extractor runs fine now. See http://www.ibm.com/support/knowledgecenter/SSYKE2_7.0.0/com.ibm.java.aix.70.doc/diag/appendixes/cmdline/Djdkxmltotalentitysizelimit.html
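
For reference, the corresponding launcher setting with the cap lifted (same shape as the snippets above; per EllysahatNMI's check, the jdk.xml-prefixed property should be the only one needed):

<jvmArgs>
    <jvmArg>-server</jvmArg>
    <jvmArg>-Djdk.xml.totalEntitySizeLimit=0</jvmArg>
</jvmArgs>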

chile12 commented 7 years ago

Thanks for the update, Roland. Best to integrate this into all the POM files.

manzoorali29 commented 5 years ago

Can anyone tell me the final solution? I'm still stuck on this and have tried all the solutions above, but no luck. Thanks!