atilika / kuromoji

Kuromoji is a self-contained and very easy to use Japanese morphological analyzer designed for search
Apache License 2.0

GC overhead limit exceeded when compiling IPADIC NEologd #79

Closed by kallewoof 8 years ago

kallewoof commented 8 years ago

I am getting a GC overhead limit exceeded message when trying to run

mvn clean package

on Mac OS X (MBP 2012 model).

The GC overhead limit exceeded message apparently means the JVM is spending 98% of its time doing GC and only 2% or less doing actual work, which sounds like memory may not actually be the problem. However, I am not sure how to test with increased memory, so I can't say for certain.


[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ kuromoji-ipadic ---
[INFO] Building jar: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic/target/kuromoji-ipadic-1.0-SNAPSHOT.jar
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Kuromoji IPADIC NEologd 1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ kuromoji-ipadic-neologd ---
[INFO] Deleting /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/target
[INFO]
[INFO] --- maven-resources-plugin:2.7:copy-resources (copy-license-resources) @ kuromoji-ipadic-neologd ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 2 resources
[INFO] Copying 1 resource
[INFO]
[INFO] --- maven-antrun-plugin:1.6:run (download-dictionary) @ kuromoji-ipadic-neologd ---
[INFO] Executing tasks

main:
     [echo] Downloading dictionary
   [delete] Deleting directory /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/dictionary
    [mkdir] Created dir: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/dictionary
      [get] Getting: http://atilika.com/releases/mecab-ipadic-neologd/mecab-ipadic-2.7.0-20070801-neologd-20150925.tar.gz
      [get] To: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/dictionary/mecab-ipadic-2.7.0-20070801-neologd-20150925.tar.gz
    [untar] Expanding: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/dictionary/mecab-ipadic-2.7.0-20070801-neologd-20150925.tar.gz into /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/dictionary
[INFO] Executed tasks
[INFO]
[INFO] --- maven-compiler-plugin:3.3:compile (compile-dictionary-compiler) @ kuromoji-ipadic-neologd ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 6 source files to /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/target/classes
[INFO] /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/src/main/java/com/atilika/kuromoji/ipadic/neologd/Tokenizer.java: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/src/main/java/com/atilika/kuromoji/ipadic/neologd/Tokenizer.java uses unchecked or unsafe operations.
[INFO] /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/src/main/java/com/atilika/kuromoji/ipadic/neologd/Tokenizer.java: Recompile with -Xlint:unchecked for details.
[INFO]
[INFO] >>> exec-maven-plugin:1.2.1:java (run-dictionary-compiler) > validate @ kuromoji-ipadic-neologd >>>
[INFO]
[INFO] <<< exec-maven-plugin:1.2.1:java (run-dictionary-compiler) < validate @ kuromoji-ipadic-neologd <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (run-dictionary-compiler) @ kuromoji-ipadic-neologd ---
[KUROMOJI] 22:30:21: dictionary compiler
[KUROMOJI] 22:30:21:
[KUROMOJI] 22:30:21: input directory: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/dictionary/mecab-ipadic-2.7.0-20070801-neologd-20150925
[KUROMOJI] 22:30:21: output directory: /Users/zwoc/Workspace/deep-learning/narou/select/preproc/kuromoji/kuromoji-ipadic-neologd/src/main/resources/com/atilika/kuromoji/ipadic/neologd
[KUROMOJI] 22:30:21: input encoding: utf-8
[KUROMOJI] 22:30:21:
[KUROMOJI] 22:30:21: compiling tokeninfo dict...
[KUROMOJI] 22:30:21:     analyzing dictionary features
[KUROMOJI] 22:30:27:     reading tokeninfo
[KUROMOJI] 22:30:52:     compiling fst... [WARNING]
java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.ArrayList.iterator(ArrayList.java:814)
    at java.util.AbstractList.hashCode(AbstractList.java:540)
    at com.atilika.kuromoji.fst.State.hashCode(State.java:127)
    at com.atilika.kuromoji.fst.Builder.findEquivalentState(Builder.java:243)
    at com.atilika.kuromoji.fst.Builder.freezeAndPointToNewState(Builder.java:179)
    at com.atilika.kuromoji.fst.Builder.createDictionaryCommon(Builder.java:143)
    at com.atilika.kuromoji.fst.Builder.build(Builder.java:119)
    at com.atilika.kuromoji.compile.FSTCompiler.compile(FSTCompiler.java:44)
    at com.atilika.kuromoji.compile.DictionaryCompilerBase.buildTokenInfoDictionary(DictionaryCompilerBase.java:70)
    at com.atilika.kuromoji.compile.DictionaryCompilerBase.build(DictionaryCompilerBase.java:37)
    at com.atilika.kuromoji.compile.DictionaryCompilerBase.build(DictionaryCompilerBase.java:172)
    at com.atilika.kuromoji.ipadic.neologd.compile.DictionaryCompiler.main(DictionaryCompiler.java:33)
    ... 6 more
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Kuromoji ........................................... SUCCESS [  0.186 s]
[INFO] Kuromoji Core ...................................... SUCCESS [  7.223 s]
[INFO] Kuromoji IPADIC .................................... SUCCESS [ 44.321 s]
[INFO] Kuromoji IPADIC NEologd ............................ FAILURE [03:53 min]
[INFO] Kuromoji JUMAN DIC ................................. SKIPPED
[INFO] Kuromoji NAIST-jdic ................................ SKIPPED
[INFO] Kuromoji UniDic .................................... SKIPPED
[INFO] Kuromoji UniDic Kana Accent ........................ SKIPPED
[INFO] Kuromoji UniDic NEologd ............................ SKIPPED
[INFO] Kuromoji Benchmark ................................. SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:45 min
[INFO] Finished at: 2015-10-07T22:33:55+09:00
[INFO] Final Memory: 15M/1834M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2.1:java (run-dictionary-compiler) on project kuromoji-ipadic-neologd: An exception occured while executing the Java class. null: InvocationTargetException: GC overhead limit exceeded -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :kuromoji-ipadic-neologd

gautela commented 8 years ago

Java might give you a different default max heap size depending on your hardware. Could you try increasing the memory settings for Maven? 3 GB should be enough to build all dictionaries.

export MAVEN_OPTS=-Xmx3g
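
To check how much heap a JVM actually gets with a given setting, a small program along these lines could help. It is only a sketch: `HeapCheck` is a hypothetical class name and not part of Kuromoji, and it relies solely on the standard `Runtime.maxMemory()` call.

```java
// Hypothetical helper, not part of Kuromoji: reports the heap ceiling of the running JVM.
final class HeapCheck {

    /** Returns the maximum amount of heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Print the effective limit so it can be compared with the -Xmx value that was requested.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Compiling it with `javac HeapCheck.java` and running `java -Xmx3g HeapCheck` should report roughly 3 GB, while running it without `-Xmx` shows the default chosen for the machine. `MAVEN_OPTS` matters for this build because the `ExecJavaMojo` frames in the stack trace suggest the dictionary compiler runs inside Maven's own JVM.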

kallewoof commented 8 years ago

Yep, that fixed it, thanks! Unfortunately, tests are failing for the benchmark, but I'll post that as a new issue.