delving / sip-creator

The Delving SIP-Creator
sip-creator.delving.eu

Memory-leak in processing/groovy scripting-engine #485

Closed. hanswesterbeek closed this issue 7 years ago

hanswesterbeek commented 7 years ago

I suspect that compiled scripts are leaked into the Metaspace whenever a generated Groovy script is executed.

Exception in thread "Thread-3" java.lang.OutOfMemoryError: Metaspace

The only real solution is this, quoting AntoineB on Stack Overflow: "I have to recompile the scripts on every execution, and that I have to use a new GroovyScriptEngineImpl and a new GroovyClassLoader on every execution"

hanswesterbeek commented 7 years ago

http://stackoverflow.com/questions/37301117/java-groovy-memory-leak-in-groovyclassloader#37377202 ?

hanswesterbeek commented 7 years ago

This applies to our situation:

"Well the code at the end of my post has the exact problem that I'm facing: It creates a ScriptXXX class on each turn of the loop, and no matter what I do, those classes are not released by the classloader, and not GCed, which ends up filling the memory. The part about the isolation is not important in itself, you just have to know that I have to recompile the scripts on every execution, and that I have to use a new GroovyScriptEngineImpl and a new GroovyClassLoader on every execution"

hanswesterbeek commented 7 years ago

https://hoangx281283.wordpress.com/2014/07/24/unload-groovy-classes/

http://melix.github.io/blog/2015/08/permgenleak.html

hanswesterbeek commented 7 years ago

Profiling indicates that new classes are loaded into the Metaspace every time a dataset is processed, even if that dataset has been processed before.
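For anyone who wants to reproduce that observation without a profiler, here is a minimal sketch using the JDK's `ClassLoadingMXBean`. The class name `ClassCountProbe` and its methods are made up for illustration and are not part of the SIP-Creator. The counters are JVM-wide, so the numbers are only indicative.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the SIP-Creator code base.
class ClassCountProbe {

    /** Returns {currently loaded, total loaded since JVM start, unloaded since JVM start}. */
    static long[] snapshot() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return new long[] {
            bean.getLoadedClassCount(),
            bean.getTotalLoadedClassCount(),
            bean.getUnloadedClassCount()
        };
    }

    /** Runs the task and returns how many classes the JVM loaded while it ran. */
    static long classesLoadedDuring(Runnable task) {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        long before = bean.getTotalLoadedClassCount();
        task.run();
        return bean.getTotalLoadedClassCount() - before;
    }
}
```

Wrapping a second processing run of the same dataset in `classesLoadedDuring(...)` should return a value close to zero if nothing is regenerated. A value that stays high on every run, while the unloaded counter from `snapshot()` does not move, would match the leak described here.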

hanswesterbeek commented 7 years ago

Clue: "The only way that a Class can be unloaded is if the Classloader used is garbage collected. This means, references to every single class and to the classloader itself need to go the way of the dodo."

This might mean we can make the problem go away if we let every processing invocation use a new classloader, which is unreferenced immediately after running.
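A rough sketch of that idea, using only the JDK: each invocation gets its own short-lived classloader, which is closed and dropped as soon as the work is done. `IsolatedRun` and `runWithFreshLoader` are hypothetical names. In our case the fresh loader would presumably be passed as the parent of a new `GroovyClassLoader`, but that part is an assumption and is not shown.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Function;

// Hypothetical helper, not part of the SIP-Creator code base.
class IsolatedRun {

    /** Runs the task with a brand-new classloader that is discarded afterwards. */
    static <R> R runWithFreshLoader(Function<ClassLoader, R> task) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        // try-with-resources closes the loader; this method keeps no reference to it.
        try (URLClassLoader fresh = new URLClassLoader(new URL[0], previous)) {
            current.setContextClassLoader(fresh);
            return task.apply(fresh);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Restore the old context loader so the thread does not pin the fresh one.
            current.setContextClassLoader(previous);
        }
    }
}
```

This only helps if nothing else keeps the loader alive. If the returned value is an instance of a class defined by the fresh loader, or a static cache or thread-local still points at such a class, the loader cannot be collected and its classes stay in the Metaspace.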

hanswesterbeek commented 7 years ago

This applies to narthex: http://stackoverflow.com/questions/36407119/groovyshell-in-java8-memory-leak-duplicated-classes-src-code-load-test-pr

The solution posted at the link above does not fix the memory leak; it merely makes the application less vulnerable to it. Not quite what we need.

hanswesterbeek commented 7 years ago

Largely resolved by commit 93d09a1e7e22d8699.

The solution makes sure that Groovy classes are not regenerated on every invocation of the same script. Once every dataset has been processed at least once, the Metaspace should not need to grow any further (for processing's sake).

To take advantage of this behavior, be sure to use the newly introduced BulkMappingRunner.
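For readers who only want the general idea of "compile once per script, reuse afterwards", here is a minimal sketch. It is not the code from the commit and does not show how BulkMappingRunner works. `CompiledScriptCache` and the `compiler` function are hypothetical stand-ins. With Groovy the compile step could be something like `GroovyShell.parse(String)`, but that wiring is an assumption.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical cache, not the implementation from the commit.
class CompiledScriptCache<C> {
    private final ConcurrentMap<String, C> cache = new ConcurrentHashMap<>();
    private final Function<String, C> compiler;   // stand-in for the real compile step
    private final AtomicInteger compilations = new AtomicInteger();

    CompiledScriptCache(Function<String, C> compiler) {
        this.compiler = compiler;
    }

    /** Returns the compiled form of the source, compiling it only on first use. */
    C get(String source) {
        return cache.computeIfAbsent(source, src -> {
            compilations.incrementAndGet();
            return compiler.apply(src);
        });
    }

    /** Number of times the compile step actually ran. */
    int compilations() {
        return compilations.get();
    }
}
```

Keying on the script source means two datasets with identical generated code share one compiled class. The trade-off is that the cache itself keeps those classes alive, so the Metaspace grows with the number of distinct scripts and then levels off, which matches the behavior described above.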