area515 / Photonic3D

Control software for resin 3D printers
http://photonic3d.com
GNU General Public License v3.0
131 stars 115 forks source link

HexCodeBasedProjectorTesting test requires > 4GB heap to run #223

Closed jmkao closed 8 years ago

jmkao commented 8 years ago

I'm trying to stabilize the execution of tests in Travis CI and was running into an intermittent problem where Travis would kill the test runner causing the build to fail.

Upon further investigation, it seems that the documented memory allocation for the Docker container in Travis is 4GB, and the behavior of Linux cgroups resource control (that Docker is built on top of) will kill processes that exceed the allocated memory. Running on my local Linux box, it seems that the test suite will consume just around 4GB (sometimes a little more, sometimes a little less) during its run.

Constraining the -Xmx to 1GB, the test will fail during HexCodeBasedProjectorTesting consistently with the GC overhead exceeded (interestingly not running out of memory):

java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.fasterxml.jackson.databind.introspect.AnnotatedMethodMap.find(AnnotatedMethodMap.java:67)
    at com.fasterxml.jackson.databind.introspect.AnnotatedClass._addMemberMethods(AnnotatedClass.java:605)
    at com.fasterxml.jackson.databind.introspect.AnnotatedClass.resolveMemberMethods(AnnotatedClass.java:424)
    at com.fasterxml.jackson.databind.introspect.AnnotatedClass.memberMethods(AnnotatedClass.java:251)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addMethods(POJOPropertiesCollector.java:502)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collect(POJOPropertiesCollector.java:238)
    at com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.collectProperties(BasicClassIntrospector.java:142)
    at com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.forDeserialization(BasicClassIntrospector.java:81)
    at com.fasterxml.jackson.databind.introspect.BasicClassIntrospector.forDeserialization(BasicClassIntrospector.java:11)
    at com.fasterxml.jackson.databind.DeserializationConfig.introspect(DeserializationConfig.java:550)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:323)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:261)
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:241)
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142)
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:394)
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3169)
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3062)
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2168)
    at org.area515.resinprinter.server.HostProperties.getAutodetectProjectors(HostProperties.java:573)
    at org.area515.resinprinter.projector.HexCodeBasedProjectorTesting.loadPrinters(HexCodeBasedProjectorTesting.java:23)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
    at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)

I took a heap dump during the stage of the test where it consumes large amounts of memory, and the Dominator tree shows that the Finalizer is holding onto a mock object proxy for org.area515.resinprinter.serial.SerialCommunicationsPort which then holds onto a vast quantity (about 1.9 million) of LinkedList objects. From what I can tell, this isn't a memory leak so much as an extremely high volume of short-lived allocations that outpaces the garbage collector's ability to clean up until around the 4GB usage level. Capping the heap does not result in the JVM running out of memory, but the excessive amount of GC causes the JVM to throw a "GC Overhead Exceeded" type of OOM.

@WesGilster I'm not very familiar with Mockito and whether this particular mock object (org.area515.resinprinter.serial.SerialCommunicationsPort) might be configured to store some temporary state. And if so, why it would have so many invocations to it. Some of the LinkedList entries look like serial commands (e.g. * 0 Lamp ?.), but others are Object arrays containing int[]'s and short[]'s:

image

But I'm also not really that familiar with the serial port handling code. Any ideas?

TestDominatorReport.zip

WesGilster commented 8 years ago

I don't think you uploaded the hprof file, it seems to be a web capture of the dominator tree. Don't worry about it though, if it's 100% reproducible, I can get my own dump.

To answer your question, I haven't seen this before. My first guess is that this is a side effect of mocking InputStreams with PowerMock. First step is to see if a single test is responsible for all of this chaos or if it's a cumulative problem over the course of several tests. I'll start by narrowing out HexCodeBasedProjectorTesting and moving on from there.

WesGilster commented 8 years ago

I did a quick test. I'm not able to reproduce this on Windows until I put the limit down to 64m. That's pretty low mem, but I'm assuming I'm reproducing your exact issue.

I've also conclusively narrowed it down to the test: org.area515.resinprinter.projector.HexCodeBasedProjectorTesting.noJSONErrorsAndJavaScriptAutodetectProjectors()

That test won't run under 64m, but all other tests combined will run under 64m. I should be able to figure this out pretty easy now. I'll check it out further tomorrow.

WesGilster commented 8 years ago

The method I was calling was pretty pointless. There is a larger issue at play with your JVM, but it isn't worth tackling now. I just checked that in.

jmkao commented 8 years ago

Good enough :). This does does seem to manifest repeatably on 64-bit x86 Linux JVMs, both on my Linux box and in Travis, but not on Windows. Not sure why it's OS dependent; those are the hardest to figure out.