UcasRichard / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0

Java heap space / OutOfMemoryError on RHEL 5.8 Server/Tikanga/JRE-6u35 - Ideas? #159

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1.  Have access to a Red Hat Enterprise Linux Server release 5.8 32-bit 
(Tikanga) machine with about 2 GB of memory.
2.  Create a YAML file larger than about 4-5 MB.  (I've tried various 
methods of generating one in an attempt to find a large YAML file that *would* 
load - validation confirms it is well-formed YAML.)
3.  Run a Java program that uses SnakeYAML and read in the large YAML 
file (I'm using Object yamlObject = yaml.load(file);).
4.  After chewing on the YAML for about 30 seconds, it throws a 
java.lang.OutOfMemoryError: Java heap space.
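
For context, step 3 can be sketched roughly like this (the document content and class name here are invented stand-ins; I'm loading from a String, though the InputStream/Reader overloads of load() behave the same way):

```java
import org.yaml.snakeyaml.Yaml;

import java.util.Map;

public class LoadExample {
    public static void main(String[] args) {
        // A small inline document stands in for the multi-megabyte file
        // from the report; the loading call itself is the same.
        String doc = "inventory:\n"
                   + "  - id: 1\n"
                   + "    state: locked\n";
        Yaml yaml = new Yaml();
        // load() builds the entire object graph in memory at once.
        Object yamlObject = yaml.load(doc);
        System.out.println(((Map<?, ?>) yamlObject).get("inventory"));
    }
}
```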

What is the expected output? What do you see instead?
I expect the YAML file to load normally.  This procedure works just fine on 
CentOS 6.2 in all configurations that I've tried.

What version of SnakeYAML are you using? On what Java version?
Java 6u35, SnakeYAML 1.10 (have not tried 1.11 yet, didn't see it until just 
now :) )

Please provide any additional information below. (Often a failing test is
the best way to describe the problem.)

This appears to have nothing to do with how much memory is *actually* 
available.  I have attempted to narrow the problem down by trying multiple 
memory/swap configurations.  I have duplicated the problem on a physical system 
using the described OS/Java/SnakeYAML configuration, and also on a VM with the 
same configuration.  Upping the memory available to the VM and adding/removing 
swap space doesn't seem to matter.  top only ever shows the process using 
about 300 MB.

I cannot reproduce this on CentOS 6.2 in any configuration, 32- or 64-bit.  I 
tried disabling the swap file and dropping memory to 1 GB; it still reads the 
YAML files just fine.

I also can't reproduce this on the target system with other Java applications 
(yet) - e.g. we have an Eclipse-based Java application installed that, if 
running, requests -Xmx1024M and runs just fine.

I have tried launching the application that uses SnakeYAML with various memory 
configurations, e.g. -Xmx512M with -XX:MaxPermSize=128M, -Xmx1024M with 
-XX:MaxPermSize=256M, etc.  None of it has made things any better.
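
When juggling flags like these, it can help to confirm from inside the process which heap limit actually took effect; a minimal stdlib-only sketch (class name invented):

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory() reflects the effective -Xmx value;
        // totalMemory() is the heap currently committed by the JVM.
        System.out.printf("max heap:  %d MB%n", rt.maxMemory() / mb);
        System.out.printf("committed: %d MB%n", rt.totalMemory() / mb);
        System.out.printf("free:      %d MB%n", rt.freeMemory() / mb);
    }
}
```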

These are two systems that are offline and do not receive security updates, so 
there is a chance that SnakeYAML has nothing to do with this.  But I wanted to 
ask in case this rings any bells: perhaps you have advice on how I could launch 
the VM so that the file loads, or on which patches, if any, I should seek 
permission to install.  Users are open to replacing the OS with something else; 
however, we would first like to find a way to make it work on RHEL 5.8 
Server/Tikanga, hence my plea for help :)  Thank you,

-- Joren

Original issue reported on code.google.com by IDT.Joren on 2 Oct 2012 at 4:33

GoogleCodeExporter commented 9 years ago
Here's an example stack trace:

!ENTRY org.eclipse.osgi 4 0 2012-10-02 07:53:22.017
!MESSAGE Application error
!STACK 1
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.<init>(String.java:216)
    at java.lang.StringBuilder.toString(StringBuilder.java:430)
    at org.yaml.snakeyaml.scanner.ScannerImpl.scanPlain(ScannerImpl.java:2024)
    at org.yaml.snakeyaml.scanner.ScannerImpl.fetchPlain(ScannerImpl.java:1039)
    at org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:399)
    at org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:224)
    at org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingKey.produce(ParserImpl.java:562)
    at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:160)
    at org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:145)
    at org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:230)
    at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
    at org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:237)
    at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
    at org.yaml.snakeyaml.composer.Composer.composeSequenceNode(Composer.java:204)
    at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:157)
    at org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:237)
    at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
    at org.yaml.snakeyaml.composer.Composer.composeSequenceNode(Composer.java:204)
    at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:157)
    at org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:237)
    at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
    at org.yaml.snakeyaml.composer.Composer.composeDocument(Composer.java:122)
    at org.yaml.snakeyaml.composer.Composer.getSingleNode(Composer.java:105)
    at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:120)
    at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:480)
    at org.yaml.snakeyaml.Yaml.load(Yaml.java:411)

Thanks,

-- Joren

Original comment by IDT.Joren on 2 Oct 2012 at 4:47

GoogleCodeExporter commented 9 years ago
I am discovering that I can get it to fail on *any* platform, as long as I use 
a big enough YAML file.  The size needed to make it fail seems to vary from 
platform to platform, and oddly, has nothing to do with how much memory is 
*actually* available (e.g. the size that makes it fail with 1 GB is the same as 
with 2 GB, but differs by platform).

It took a 100 MB file to make CentOS 6.2 fail, and a 22 MB file to make 
Windows 7 Pro fail.

-- Joren

Original comment by IDT.Joren on 2 Oct 2012 at 9:45

GoogleCodeExporter commented 9 years ago
This issue is not related to SnakeYAML directly. Yes, SnakeYAML must create the 
complete representation graph, and that requires a lot of memory.
1) If your file is small and the graph can possibly fit into memory, then 
please see how you can increase the heap size for the Java Virtual Machine.
2) If the YAML document is very big, it might indicate that it is a huge 
list of very simple structures (a log?). Then you can have a look at the 
low-level API 
(http://code.google.com/p/snakeyaml/wiki/Documentation#Low_Level_API). It is 
similar to the SAX interface for XML.
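
A rough sketch of that low-level API (the document contents and class name are invented for illustration): Yaml.parse() streams events one at a time instead of building the whole graph.

```java
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.events.Event;
import org.yaml.snakeyaml.events.ScalarEvent;

import java.io.StringReader;

public class EventApiExample {
    public static void main(String[] args) {
        // Stand-in for a huge log-like document.
        String doc = "- {time: 1, msg: start}\n"
                   + "- {time: 2, msg: stop}\n";
        Yaml yaml = new Yaml();
        // parse() yields one Event at a time, SAX-style, so only the
        // current event needs to be held in memory.
        for (Event event : yaml.parse(new StringReader(doc))) {
            if (event instanceof ScalarEvent) {
                System.out.println(((ScalarEvent) event).getValue());
            }
        }
    }
}
```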

Original comment by py4fun@gmail.com on 3 Oct 2012 at 6:31

GoogleCodeExporter commented 9 years ago
SnakeYAML's memory footprint isn't small, and it relies heavily on the garbage 
collector. It used to be bigger, and it is quite possible that it could be 
improved, but that may need a lot of "research" in that direction ...

By the way, sharing the YAML file (if possible) might be useful.
Dropbox maybe?

Original comment by alexande...@gmail.com on 3 Oct 2012 at 6:43

GoogleCodeExporter commented 9 years ago
Here is an example of a failing YAML file (CentOS 6.2, VM with 1.5 GB):  
http://www.speedyshare.com/dxYu9/test.yml

(This was created by copying the inventory_locked sequence entry over and over 
again.)

It is a kind of log, but since the file is only 20 MB in size, I was not 
expecting 1 GB of memory to be insufficient to process it.

-- Joren

Original comment by IDT.Joren on 3 Oct 2012 at 5:05

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
First of all, you cannot assume that a 20 MB file would require only 20 MB of 
memory. SnakeYAML creates a lot of metadata while parsing a document. The first 
step is to produce events (which already take many times more memory than the 
size of the file), then the representation Nodes, then the final Java structure.
As I said, your file is a collection of relatively simple structures. You can 
try to use the low-level API; then the whole memory management will be in your 
hands.
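
To illustrate what keeping memory management "in your hands" can look like for a document that is one long list of small records (record layout and class name invented here), one can consume the event stream and retain only an aggregate:

```java
import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.events.Event;
import org.yaml.snakeyaml.events.MappingStartEvent;

import java.io.StringReader;

public class StreamCount {
    public static void main(String[] args) {
        // Stand-in for a 20 MB log: many small, identical mappings.
        String doc = "- {id: 1}\n- {id: 2}\n- {id: 3}\n";
        long records = 0;
        for (Event e : new Yaml().parse(new StringReader(doc))) {
            // Each list entry opens with a MappingStartEvent; nothing
            // is retained between iterations, so memory stays flat
            // regardless of how many records the file contains.
            if (e instanceof MappingStartEvent) {
                records++;
            }
        }
        System.out.println("records: " + records);  // prints "records: 3"
    }
}
```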

Since I do not see what can be done in SnakeYAML, the issue will be closed.

Original comment by py4fun@gmail.com on 6 Oct 2012 at 3:45

GoogleCodeExporter commented 9 years ago
No, but neither should I be expected to assume that a 20 MB file would require 
in excess of 1 GB of memory.

The low-level parsing worked, by the way.  Although the memory consumption of 
the load() method seems excessive, I'm glad there is an alternative.  Thanks,

-- Joren

Original comment by IDT.Joren on 9 Oct 2012 at 4:45

GoogleCodeExporter commented 9 years ago
If you can prove that parsing an XML document of the same size consumes far 
fewer resources, we can look at this issue. 

Original comment by py4fun@gmail.com on 10 Oct 2012 at 8:29

GoogleCodeExporter commented 9 years ago
I also would not expect a 20 MB YAML file to fail with 1 GB of memory, but 
there could be so many reasons for this...
Maybe at some point we will look into this, but I can't promise.

Original comment by alexande...@gmail.com on 10 Oct 2012 at 8:45

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I'm hitting this issue too, but with a 60 MB file.  Hrmmmph.

Original comment by katkin.m...@gmail.com on 28 Nov 2012 at 10:55