UcasRichard / snakeyaml

Automatically exported from code.google.com/p/snakeyaml
Apache License 2.0

I still have the out-of-memory exception opening a YAML file #102

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Deserialize a YAML file.

What is the expected output? What do you see instead?
out of memory exception

What version of the product are you using? On what operating system?
snakeyaml-1.8-SNAPSHOT on Mac OS X 10.6.6 (Java 1.6.0_22)

Please provide any additional information below.

Maybe my JVM is too lazy performing garbage collection during deserialization, but I resolved this issue by changing StreamReader.java (my modified StreamReader.java is attached).
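
For context, a minimal sketch of the kind of load call involved; the class and file name are hypothetical (the real document was attached to this report):

import java.io.FileInputStream;
import java.io.InputStream;
import org.yaml.snakeyaml.Yaml;

public class ReproduceOom {
    public static void main(String[] args) throws Exception {
        // Run the JVM with -Xmx80m; loading the attached document reportedly fails here.
        InputStream input = new FileInputStream("big-document.yaml"); // hypothetical file name
        Object document = new Yaml().load(input);
        System.out.println("loaded: " + document.getClass());
    }
}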

Thanks in advance 

Original issue reported on code.google.com by antonio....@gmail.com on 13 Jan 2011 at 8:32

Attachments:

GoogleCodeExporter commented 9 years ago
It would be nice to get that YAML file (if possible) to see where it leaks memory :)

Original comment by alexande...@gmail.com on 13 Jan 2011 at 8:56

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
It would greatly simplify fixing the problem if you could provide a valid JUnit test in your clone.
The YAML you have provided has a lot of tags, which are not easy to re-create.

Original comment by py4fun@gmail.com on 13 Jan 2011 at 12:50

GoogleCodeExporter commented 9 years ago
I really do not have a memory leak: my JVM is set with -Xmx80m, and when I deserialize my YAML file the 80MB limit is exceeded.

It's difficult for me to create a JUnit test, for various reasons.

I'd like to know if the change I made in StreamReader.java gives better performance in terms of memory and doesn't have side effects.

Thanks in advance

P.S.: I'm reorganizing my code and will give you a JUnit test as soon as possible.
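
In the meantime, a minimal sketch of what such a JUnit test might look like; the test class, resource name, and heap setting are assumptions, not the reporter's actual code:

import java.io.InputStream;
import org.junit.Test;
import static org.junit.Assert.assertNotNull;
import org.yaml.snakeyaml.Yaml;

public class BigDocumentTest {
    @Test
    public void loadBigDocument() {
        // Run the JVM with -Xmx80m to reproduce the reported OutOfMemoryError.
        InputStream input = getClass().getResourceAsStream("/big-document.yaml"); // hypothetical resource
        assertNotNull(new Yaml().load(input));
    }
}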

Original comment by antonio....@gmail.com on 14 Jan 2011 at 9:26

GoogleCodeExporter commented 9 years ago
I have managed to create a test with a big input file (1.5MB) which fails to load due to an OutOfMemoryError. But when I apply your changes in StreamReader, the situation does not change.
I think we need to see a real test case where SnakeYAML fails to load a document but succeeds with your patch.
Otherwise the only solution is to increase the memory for the JVM.

Original comment by py4fun@gmail.com on 14 Jan 2011 at 10:02

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
While I agree that there are areas to improve on memory consumption, I am not so sure about your results (I might be wrong too).
Did you consider that the objects you are loading ARE BIG? You are running the last GC with your object already loaded; why don't you try to add one more measurement? Something like:

result = null;
System.gc();
long used = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
System.out.println("used after final GC: " + used);

The figures may be different.

Original comment by alexande...@gmail.com on 20 Jan 2011 at 9:17

GoogleCodeExporter commented 9 years ago
The size of the loaded object is accounted for: it is 4.69MB. That amount is reported by the "total" column. What concerns me most is the last column, the "recovered" amount: this is the amount that SnakeYAML allocated during parsing. Understandably, this will be a non-trivial amount, but 100+MB for less than 1/2MB of input seems excessive.

I'll look into running a profiler on SnakeYAML today.

Original comment by JordanAn...@gmail.com on 20 Jan 2011 at 2:49

GoogleCodeExporter commented 9 years ago
Addendum: it should also be noted that the check you propose (measuring after abandoning the loaded object) is performed on the next iteration, as that forms the "initial" value. The test shows a very stable initial value, so it's not that SnakeYAML is somehow leaking memory; it's just that memory consumption during parsing is huge.
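
To make that measurement scheme concrete, here is a minimal sketch of such a stress loop; the class name and input file are hypothetical, not the actual MemoryStressTest:

import java.io.FileInputStream;
import org.yaml.snakeyaml.Yaml;

public class MemoryMeasureLoop {
    public static void main(String[] args) throws Exception {
        Runtime rt = Runtime.getRuntime();
        Yaml yaml = new Yaml();
        for (int i = 0; i < 5; i++) {
            System.gc();
            long initial = rt.totalMemory() - rt.freeMemory(); // stable across iterations => no leak
            Object result = yaml.load(new FileInputStream("big-document.yaml")); // hypothetical input
            long total = rt.totalMemory() - rt.freeMemory();
            System.out.println("initial=" + initial + " total=" + total);
            result = null; // abandon the object; the next iteration's "initial" reflects what was recovered
        }
    }
}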

Original comment by JordanAn...@gmail.com on 20 Jan 2011 at 5:15

GoogleCodeExporter commented 9 years ago
Profiling would be nice. Good you have time for it ;)

But it is strange anyway: for me (Eclipse 3.6 and terminal, OS X, Java 1.6.0_22, latest sources from master), "recovered" never goes
 1. over 13,300,000 under -Xms32m -Xmx32m
 2. over 28,000,000 under -Xms512m -Xmx512m

Original comment by alexande...@gmail.com on 20 Jan 2011 at 5:17

GoogleCodeExporter commented 9 years ago
I'm on a different machine than the original one now, and I'm getting vastly different results with my test code (above). Specifically, I now see memory consumption never exceeding about 12MB; although a little high, that's considerably less worrisome.

Profiling with profile4j shows memory oscillating up and down between about 2MB 
and 12MB, which isn't unexpected. It's possible that there is something 
terribly wrong with the environment I originally posted from.

Original comment by JordanAn...@gmail.com on 20 Jan 2011 at 6:17

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Use 1.8-SNAPSHOT for now. By default it does not put the context in Mark (so in case of an error you will get only the line number and position; the line content will not be printed). It consumes less memory and works faster.
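
A small sketch of what that behavior looks like from user code; the invalid input is made up, and it assumes the parse error surfaces as a MarkedYAMLException with a problem Mark set:

import org.yaml.snakeyaml.Yaml;
import org.yaml.snakeyaml.error.Mark;
import org.yaml.snakeyaml.error.MarkedYAMLException;

public class MarkDemo {
    public static void main(String[] args) {
        try {
            new Yaml().load("key: [unclosed"); // deliberately invalid YAML
        } catch (MarkedYAMLException e) {
            Mark mark = e.getProblemMark();
            // Without the context, the snippet (line content) is absent,
            // but the line and column are still reported.
            System.out.println("error at line " + (mark.getLine() + 1)
                    + ", column " + (mark.getColumn() + 1));
        }
    }
}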

As for the lib optimization, we hope to have time to do it. Do not hesitate to do it yourself ;) Just do not forget to contribute it back so that everybody benefits :)

Original comment by alexande...@gmail.com on 21 Jan 2011 at 9:16

GoogleCodeExporter commented 9 years ago
Re comment #11: the profiler actually shows figures close to 100MB for the total GC'd size during YAML loading. MemoryStressTest shows ~12MB because some GC happens during loading. If we give 1024m to the java process running MemoryStressTest, the numbers may be ~100MB.

Original comment by alexande...@gmail.com on 22 Jan 2011 at 12:11

GoogleCodeExporter commented 9 years ago
Hi, with 1.8-SNAPSHOT I still have the out-of-memory problem.

I modified StreamReader.java from 1.8-SNAPSHOT and it works fine; now I have memory consumption similar to deserialization with XStream.

Even though you found that the change does not solve the problem, I have had no "out of memory" problems with it so far.

Original comment by antonio....@gmail.com on 27 Jan 2011 at 11:53

Attachments:

GoogleCodeExporter commented 9 years ago
Have you tried the latest version? StreamReader has been modified there to use less memory, I believe, and it works a bit faster. I think the snapshot has even been uploaded already. But to be sure, just pull the latest master and try it.

I do not know what you mean by "memory consumption similar to deserialization with XStream", and I do not have any figures for XStream, but I think StreamReader is no longer the main memory consumer :)

Original comment by alexande...@gmail.com on 27 Jan 2011 at 12:09

GoogleCodeExporter commented 9 years ago
OK, I'll download 1.8-SNAPSHOT again with Maven (first deleting the local Maven repository entries for org.snakeyaml) and I'll tell you how it goes.

Thanks 

Original comment by antonio....@gmail.com on 27 Jan 2011 at 12:27

GoogleCodeExporter commented 9 years ago
Re comment #15:

I did not quite catch you. When I try to use the attached file, it does not even compile with the latest source. How did you manage to build the JAR you tested?
How do you provide the input? As a String or as an InputStream?
The files you give here do not help because:
1) there is no code which shows how you call SnakeYAML
2) your YAML contains a lot of global tags, which prevents us from loading the document

Why don't you create a test case? Either use a remote clone or attach a patch here.

Without your commitment we can hardly do anything about the issue.

Looking into the file, I do not understand why it should consume fewer resources. Can you please try to explain what you achieve with the changes? What is better? Why do you think the memory consumption is improved?
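
For reference, the two load entry points in question, both on org.yaml.snakeyaml.Yaml (the file name below is made up):

import java.io.FileInputStream;
import org.yaml.snakeyaml.Yaml;

public class LoadOverloads {
    public static void main(String[] args) throws Exception {
        Yaml yaml = new Yaml();
        Object fromString = yaml.load("key: value");                    // String overload
        Object fromStream = yaml.load(new FileInputStream("doc.yaml")); // InputStream overload (hypothetical file)
        System.out.println(fromString + " / " + fromStream);
    }
}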

Original comment by py4fun@gmail.com on 27 Jan 2011 at 2:38

GoogleCodeExporter commented 9 years ago
Sorry, I've just performed a pull and I noticed that you changed the code; now it works even for a big file (2MB).

Thanks

Original comment by antonio....@gmail.com on 27 Jan 2011 at 3:15

GoogleCodeExporter commented 9 years ago
I still do not get how it can work when you build from source but fail when you use the latest SNAPSHOT. The latest SNAPSHOT (1.8-SNAPSHOT) always corresponds to the source code in the master Mercurial repository.

May we close the issue?

(Please do not forget to remove the remote repository if you do not need it anymore:
your clone -> administer -> advanced -> delete repository)

Original comment by py4fun@gmail.com on 27 Jan 2011 at 5:03

GoogleCodeExporter commented 9 years ago
Maybe some proxy caching thing... who knows...

Original comment by alexande...@gmail.com on 27 Jan 2011 at 6:40

GoogleCodeExporter commented 9 years ago
Due to the changes made for issues 79 and 101, SnakeYAML consumes fewer resources.
This will be delivered in version 1.8.

Original comment by py4fun@gmail.com on 31 Jan 2011 at 9:36

GoogleCodeExporter commented 9 years ago
Long attachments were deleted.

Original comment by aso...@gmail.com on 1 Mar 2011 at 11:48