Added performance tests in r388
On my machine (a dual-processor 64-bit Ubuntu box with 8GB RAM), Gson was able to serialize a collection of 1.4 million objects. The deserializer threw a stack overflow error in the parser for collections beyond 87,000 objects.
Original comment by inder123
on 3 Mar 2009 at 10:23
The parser uses a production to match a collection, and that is implemented by JavaCC using recursion. If someone proposes a more efficient production, we will be happy to incorporate it.
Original comment by inder123
on 3 Mar 2009 at 10:39
One of the stated goals of GSON is to "Support arbitrarily complex objects (with deep inheritance hierarchies and extensive use of generic types)". I guess you should change it to "arbitrarily complex objects (except collections with large numbers of items)".
I use a byte array to store the bytes of a file. I can serialize this array with GSON just fine, but somehow this object is too complex to be parsed? A byte array? Really?
Original comment by kenot...@gmail.com
on 13 Jul 2009 at 8:33
Can you give some more information on how large the byte array is?
Original comment by inder123
on 13 Jul 2009 at 8:38
It's a megabyte-ish file, so about a million items. The point, though, is that it shouldn't matter how many items are in it; it's perhaps the most basic data type in all of Java, and if GSON can't parse it, then perhaps GSON shouldn't be generating it in the first place.
Original comment by kenot...@gmail.com
on 13 Jul 2009 at 8:48
Well, that is not how parsers work, unfortunately. They have to match a production, which in this case is for matching a String (I presume this is how you are storing byte arrays). A String may seem like a simple thing, but it actually requires a fair amount of escaping and unescaping, character by character.
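As a rough illustration, the character-by-character work looks something like the sketch below. This is hypothetical code, not Gson's internals; the method name and escape coverage are illustrative only.

```java
// Hypothetical sketch of JSON string unescaping -- not Gson's actual code.
// Every character must be inspected, and each escape rebuilt one by one.
class UnescapeSketch {
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\') { sb.append(c); continue; }
            char next = s.charAt(++i);
            switch (next) {
                case 'n':  sb.append('\n'); break;
                case 't':  sb.append('\t'); break;
                case '"':  sb.append('"');  break;
                case '\\': sb.append('\\'); break;
                case 'u':  // four hex digits follow
                    sb.append((char) Integer.parseInt(s.substring(i + 1, i + 5), 16));
                    i += 4;
                    break;
                default:   sb.append(next);
            }
        }
        return sb.toString();
    }
}
```

The cost is linear in the string length, which is why a single-token match scales so much better than a per-character recursive production.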
In an earlier version of Gson, we used a recursive production to match a String as an array of characters. This resulted in a StackOverflowError for 100KB strings. In a later version, I revised it to a production that does a single token match. Last I tested, Gson could handle strings of over 20MB in size.
Can you give us more details (maybe post a code snippet) on how the bytes are stored? Are you using a byte[] or are you using a String?
Original comment by inder123
on 13 Jul 2009 at 9:03
Wrote performance tests for byte array serialization and deserialization in r430. Gson failed at serializing arrays beyond 8MB, but for deserialization it failed for arrays as small as 32KB. So it seems we have a real performance issue here that needs fixing.
Original comment by inder123
on 13 Jul 2009 at 10:29
Ok, I revised my tests to determine the limit somewhat more precisely. Gson failed at deserialization on byte arrays larger than 80k.
The primary reason is that Gson matches an Array as a production of Array elements. The Elements themselves are matched with a recursive production (and therein lies the problem):

private JsonArray JsonArray() :
{ JsonArray array = new JsonArray(); }
{
  "[" [ Elements(array) ] "]"
  {
    array.reverse();
    return array;
  }
}

private void Elements(JsonArray array) :
{ JsonElement element; }
{
  element=JsonValue() [ "," Elements(array) ]
  { array.add(element); }
}
Any suggestions from anyone on how to improve these productions?
Original comment by inder123
on 13 Jul 2009 at 10:45
Your production appears to approach N^2 in memory usage, since you pass the entire parsed array into the recursive call. The top-level recursion gets a 0-item JsonArray, the first inner recursion gets a 1-item array, and so on, up to N items. But those recursive calls will stay on the stack until the whole list is parsed, so you end up with 1+2+3+...+N items on the stack.
What if you built the JsonArray iteratively in the top-level rule instead, so that the single 'result' array could be updated without passing it into another rule? I haven't played with parser generators since ANTLR back in my compiler class, so I'm not well-versed enough in JavaCC semantics to write up the rule, but I'm imagining a grammar like this (in EBNF):

JsonArray : '[' [ JsonArrayElement ( ',' JsonArrayElement )* ] ']'

I'm not sure how that translates to JavaCC, or how you'd get the next element back, but the point is to not pass a potentially huge data structure around in the call parameters and to build it up iteratively instead.
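The iterative idea can be sketched in plain Java (a hypothetical token-queue model, not JavaCC output): each value is appended to a single result list as soon as it is read, so stack depth stays constant no matter how large the array is.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of the iterative approach. Values go
// straight into one result list; no recursion, no reversal needed.
class IterativeSketch {
    static List<String> parseArray(ArrayDeque<String> tokens) {
        List<String> array = new ArrayList<>();
        if (!"[".equals(tokens.poll())) throw new IllegalStateException("expected [");
        while (!"]".equals(tokens.peek())) {
            array.add(tokens.poll());                     // append immediately, in order
            if (",".equals(tokens.peek())) tokens.poll(); // consume separator if present
        }
        tokens.poll(); // consume "]"
        return array;
    }
}
```

Note this loose version treats the comma as optional between values; enforcing it strictly is a separate refinement.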
Original comment by kenot...@gmail.com
on 14 Jul 2009 at 5:54
If you want to keep your current rules, you could perhaps instead make a single globally-scoped JsonArray. So long as you only parse a single array at a time, this may work better with your current rule: globally-scoped variables are not passed on the stack like parameters are, so you should be able to scan arrays all the way up to the size of the JVM's memory.
Original comment by kenot...@gmail.com
on 15 Jul 2009 at 2:12
The limitation is much smaller on devices with limited resources, such as Android phones. I'm running into this problem trying to deserialize a string that's about 20k on Android 1.5.
Can anyone think of a constructive way to work around this issue for now? Would writing a custom deserializer help, or is the problem more low-level than that?
Original comment by mbur...@gmail.com
on 5 Aug 2009 at 5:11
Unfortunately the problem is more low-level, since our JavaCC parser uses recursive productions. The parser gets invoked way before the deserializers are run.
We will try to address this in our next release, but it would greatly help if others can also provide alternate JavaCC productions for JSON arrays.
Original comment by inder123
on 5 Aug 2009 at 8:46
Recursive rules are fine when your input is guaranteed to be small, but the way your JsonArray rule is written, you pass the whole array into each recursive call. That's just plain unnecessary, and it quickly consumes all available memory for even moderately-sized arrays. Instead, rephrase your grammar as left-associative rather than right-recursive: read values from the left end of the input and immediately add them to the output array, then consume the comma and iterate. Something like this:
private JsonArray JsonArray() :
{ JsonArray array = new JsonArray(); JsonElement elem = null; }
{
  "[" ( elem=JsonValue() { array.add(elem); } [","] )* "]"
  { return array; }
}
There's no need for a second Elements() rule here at all, and since the elements are read in the proper order to begin with, there's no need to reverse the array either.
But also note that this is a rather lax grammar, since it doesn't validate that the array is proper JSON. The comma is optional only so that the last value is matched, but because it's optional you could have bogus JSON like '[ 12 34, 56 ]' return the incorrect array (12, 34, 56). If that matters to you, I'll leave the solution to you.
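One hypothetical way to tighten that up, sketched in plain Java over a pre-split token queue (this is not Gson's actual fix), is to read the first value and then insist on a comma before every subsequent value, so input like '[ 12 34, 56 ]' is rejected instead of silently parsed:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical strict variant: after the first value, a comma is
// mandatory before each further value.
class StrictArraySketch {
    static List<String> parse(ArrayDeque<String> tokens) {
        List<String> array = new ArrayList<>();
        if (!"[".equals(tokens.poll())) throw new IllegalStateException("expected [");
        if (!"]".equals(tokens.peek())) {
            array.add(tokens.poll());                 // first value
            while (!"]".equals(tokens.peek())) {
                if (!",".equals(tokens.poll()))       // comma is mandatory here
                    throw new IllegalStateException("expected , between values");
                array.add(tokens.poll());
            }
        }
        tokens.poll(); // consume "]"
        return array;
    }
}
```

This still runs in constant stack space; the only change is that the separator is checked rather than skipped when present.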
I generated this to a Java class, plugged my replacement JsonArray implementation into the GSON source, then compiled and ran it against 234 KB of JSON text representing a byte array of about 64,000 items, and it successfully returned the array. It was horribly slow, about 3 minutes, but it did return eventually. Not sure why it's so slow; perhaps, since I have done the rule for you here, you can profile it and find out where all the time is being spent. You may have other recursive rules that can be similarly rewritten to reduce your parser's memory footprint significantly.
Other than this problem (a major one IMHO), GSON has been great, definitely one of the better Java <-> JSON converters out there. Keep up the good work!
PS: when I convert my class to use a String to contain the byte array, it serializes and deserializes without problems, and extremely quickly too (no 3-minute wait). I see you defer this parsing to StringUnmarshaller. I suspect your parser is doing a lot of backtracking, spinning its wheels trying to find a matching production, but a profiler will tell you for sure.
Original comment by kenot...@gmail.com
on 14 Aug 2009 at 8:38
Fixed in r438.
Thanks for the tip, kenotron. I was able to build on your productions to come up with something that enforces the proper JSON rules.
Original comment by inder123
on 22 Aug 2009 at 1:04
Lovely! Preliminary testing on r438 seems to work fine for me.
Original comment by mbur...@gmail.com
on 22 Aug 2009 at 2:10
Original issue reported on code.google.com by
nit...@googlemail.com
on 22 Jan 2009 at 9:40