Closed GoogleCodeExporter closed 9 years ago
Hi,
Actually Kryo was using HashMap, or more preciesly, IdentityHashMap to
implement support for cyclic objects. Profiling has shown that in most cases,
which typically are not using huge data structures, using such a map introduces
too much overhead.
Therefore, Kryo changed the approach and went for simple linear lookup, which
is much faster in majority of case.
But for very deeply nested or big data structures like in your case, it is
certainly a win to use hash map.
So, it could be a good idea to be able to instruct a given Kryo instance which
approach should be used for lookups. By default it may use a linear search. But
if you know in advance that your data structures will be rather big, you can
tell it to use a map instead of ArrayList.
Does it sound like a plan?
Original comment by romixlev
on 7 Jun 2012 at 11:14
No related issue 62:
https://code.google.com/p/kryo/issues/detail?id=62
It could be configurable or pluggable, or it could start by using a list and
switch to a map after the list has X items. I'd prefer a solution that works
well for all cases, especially since if you are using references you probably
have a pretty large object graph.
I checked in r265, which goes back to using a map for writtenObjects.
readObjects can stay a list, so even if writtenObjects used a map it will be
better than before issue 62. I also tweaked some cuckoo map settings. The
number of eviction iterations was reduced (to keep puts fast) and the stash
size was increased (to keep from resizing due to the first change).
Leo, can you do the benchmarking you did in issue 62 and see how this performs
versus using a list? After a warm up, the writtenObjects map should be large
enough that it doesn't need to resize. Hopefully put performance is ok.
Original comment by nathan.s...@gmail.com
on 7 Jun 2012 at 7:37
OK. I'll do some benchmarking next week and report back the findings.
-Leo
Original comment by romixlev
on 7 Jun 2012 at 10:01
Great! I'll leave this open then.
Original comment by nathan.s...@gmail.com
on 7 Jun 2012 at 10:20
Here is the patch coderplay@gmail.com mentioned.
Original comment by flyicew...@gmail.com
on 8 Jun 2012 at 5:17
Attachments:
It would also be interesting to see how the benchmarking compares when using
HashMap. :D
Original comment by nathan.s...@gmail.com
on 8 Jun 2012 at 5:26
Hi Nathan,
Regarding the benchmark, pls see https://gist.github.com/2887921
Original comment by coderp...@gmail.com
on 11 Jun 2012 at 6:46
I tried out current version of Kryo with my tests (Akka-based actors send
messages to each other on the same node; each message has an id and a field
which is a Scala collection like List, Set, Map, etc; messages may have
references to the same objects in theory, but do not have them for real in this
test, i.e. it is a worst case as you have to check against all seen objects
every time).
The outcome is:
- With references enabled, 30-40% of the CPU time is spent in
IdentityObjectIntMap.get() method. Throughput of actors is 170000 msg/sec
- With references enabled, but using old-style ArrayList for writtenObjects,
almost no CPU time is spent on checking if a written object was seen before.
Throughput of actors is 200000 msg/sec
- With references disabled, almost no time is spent in ... Throughput of actors
is 260000 msg/sec
So, all in all, I'd say that it looks like the price paid for references
detection using a map is too high on small object graphs. For big graphs,
map-based approach is certainly a win.
Therefore, I'd suggest that the strategy for references detection should be
pluggable. It can be map by default, but it should be possible to replace it.
ArrayList based impl can be provided as part of Kryo for those, who know that
their object graphs will be small.
Original comment by romixlev
on 11 Jun 2012 at 11:32
This issue was closed by revision r277.
Original comment by nathan.s...@gmail.com
on 12 Jun 2012 at 4:22
I added Kryo#setReferenceMap(boolean), defaults to true, which uses a map. Set
to false to use a list. I decided to implement it with a setting because an
interface would show too many implementation details, eg about how reference
IDs must be generated. If someone actually has a use case where references need
to be customized beyond using a map or list (also note
Kryo#useReferences(Class)) then this can be refactored.
Original comment by nathan.s...@gmail.com
on 12 Jun 2012 at 4:24
Original issue reported on code.google.com by
coderp...@gmail.com
on 7 Jun 2012 at 9:40Attachments: