gaob13 / kryo

Automatically exported from code.google.com/p/kryo
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

Provide sun.misc.Unsafe-based implementation of FieldSerializer #75

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
It could be interesting to provide sun.misc.Unsafe based implementation of 
FieldSerializer. It would be able to read object's memory, write into memory 
buffers directly using readXYZ/writeXYZ methods of the Unsafe class.

One potential advantage could be speed.

More over, and probably more important, it could be seen as an alternative to 
using ASM, because some people may want to reduce external dependencies to a 
minimum.

The implementation should pretty easy and straight forward. Many frameworks use 
a similar trick already (protostuff, etc).

Original issue reported on code.google.com by romixlev on 28 Jun 2012 at 6:37

GoogleCodeExporter commented 8 years ago
BTW, similar ideas with some benchmarks could be found on this blog (also see 
comments):

http://mechanical-sympathy.blogspot.de/2012/07/native-cc-like-performance-for-ja
va.html

This guy knows a thing or two about optimizations - he is one of the main 
developers of LMAX Disruptor framework. 

Original comment by romixlev on 6 Jul 2012 at 11:58

GoogleCodeExporter commented 8 years ago
Very intersting!

Original comment by serverpe...@gmail.com on 8 Jul 2012 at 11:13

GoogleCodeExporter commented 8 years ago
I have implemented a FieldSerializer based on Unsafe. It uses sun.misc.Unsafe 
to read/write fields of an object. The code is inside the FieldSerializer 
class, just like it is done for ASM. I will provide a patch soon. 
The implementation is very small and simple. Performance-wise it seems to be 
comparable with ASM-generated one or even slightly better.

Since we'd have ASM and Unsafe based implementations, it could be a good idea 
to move both implementations (i.e. related field serialization classes) out of 
FieldSerializer and make them optional, so that they are loaded only when they 
are enabled. This would allow to use Kryo without ASM or on non-Sun JVMs, for 
example. 
May be these implementations should even become maven (sub)projects that 
produce independent JARs that can be used together with Kryo, if required by 
the user. 

Original comment by romixlev on 1 Aug 2012 at 5:43

GoogleCodeExporter commented 8 years ago
Nice! I'm fine with separating ASM/Unsafe from FieldSerializer. I don't think 
they need to be separate JARs or projects. They can be optional even with the 
classes in the Kryo JAR.

It would be nice to make Unsafe faster than ASM. If it isn't, I'm not sure it 
is worthwhile, as ASM is compatible with any JVM.

Major performance improvements would come from writing the values to bytes with 
the Unsafe methods. This would mean an alternate versions of Output/Input. We'd 
also need to handle byte order (or just make it clear in javadocs that is an 
issue :p).

Original comment by nathan.s...@gmail.com on 2 Aug 2012 at 4:17

GoogleCodeExporter commented 8 years ago
>It would be nice to make Unsafe faster than ASM. If it isn't, I'm not sure it 
is 
>worthwhile, as ASM is compatible with any JVM.

I agree. I think it will be faster, but I need to benchmark a bit more (until 
it becomes true :-).

>Major performance improvements would come from writing the values to bytes 
with the 
>Unsafe methods. This would mean an alternate versions of Output/Input. We'd 
also 
>need to handle byte order (or just make it clear in javadocs that is an issue 
:p).

Yes. This is exactly what is missing yet. I started working on it, but haven't 
finished it yet. 

Original comment by romixlev on 2 Aug 2012 at 4:24

GoogleCodeExporter commented 8 years ago
While working on Unsafe-based input/output streams I noticed the following 
problem:

Currently Kryo uses Output and Input when doing serialization/deserialization. 
But these two classes are really concrete implementations, instead of being 
interfaces. 

It makes it difficult to use Kryo with an alternative implementation of streams 
unless it is derived from Input/Output. But deriving from these classes is not 
always a best option. For example, for Unsafe-based streams, we need to use 
direct buffers instead of byte arrays, because we need to be sure that they are 
not moved by garbage collector. 

What should we do? Introduce 2 interfaces classes from streams let Input/Output 
and Unsafe-based stream classes implement them? 

Original comment by romixlev on 3 Aug 2012 at 7:10

GoogleCodeExporter commented 8 years ago
Actually, it is possible to extend the existing Input and Output classes, if 
using a set of Unsafe methods which can work with object as a first parameter 
and therefore support byte arrays which are relocated using GC. Having two 
interfaces would be still a safer approach and more future proof.

With this remark, I've implemented a first naive version of IO streams using 
Unsafe. I haven't done extensive benchmarking yet. Initial tests show that it 
is faster than the existing approach by at least 5-10%. I'll do more tests and 
tweaking to see if we can get more out of this approach.

Original comment by romixlev on 7 Aug 2012 at 11:04

GoogleCodeExporter commented 8 years ago
Is there a performance benefit to using an interface for Input/Output? If 
interfaces were introduced, what would they be named?

Original comment by nathan.s...@gmail.com on 18 Aug 2012 at 2:57

GoogleCodeExporter commented 8 years ago
Nate, 

> Is there a performance benefit to using an interface for Input/Output? 
Hmm. I don't think that using interface vs concrete class will have any 
performance impact. My idea was that using interfaces is cleaner as it allows 
for multiple alternative implementations of Input and Output streams. Examples 
of such implementations could be:
- the current implementation
- UnsafeInput and UnsafeOutput which still uses an on-heap byte array, but 
Unsafe to manipulate it
- OffHeapUnsafeInput and OffHeapUnsafeOutput, which would use offheap memory 
and Unsafe

As you can imagine, alternatives (2) and especially (3) do not need to derive 
from current implementation necessarily, but should derive from the common 
interface/abstract class.

> If interfaces were introduced, what would they be named?

I think the easiest thing could be to reuse the names Input and Output for the 
common interfaces/abstract classes and then derive current implementation from 
it and name ByteArrayInput/ByteArrayOutput or SimpleInput/SimpleOutput. 

Alternatively, new interfaces could get new names, e.g. DataInput and 
DataOutput, current implementation adds "implements DataXXX" to its source 
code. Kryo sources using streams are updated almost everywhere to take 
DataInput and DataOutput as parameters. The advantage of this approach is that 
user code remains unchanged in most cases.

Original comment by romixlev on 21 Aug 2012 at 9:34

GoogleCodeExporter commented 8 years ago
Interfaces are a bit cleaner, but I hesitate to introduce an abstraction layer 
until we are sure it is needed. Have we seen that alternate Input/Output 
implementations are faster? Unsafe is slightly faster so far, at the cost of 
larger output. Is it enough of an improvement to warrant including an 
implementation? I'm not sure off heap will be faster.

Original comment by nathan.s...@gmail.com on 21 Aug 2012 at 6:03

GoogleCodeExporter commented 8 years ago
> Interfaces are a bit cleaner, but I hesitate to introduce an abstraction 
layer until 
> we are sure it is needed.

I'm not quite sure that introducing such an interface is a big deal. Cannot 
quite follow why you're against it.

After all, JDK library (and almost all libs) uses this approach all over the 
place. The only goal is to separate interface/specification from 
implementation. Performance overhead is almost zero, if present at all. And 
client code is not dependent on the implementation any more, which makes it 
more future proof in case Kryo implementation changes later.

> Have we seen that alternate Input/Output implementations are faster? 

I'm working on tests and benchmarks that should provide more detailed 
information.

BTW, one thing I can see is that my quickser library, which is very similar to 
Kryo in many ways in term of implementation, is still faster than Kryo with 
Unsafe-based approach. My feeling (really just a feeling, which I cannot proof) 
is that this is due to the fact that a call-stack for serializing an object 
graph is leaner in case of quickser. Probably JIT can better optimize in this 
case.

Original comment by romixlev on 21 Aug 2012 at 6:53

GoogleCodeExporter commented 8 years ago
Finally, we are getting somewhere! I optimized writing/reading of arrays when 
using my UnsafeOutput and UnsafeInput. With this improvement, I see that 
serialize/deserialize roundtrips are 5-20 times faster (!!!) than using 
standard DefaultArraySerializers and Input/Output. The bigger is an array, the 
bigger is the win when using UnsafeOutput and UnsafeInput. 

Also on non-array-heavy objects I see performance improvements when using 
unsafe streams. But there they are not so spectacular.

Of course, with Unsafe streams, no variable length encoding is performed for 
ints, longs, etc. The resulting representation is therefore (much) bigger. 

May be a post-compression can be applied afterwards to make the representation 
smaller. I'll try to create a test for it to see how much it will affect 
performance.

Original comment by romixlev on 24 Aug 2012 at 4:00

GoogleCodeExporter commented 8 years ago
> I'm not quite sure that introducing such an interface is a big deal. Cannot 
quite
> follow why you're against it.

I'm not against it. Interfaces can be introduced everywhere, but generally 
shouldn't be until they are needed. My performance question wasn't about 
interfaced method calls, but whether alternate implementations were worthwhile. 
It sounds like they could be for situations where speed is preferred over size.

> BTW, one thing I can see is that my quickser library, which is very similar 
to Kryo
> in many ways in term of implementation, is still faster than Kryo with 
Unsafe-based approach.

From a quick look, quickser seems to have fewer method calls, map lookups, etc.

Interesting to hear your early results. 
DefaultArraySerializers.IntArraySerializer does variable length encoding. Since 
Unsafe gives up size for speed, it would be better to compare Unsafe to a 4 
byte int encoding.

It would be convenient to have an option to enforce byte order.

Original comment by nathan.s...@gmail.com on 24 Aug 2012 at 8:19

GoogleCodeExporter commented 8 years ago
> Interesting to hear your early results. 
DefaultArraySerializers.IntArraySerializer 
> does variable length encoding. Since Unsafe gives up size for speed, it would 
be 
> better to compare Unsafe to a 4 byte int encoding.

I actually tried it. I create classes FastInput and FastOutput, derived from 
Input and Output. There I overloaded write/read methods so that all native 
types would use their native size without any variable length encoding. Tests 
have shown that it makes the situation even worse, i.e. these "Fast" classes 
are not fast at all. They are slower than your original Input and Output 
classes that do variable length encoding. The reason for this is that we can 
only write ints/longs byte-by-byte if Unsafe (or ByteBuffer) is not used.

> It would be convenient to have an option to enforce byte order.
I agree. Where should we put such an option? I guess as a method in 
Input/Output?

Another question:
I think I would be convenient to have an option to tell if variable length 
encoding should be enabled or disabled for a given Input/Output stream. This 
way one could e.g. enable variable length encoding even for Unsafe-based 
streams. It would result in much smaller size of produced serialized 
representation (probably the same size as produced now by the vanilla Kryo), 
but would be of course slower. But probably still faster than current 
Input/Output implementation. What do you think?

Original comment by romixlev on 24 Aug 2012 at 8:32

GoogleCodeExporter commented 8 years ago
To make my last point about an option for indicating variable length encoding 
for a given stream:
- with my current implementation and refactoring, the existing boolean flag 
passed when writing/reading ints and longs for indicating that variable length 
encoding should be used is considered to be a hint. It is a stream that decides 
to take it into account or to ignore it. This way it becomes possible to 
enable/disable variable length encoding at the stream level. The rest of the 
Kryo source code is not affected at all and still assumes that variable length 
encoding is used.

Original comment by romixlev on 24 Aug 2012 at 8:39

GoogleCodeExporter commented 8 years ago
BTW, one more interesting observation:

Unsafe-based FieldSerializer outperforms ASM-based one default serializer for 
all fields that are non-public, because ASM cannot handle them efficiently and 
FieldSerializer falls back to using reflection. At the same time, an 
Unsafe-based serialize doesn't care about such differences, as it treats all 
fields simply as a content of memory cells.

And as we know, a lot of classes (if not a majority) written in OOP style 
define their fields as private or at least protected. It means that 
Unsafe-based serializer wins in such situations.

Original comment by romixlev on 24 Aug 2012 at 8:50

GoogleCodeExporter commented 8 years ago
Good point on Unsafe use with private fields. Possibly ReflectASM could be 
augmented to use Unsafe. It seems like we should automatically use Unsafe, if 
available, where it has no drawbacks (larger size).

I would expect Output#writeInt(int) (4 bytes) to be faster than 
Output#writeInt(int, boolean) (variable length) because it has no branching, 
even though it more often writes more bytes.

I'm ok with a stream being configured to write bytes differently than the 
stream methods indicate. The default streams (ie the current ones) probably 
don't need a setting to use variable length encodings, but the Unsafe streams 
might.

Some configuration can't be at the stream level. Eg, primitive arrays use 
serializers, so serializing arrays via Unsafe requires different serializers 
for those types. Do we require registering these serializers manually? If so, 
at least users would understand what is going on. I guess it would also be the 
user's problem to fallback to non-Unsafe if Unsafe is unavailable. We probably 
shouldn't waste much resources on fallback though, as it is relatively easy to 
avoid running where Unsafe is unavailable.

Original comment by nathan.s...@gmail.com on 25 Aug 2012 at 5:22

GoogleCodeExporter commented 8 years ago
>Good point on Unsafe use with private fields. Possibly ReflectASM could be 
augmented 
>to use Unsafe. It seems like we should automatically use Unsafe, if available, 

This is how I did it now. FieldSerializer checks if Unsafe is available and 
uses it. If not, it fallbacks to the current ASM-based approach.

>where  it has no drawbacks (larger size).
Just to make clear. I use Unsafe for two different purposes now:

1) in FieldSerializer for reading/writing fields. In this setting it does not 
affect size in any way. It just speeds up access to fields of an object. 
Performance wins are moderate for public classes with public fields. It gets 
better for classes with private fields, as I explained before.

2) I use Unsafe in UnsafeInput and UnsafeOutput. It allows for writing 
ints/longs/shorts very effectively with one machine instruction. Plus it 
supports bulk reading/writing of arrays of primitive types.

Both of these features can be used independently. I.e. you can use ASM-based 
FieldSerializer and Unsafe streams, or Unsafe-based FieldSerializer and usual 
streams or use Unsafe for everything. 

>I would expect Output#writeInt(int) (4 bytes) to be faster than 
Output#writeInt(int, 
>boolean) (variable length) because it has no branching, even though it more 
often 
>writes more bytes.

Mee too. But as I explained, testing and benchmarking has shown that in fact it 
is slower for int/long arrays.

>I'm ok with a stream being configured to write bytes differently than the 
stream 
>methods indicate. The default streams (ie the current ones) probably don't 
need a 
>setting to use variable length encodings, but the Unsafe streams might.

Hmm. I'd prefer to introduce it for all streams, just to be more uniform. But 
we can start with UnsafeInput and UnsafeOutput and then see if we also extend 
Input/Output to support it.

> Some configuration can't be at the stream level. Eg, primitive arrays use 
> serializers, so serializing arrays via Unsafe requires different serializers 
for 
> those types. Do we require registering these serializers manually? If so, at 
least 
> users would understand what is going on. I guess it would also be the user's 
> problem to fallback to non-Unsafe if Unsafe is unavailable. We probably 
shouldn't 
> waste much resources on fallback though, as it is relatively easy to avoid 
running 
> where Unsafe is unavailable.

No, there is no need to change anything in the user-space for array 
serializers. Instead I refactored current serializers for primitive arrays. 
What I did is the following:
I added to streams new APIs for bulk operations on arrays of all primitive 
types, not only for bytes as it was the case before. The loop that writes/reads 
elements one-by-one into a stream is moved from the corresponding write/read 
methos of e.g. IntArraySerializer into a stream. We have now writeInts/readInts 
methods there. So, IntArraySerializer just invokes it. It is similar to the 
writeBytes approach used by ByteArraySerializer. The usual Input/Output do not 
provide any efficient support for these bulk operations. They still use 
one-by-one element approach, i.e. work exactly as before - no change in 
semantics. But Unsafe streams implement these bulk operations very efficiently, 
becase Unsafe has very good support for it. 

Original comment by romixlev on 25 Aug 2012 at 7:56

GoogleCodeExporter commented 8 years ago
> FieldSerializer checks if Unsafe is available and uses it. If not, it 
fallbacks to 
> the current ASM-based approach.

It would be nice if this was done in ReflectASM. This way projects unrelated to 
Kryo that just want faster reflection can also benefit.

> I added to streams new APIs for bulk operations on arrays of all primitive 
types, 
> not only for bytes as it was the case before.

Ah, nice. Since the serializer is just invoking the stream method, a different 
serializer can be plugged in if different behavior is desired.

> I'd prefer to introduce it for all streams, just to be more uniform.

I can't think of a use case for the setting with the default streams. The 
setting is really for Unsafe streams, as there it provides a meaningful 
speed/size tradeoff. Eg, users might want to use Unsafe for all ints, or maybe 
they want variable length ints and only writeInt(int) and writeInts(int[]) 
should use Unsafe.

It sounds like this will be a great addition to the library! :)

Original comment by nathan.s...@gmail.com on 25 Aug 2012 at 8:51

GoogleCodeExporter commented 8 years ago
> It would be nice if this was done in ReflectASM. This way projects unrelated 
to 
> Kryo that just want faster reflection can also benefit.

Once I provide a patch, you can check if you want to incorporate it into 
ReflectASM. In fact, this Unsafe-based access to fields is a very small piece 
of code. It should be pretty easy to port it into any other library. 

Another potential candidate to be incorporated into other libs is 
implementation of Unsafe-streams. I could imagine that libs like Quickser, 
protostuff and the like 
could benefit from it.

BTW, speaking of ReflectASM, my impression at some places there was that the 
ASM-based code-generation looked very complex, i.e. difficult to understand by 
a human. How have you produced it? Is it hand-written, or have you taken a 
class with the code you needed and then used ASM option to produce a piece of 
Java-code that generates the same class/bytecode? If it was done this way, 
couldn't it be automated somehow so that it is easy to see the actual 
transformation and not the tons of ASM API invocations? 

Original comment by romixlev on 25 Aug 2012 at 9:15

GoogleCodeExporter commented 8 years ago
ASM has a tool for outputting ASM code for a Java class. This was used at first 
on a hand written Java class that almost does what ReflectASM needs to do for a 
single method/field. The output was used as a basic skeleton and additional 
hand written ASM code was added. It isn't possible to automatically generate. 
Readability has recently improved with contributions from the user named 
"serverperformance".

Original comment by nathan.s...@gmail.com on 25 Aug 2012 at 10:04

GoogleCodeExporter commented 8 years ago
I implemented a few more optimizations:

- more efficient varInt implementation for Unsafe case. It tries to minimize 
the number of memory writes. It first tries to collect as many bytes as 
possible into a 4-byte integer value and then writes it into memory. This seems 
to cut times by 2 in many cases, compared to current streams. But it is still 
2-3 times slower than unsafe-based streams, which do not use varInt encoding 
and write 4-bytes using Unsafe directly. 

- an interesting optimization in Unsafe-based FieldSerializer for classes that 
contain a lot of fields of primitive types, i.e. data classes. I did the 
following observation: The layout that JVM internally uses for fields in the 
class is done in such a way, that all fields of primitive types are grouped 
together (plus some padding and alignment). It means that one can see the set 
of fields of primitive types as a single memory area! Therefore, there is no 
need to write/read them one-by-one. The whole area can be processed in one bulk 
operation. This leads to a very good performance for such classes. At least 2 
times or more better than without it. It works best if we use Unsafe streams 
together with this serializer.

There is also a problem:

I was thinking about adding an option for selecting a byte order for 
Unsafe-streams. It would make them (a bit) slower, but could work. But there is 
a catch: bulk operations (see the optimization above and previous message about 
array processing optimization) can only be done effectively if native 
byte-order is used. What is the best in this situation then? Should we state 
that Unsafe streams always use native byte order and users should be simply 
aware of it. Or should we add option to set the byte-order, but warn that it 
will kill many of benefits delivered by Unsafe-streams?

Original comment by romixlev on 28 Aug 2012 at 4:33

GoogleCodeExporter commented 8 years ago
I added a few more Input/Output streams that support serialization into/from 
ByteBuffers, including off-heap buffers and then Unsafe-based streams on top of 
it. As a result, one can now very fast serialize directly into JVM's off-heap 
memory. There is virtually no difference compared to serialization using Unsafe 
into byte[].

I think the whole thing is almost feature complete now. I'm polishing the 
JavaDocs and adding more unit tests. Hopefully I'll be able to provide a patch 
rather soon, may be next week.

Original comment by romixlev on 31 Aug 2012 at 9:00

GoogleCodeExporter commented 8 years ago
> - more efficient varInt implementation for Unsafe case. It tries to minimize 
the 
> number of memory writes. It first tries to collect as many bytes as possible 
into 
> a 4-byte integer value and then writes it into memory. This seems to cut 
times by
> 2 in many cases, compared to current streams. But it is still 2-3 times 
slower 
> than unsafe-based streams, which do not use varInt encoding and write 4-bytes 
> using Unsafe directly.

Interesting. Is collecting an 8 byte long faster?

> an interesting optimization in Unsafe-based FieldSerializer for classes that 
> contain a lot of fields of primitive types, i.e. data classes. I did the 
following 
> observation: The layout that JVM internally uses for fields in the class is 
done in 
> such a way, that all fields of primitive types are grouped together (plus 
some 
> padding and alignment).

Also interesting. Did you implement this? We could change the sorting of fields 
in FieldSerializer to be primitives first, alpha second. This would break 
serialized data, but might be worthwhile (says the guy without terabytes of 
data).

> I was thinking about adding an option for selecting a byte order for Unsafe-
streams.

I think I read it was a 20% performance hit when byte order needed adjustment 
(eg Long.reverseBytes). Is Unsafe still work it in that case? If not, we don't 
need the option and users can avoid Unsafe when byte order is important. 
Otherwise we can either have an option. We could have boolean checks 
everywhere. We could have two sets of streams, where the necessary set is 
chosen based on native byte order, and we could allow the byte order to use to 
be specified so people could optimize for a particular native order. The byte 
order feature could always be added at a later time.

> I think the whole thing is almost feature complete now. I'm polishing the 
JavaDocs 
> and adding more unit tests. Hopefully I'll be able to provide a patch rather 
soon, 
> may be next week. 

Looking forward to seeing your stuff! :) But take your time, I've been super 
busy and haven't gotten to play with Kryo for quite a while. It isn't relenting 
any time soon either. :( Must actually finish my app if I want to keep eating!

Original comment by nathan.s...@gmail.com on 31 Aug 2012 at 9:20

GoogleCodeExporter commented 8 years ago
> - more efficient varInt implementation for Unsafe case. It tries to minimize 
the 
>> number of memory writes. It first tries to collect as many bytes as possible 
into 
>> a 4-byte integer value and then writes it into memory. This seems to cut 
times by
>> 2 in many cases, compared to current streams. But it is still 2-3 times 
slower 
>> than unsafe-based streams, which do not use varInt encoding and write 
4-bytes 
>> using Unsafe directly.

> Interesting. Is collecting an 8 byte long faster?

I'll check it.

> such a way, that all fields of primitive types are grouped together (plus 
some 
> padding and alignment).

> Also interesting. Did you implement this? 

Yes. It is implemented. See below.

> We could change the sorting of fields in 
> FieldSerializer to be primitives first, alpha second. This would break 
serialized > data, but might be worthwhile (says the guy without terabytes of 
data).

The layout of fields in memory is already "primitives first, everything else 
afterwards", at least in Oracle's JVM. But my implementation (which is done in 
the FieldSerializer) is not rely on Oracle that much. It uses Unsafe to get the 
offset of each field in the object's memory. Then it sorts by offsets and tries 
to see the longest regions with adjacent primitive fields. Such regions are 
then written/read using bulk operations instead of processing them 
field-by-field. 

Regarding the byte-order, the issue is that using non-native one is slower (I 
haven't measured by how much, but may be 20% looks like a good guess) and most 
important, it does not effectively support bulk operations using Unsafe. I'll 
probably try to add byte-order options to ByteBuffer-based streams. But all 
Unsafe-based streams would support only native byte-order. And after all, it is 
not the most exciting feature, as you say. If there will be any interest, it 
could be adjusted later.

> Looking forward to seeing your stuff! :) But take your time, I've been super 
busy 
> and haven't gotten to play with Kryo for quite a while. It isn't relenting 
any 
> time soon either. :( Must actually finish my app if I want to keep eating!

I see your point ;-) I'm also not in a hurry :-) 

Original comment by romixlev on 31 Aug 2012 at 10:48

GoogleCodeExporter commented 8 years ago
> The layout of fields in memory is already "primitives first, everything else 
> afterwards", at least in Oracle's JVM. But my implementation (which is done 
in the 
> FieldSerializer) is not rely on Oracle that much. It uses Unsafe to get the 
offset 
> of each field in the object's memory. Then it sorts by offsets and tries to 
see the 
> longest regions with adjacent primitive fields. Such regions are then 
written/read 
> using bulk operations instead of processing them field-by-field.

This sounds ok as long as the order of the fields is identical when 
deserialized. I wonder how consistent this is across even Oracle JVMs.

> I'll probably try to add byte-order options to ByteBuffer-based streams.

I'd like to make sure ByteBuffer streams are faster than buffering with a 
byte[] as we do now.

Original comment by nathan.s...@gmail.com on 31 Aug 2012 at 6:22

GoogleCodeExporter commented 8 years ago
>This sounds ok as long as the order of the fields is identical when 
deserialized. I 
>wonder how consistent this is across even Oracle JVMs.

I think it is consistent among Oracle JVMs. This ordering has to do with 
alignment issues. I'm not sure though if this is required by standard.

Original comment by romixlev on 31 Aug 2012 at 6:35

GoogleCodeExporter commented 8 years ago
I doubt it is in the spec. The ordering of fields on classes definitely isn't.

Original comment by nathan.s...@gmail.com on 31 Aug 2012 at 6:57

GoogleCodeExporter commented 8 years ago
Does it help to solve the ***ing "no-args" constructor issue?

Original comment by valtih1...@gmail.com on 5 Sep 2012 at 9:38

GoogleCodeExporter commented 8 years ago
Hmm. I haven't looked into it. But in principle Unsafe-based approach could 
help, as java.misc.Unsafe has a method to create instances without invoking any 
constructors. It is called allocateInstance. 

Original comment by romixlev on 5 Sep 2012 at 11:07

GoogleCodeExporter commented 8 years ago
Objenesis already supplies a way to construct using Unsafe. Objenesis doesn't 
really do very much though, might be nice to fork it and clean up that part of 
the API a bit.

Original comment by nathan.s...@gmail.com on 5 Sep 2012 at 11:02

GoogleCodeExporter commented 8 years ago
Hi Nate,

I finally found time to clean-up a bit and create a patch against trunk.
Please find the patch attached to this message. It includes besides changes 
also a few additional unit tests for new stream classes. I can build the trunk 
with these changes and all tests are still green.

Please let me know what you think about it and what needs to be 
changed/improved before you would consider including it into trunk.

What is included into this patch:

- new stream classes: UnsafeInput, UnsafeOutput, UnsafeMemoryInput, 
UnsafeMemoryOutput, ByteBufferInput, ByteBufferOutout, FastInput, FastOutput. 
Some of these classes are based on Unsafe and are much faster in many cases, 
especially if arrays are being serialized. UnsafeMemory* classes allow for 
(de)serialization directly to/from off-heap memory.

- All stream classes now support bulk (de)serialization of arrays of primitive 
types. Array serializers make use of it.

- There is a change in handling variable length encoded ints by the stream 
classes. The old method writeInt(value, optimizePositiveFlag) is used by 
serializers, but considered to be a hint for stream classes. Some of the stream 
classes may ignore it and use a usual fixed size serialization of integers for 
performance reasons. But since some integers (e.g. class IDs, reference ids, 
etc) are used very frequently and are usually small, it still makes sense to 
use variable length encoding for them to produce a much smaller representation. 
To support it, new methods writeVarInt/readVarInt are introduced. If they are 
used, any stream is supposed to use a variable length encoding.

- Related to a previous paragraph, I found that in many cases Kryo was using 
writeByte(), where actually writeInt(value, true) was meant. This is fixed now.

- FieldSerializer is refactored a bit. AsmCacheFields and UnsafeCacheFields are 
now defined in their own files. FieldSerializer tries to use Unsafe by default 
and falls back to ASM-based backend if Unsafe cannot be used.

- UnsafeCacheField serializers are using Unsafe to directly read/write fields 
of an objects. It is at least as fast as using ASM-based approach, plus it has 
no problems with private/protected fields and can serialize them at full-speed 
without any problems. 

- FieldSerialzer now also support copying of transient fields. Current 
semantics is controlled by means of two flags, which are hard-coded now and 
cannot be changed via public API. If there is a wish to change it later, we 
need to access setters/getters for them. Current implementation includes 
transients when copying, but ignores them when serializing.

- Overall, I get fastest results if FieldSerialzer in Unsafe mode is used with 
Unsafe streams. 

I hope I managed to cover all major aspects of the patch in this description. 
Let me know if something is missing and needs to be explained.

Cheers,
  Leo (which is a pseudonym, as you can realize by looking at the @author annotation of some source files. I often use it for mailing lists and open-source projects :-)

Original comment by romixlev on 12 Sep 2012 at 6:37

Attachments:

GoogleCodeExporter commented 8 years ago
Great! It will be some time before I can look at it. Long story short, my dog 
was killed by the negligence of the people watching her while I was gone, I 
packed all my shit and drove away. So I'm on an impromptu road trip without 
much of a goal.

Original comment by nathan.s...@gmail.com on 12 Sep 2012 at 10:12

GoogleCodeExporter commented 8 years ago
> Long story short, my dog was killed by the negligence of the people watching 
her 
> while I was gone, I packed all my shit and drove away. So I'm on an impromptu 
road 
> trip without much of a goal.

Oh. My condolences. I'm very sorry for your dog! It is really very difficult to 
get over it and it will certainly take some time for you to recover after this. 

> It will be some time before I can look at it. 

No hurry. Take your time.

Original comment by romixlev on 13 Sep 2012 at 7:49

GoogleCodeExporter commented 8 years ago
Hi Nate,

This is a small ping. I see that you are starting to answer mails on the 
mailing list and wanted to check the status of my patch. Have you had any time 
already to review it or to get an impression of it?

Original comment by romixlev on 20 Sep 2012 at 5:32

GoogleCodeExporter commented 8 years ago
Hi,

BTW, there is a blog from GridGain where they claim to have a much better 
serialization:

http://gridgain.blogspot.de/2012/12/java-serialization-good-fast-and-faster.html

I think that their presented tests are flawed, because they (re)create too many 
streams inside loops, etc. So, I tried to play with it to check if Kryo can 
perform better on those tests. First I tried to move as much as possible 
outside of loops. I also pre-registered all required classes. All this resulted 
in a small performance improvement, but far away from their results yet. 

Then I tried to use my Unsafe-based classes instead of standard Input/Output 
from Kryo. As a result, I got times below 1 ms per iteration! In particular, my 
optimized array handling really pays off here. 

@Nate: This is a usual reminder about reviewing this patch. Any chances that 
you'll have a time for it soon?

Original comment by romixlev on 11 Dec 2012 at 12:08

GoogleCodeExporter commented 8 years ago
Changes implementing this feature were committed in r362

Original comment by romixlev on 5 Mar 2013 at 1:07

GoogleCodeExporter commented 8 years ago
:D Thanks Leo!

Original comment by nathan.s...@gmail.com on 5 Mar 2013 at 5:43

GoogleCodeExporter commented 8 years ago
In fact, it most likely also fixes issues 83 and 92, but I'd like someone to 
confirm it.

Also, some feedback on the new Unsafe-based features would be interesting. E.g. 
whether it really improves performance, breaks any existing code, etc.

Original comment by romixlev on 5 Mar 2013 at 5:47