Closed ghost closed 11 years ago
From romixlev on July 06, 2012 04:58:44
BTW, similar ideas with some benchmarks could be found on this blog (also see comments):
http://mechanical-sympathy.blogspot.de/2012/07/native-cc-like-performance-for-java.html This guy knows a thing or two about optimizations - he is one of the main developers of LMAX Disruptor framework.
From serverpe...@gmail.com on July 08, 2012 16:13:48
Very intersting!
From romixlev on August 01, 2012 10:43:44
I have implemented a FieldSerializer based on Unsafe. It uses sun.misc.Unsafe to read/write fields of an object. The code is inside the FieldSerializer class, just like it is done for ASM. I will provide a patch soon. The implementation is very small and simple. Performance-wise it seems to be comparable with ASM-generated one or even slightly better.
Since we'd have ASM and Unsafe based implementations, it could be a good idea to move both implementations (i.e. related field serialization classes) out of FieldSerializer and make them optional, so that they are loaded only when they are enabled. This would allow to use Kryo without ASM or on non-Sun JVMs, for example. May be these implementations should even become maven (sub)projects that produce independent JARs that can be used together with Kryo, if required by the user.
From nathan.s...@gmail.com on August 02, 2012 09:17:57
Nice! I'm fine with separating ASM/Unsafe from FieldSerializer. I don't think they need to be separate JARs or projects. They can be optional even with the classes in the Kryo JAR.
It would be nice to make Unsafe faster than ASM. If it isn't, I'm not sure it is worthwhile, as ASM is compatible with any JVM.
Major performance improvements would come from writing the values to bytes with the Unsafe methods. This would mean an alternate versions of Output/Input. We'd also need to handle byte order (or just make it clear in javadocs that is an issue :p).
From romixlev on August 02, 2012 09:24:12
It would be nice to make Unsafe faster than ASM. If it isn't, I'm not sure it is worthwhile, as ASM is compatible with any JVM.
I agree. I think it will be faster, but I need to benchmark a bit more (until it becomes true :-).
Major performance improvements would come from writing the values to bytes with the Unsafe methods. This would mean an alternate versions of Output/Input. We'd also need to handle byte order (or just make it clear in javadocs that is an issue :p).
Yes. This is exactly what is missing yet. I started working on it, but haven't finished it yet.
From romixlev on August 03, 2012 12:10:18
While working on Unsafe-based input/output streams I noticed the following problem:
Currently Kryo uses Output and Input when doing serialization/deserialization. But these two classes are really concrete implementations, instead of being interfaces.
It makes it difficult to use Kryo with an alternative implementation of streams unless it is derived from Input/Output. But deriving from these classes is not always a best option. For example, for Unsafe-based streams, we need to use direct buffers instead of byte arrays, because we need to be sure that they are not moved by garbage collector.
What should we do? Introduce 2 interfaces classes from streams let Input/Output and Unsafe-based stream classes implement them?
From romixlev on August 07, 2012 04:04:31
Actually, it is possible to extend the existing Input and Output classes, if using a set of Unsafe methods which can work with object as a first parameter and therefore support byte arrays which are relocated using GC. Having two interfaces would be still a safer approach and more future proof.
With this remark, I've implemented a first naive version of IO streams using Unsafe. I haven't done extensive benchmarking yet. Initial tests show that it is faster than the existing approach by at least 5-10%. I'll do more tests and tweaking to see if we can get more out of this approach.
From nathan.s...@gmail.com on August 17, 2012 19:57:59
Is there a performance benefit to using an interface for Input/Output? If interfaces were introduced, what would they be named?
From romixlev on August 21, 2012 02:34:41
Nate,
Is there a performance benefit to using an interface for Input/Output? Hmm. I don't think that using interface vs concrete class will have any performance impact. My idea was that using interfaces is cleaner as it allows for multiple alternative implementations of Input and Output streams. Examples of such implementations could be:
- the current implementation
- UnsafeInput and UnsafeOutput which still uses an on-heap byte array, but Unsafe to manipulate it
- OffHeapUnsafeInput and OffHeapUnsafeOutput, which would use offheap memory and Unsafe
As you can imagine, alternatives (2) and especially (3) do not need to derive from current implementation necessarily, but should derive from the common interface/abstract class.
If interfaces were introduced, what would they be named?
I think the easiest thing could be to reuse the names Input and Output for the common interfaces/abstract classes and then derive current implementation from it and name ByteArrayInput/ByteArrayOutput or SimpleInput/SimpleOutput.
Alternatively, new interfaces could get new names, e.g. DataInput and DataOutput, current implementation adds "implements DataXXX" to its source code. Kryo sources using streams are updated almost everywhere to take DataInput and DataOutput as parameters. The advantage of this approach is that user code remains unchanged in most cases.
From nathan.s...@gmail.com on August 21, 2012 11:03:12
Interfaces are a bit cleaner, but I hesitate to introduce an abstraction layer until we are sure it is needed. Have we seen that alternate Input/Output implementations are faster? Unsafe is slightly faster so far, at the cost of larger output. Is it enough of an improvement to warrant including an implementation? I'm not sure off heap will be faster.
From romixlev on August 21, 2012 11:53:42
Interfaces are a bit cleaner, but I hesitate to introduce an abstraction layer until we are sure it is needed.
I'm not quite sure that introducing such an interface is a big deal. Cannot quite follow why you're against it.
After all, JDK library (and almost all libs) uses this approach all over the place. The only goal is to separate interface/specification from implementation. Performance overhead is almost zero, if present at all. And client code is not dependent on the implementation any more, which makes it more future proof in case Kryo implementation changes later.
Have we seen that alternate Input/Output implementations are faster?
I'm working on tests and benchmarks that should provide more detailed information.
BTW, one thing I can see is that my quickser library, which is very similar to Kryo in many ways in term of implementation, is still faster than Kryo with Unsafe-based approach. My feeling (really just a feeling, which I cannot proof) is that this is due to the fact that a call-stack for serializing an object graph is leaner in case of quickser. Probably JIT can better optimize in this case.
From romixlev on August 24, 2012 09:00:08
Finally, we are getting somewhere! I optimized writing/reading of arrays when using my UnsafeOutput and UnsafeInput. With this improvement, I see that serialize/deserialize roundtrips are 5-20 times faster (!!!) than using standard DefaultArraySerializers and Input/Output. The bigger is an array, the bigger is the win when using UnsafeOutput and UnsafeInput.
Also on non-array-heavy objects I see performance improvements when using unsafe streams. But there they are not so spectacular.
Of course, with Unsafe streams, no variable length encoding is performed for ints, longs, etc. The resulting representation is therefore (much) bigger.
May be a post-compression can be applied afterwards to make the representation smaller. I'll try to create a test for it to see how much it will affect performance.
From nathan.s...@gmail.com on August 24, 2012 13:19:57
I'm not quite sure that introducing such an interface is a big deal. Cannot quite follow why you're against it.
I'm not against it. Interfaces can be introduced everywhere, but generally shouldn't be until they are needed. My performance question wasn't about interfaced method calls, but whether alternate implementations were worthwhile. It sounds like they could be for situations where speed is preferred over size.
BTW, one thing I can see is that my quickser library, which is very similar to Kryo in many ways in term of implementation, is still faster than Kryo with Unsafe-based approach.
From a quick look, quickser seems to have fewer method calls, map lookups, etc.
Interesting to hear your early results. DefaultArraySerializers.IntArraySerializer does variable length encoding. Since Unsafe gives up size for speed, it would be better to compare Unsafe to a 4 byte int encoding.
It would be convenient to have an option to enforce byte order.
From romixlev on August 24, 2012 13:32:58
Interesting to hear your early results. DefaultArraySerializers.IntArraySerializer does variable length encoding. Since Unsafe gives up size for speed, it would be better to compare Unsafe to a 4 byte int encoding.
I actually tried it. I create classes FastInput and FastOutput, derived from Input and Output. There I overloaded write/read methods so that all native types would use their native size without any variable length encoding. Tests have shown that it makes the situation even worse, i.e. these "Fast" classes are not fast at all. They are slower than your original Input and Output classes that do variable length encoding. The reason for this is that we can only write ints/longs byte-by-byte if Unsafe (or ByteBuffer) is not used.
It would be convenient to have an option to enforce byte order. I agree. Where should we put such an option? I guess as a method in Input/Output?
Another question: I think I would be convenient to have an option to tell if variable length encoding should be enabled or disabled for a given Input/Output stream. This way one could e.g. enable variable length encoding even for Unsafe-based streams. It would result in much smaller size of produced serialized representation (probably the same size as produced now by the vanilla Kryo), but would be of course slower. But probably still faster than current Input/Output implementation. What do you think?
From romixlev on August 24, 2012 13:39:17
To make my last point about an option for indicating variable length encoding for a given stream:
From romixlev on August 24, 2012 13:50:38
BTW, one more interesting observation:
Unsafe-based FieldSerializer outperforms ASM-based one default serializer for all fields that are non-public, because ASM cannot handle them efficiently and FieldSerializer falls back to using reflection. At the same time, an Unsafe-based serialize doesn't care about such differences, as it treats all fields simply as a content of memory cells.
And as we know, a lot of classes (if not a majority) written in OOP style define their fields as private or at least protected. It means that Unsafe-based serializer wins in such situations.
From nathan.s...@gmail.com on August 24, 2012 22:22:40
Good point on Unsafe use with private fields. Possibly ReflectASM could be augmented to use Unsafe. It seems like we should automatically use Unsafe, if available, where it has no drawbacks (larger size).
I would expect Output#writeInt(int) (4 bytes) to be faster than Output#writeInt(int, boolean) (variable length) because it has no branching, even though it more often writes more bytes.
I'm ok with a stream being configured to write bytes differently than the stream methods indicate. The default streams (ie the current ones) probably don't need a setting to use variable length encodings, but the Unsafe streams might.
Some configuration can't be at the stream level. Eg, primitive arrays use serializers, so serializing arrays via Unsafe requires different serializers for those types. Do we require registering these serializers manually? If so, at least users would understand what is going on. I guess it would also be the user's problem to fallback to non-Unsafe if Unsafe is unavailable. We probably shouldn't waste much resources on fallback though, as it is relatively easy to avoid running where Unsafe is unavailable.
From romixlev on August 25, 2012 00:56:54
Good point on Unsafe use with private fields. Possibly ReflectASM could be augmented to use Unsafe. It seems like we should automatically use Unsafe, if available,
This is how I did it now. FieldSerializer checks if Unsafe is available and uses it. If not, it fallbacks to the current ASM-based approach.
where it has no drawbacks (larger size). Just to make clear. I use Unsafe for two different purposes now:
1) in FieldSerializer for reading/writing fields. In this setting it does not affect size in any way. It just speeds up access to fields of an object. Performance wins are moderate for public classes with public fields. It gets better for classes with private fields, as I explained before.
2) I use Unsafe in UnsafeInput and UnsafeOutput. It allows for writing ints/longs/shorts very effectively with one machine instruction. Plus it supports bulk reading/writing of arrays of primitive types.
Both of these features can be used independently. I.e. you can use ASM-based FieldSerializer and Unsafe streams, or Unsafe-based FieldSerializer and usual streams or use Unsafe for everything.
I would expect Output#writeInt(int) (4 bytes) to be faster than Output#writeInt(int, boolean) (variable length) because it has no branching, even though it more often writes more bytes.
Mee too. But as I explained, testing and benchmarking has shown that in fact it is slower for int/long arrays.
I'm ok with a stream being configured to write bytes differently than the stream methods indicate. The default streams (ie the current ones) probably don't need a setting to use variable length encodings, but the Unsafe streams might.
Hmm. I'd prefer to introduce it for all streams, just to be more uniform. But we can start with UnsafeInput and UnsafeOutput and then see if we also extend Input/Output to support it.
Some configuration can't be at the stream level. Eg, primitive arrays use serializers, so serializing arrays via Unsafe requires different serializers for those types. Do we require registering these serializers manually? If so, at least users would understand what is going on. I guess it would also be the user's problem to fallback to non-Unsafe if Unsafe is unavailable. We probably shouldn't waste much resources on fallback though, as it is relatively easy to avoid running where Unsafe is unavailable.
No, there is no need to change anything in the user-space for array serializers. Instead I refactored current serializers for primitive arrays. What I did is the following: I added to streams new APIs for bulk operations on arrays of all primitive types, not only for bytes as it was the case before. The loop that writes/reads elements one-by-one into a stream is moved from the corresponding write/read methos of e.g. IntArraySerializer into a stream. We have now writeInts/readInts methods there. So, IntArraySerializer just invokes it. It is similar to the writeBytes approach used by ByteArraySerializer. The usual Input/Output do not provide any efficient support for these bulk operations. They still use one-by-one element approach, i.e. work exactly as before - no change in semantics. But Unsafe streams implement these bulk operations very efficiently, becase Unsafe has very good support for it.
From nathan.s...@gmail.com on August 25, 2012 01:51:43
FieldSerializer checks if Unsafe is available and uses it. If not, it fallbacks to the current ASM-based approach.
It would be nice if this was done in ReflectASM. This way projects unrelated to Kryo that just want faster reflection can also benefit.
I added to streams new APIs for bulk operations on arrays of all primitive types, not only for bytes as it was the case before.
Ah, nice. Since the serializer is just invoking the stream method, a different serializer can be plugged in if different behavior is desired.
I'd prefer to introduce it for all streams, just to be more uniform.
I can't think of a use case for the setting with the default streams. The setting is really for Unsafe streams, as there it provides a meaningful speed/size tradeoff. Eg, users might want to use Unsafe for all ints, or maybe they want variable length ints and only writeInt(int) and writeInts(int[]) should use Unsafe.
It sounds like this will be a great addition to the library! :)
From romixlev on August 25, 2012 02:15:38
It would be nice if this was done in ReflectASM. This way projects unrelated to Kryo that just want faster reflection can also benefit.
Once I provide a patch, you can check if you want to incorporate it into ReflectASM. In fact, this Unsafe-based access to fields is a very small piece of code. It should be pretty easy to port it into any other library.
Another potential candidate to be incorporated into other libs is implementation of Unsafe-streams. I could imagine that libs like Quickser, protostuff and the like could benefit from it.
BTW, speaking of ReflectASM, my impression at some places there was that the ASM-based code-generation looked very complex, i.e. difficult to understand by a human. How have you produced it? Is it hand-written, or have you taken a class with the code you needed and then used ASM option to produce a piece of Java-code that generates the same class/bytecode? If it was done this way, couldn't it be automated somehow so that it is easy to see the actual transformation and not the tons of ASM API invocations?
From nathan.s...@gmail.com on August 25, 2012 03:04:58
ASM has a tool for outputting ASM code for a Java class. This was used at first on a hand written Java class that almost does what ReflectASM needs to do for a single method/field. The output was used as a basic skeleton and additional hand written ASM code was added. It isn't possible to automatically generate. Readability has recently improved with contributions from the user named "serverperformance".
From romixlev on August 28, 2012 09:33:45
I implemented a few more optimizations:
There is also a problem:
I was thinking about adding an option for selecting a byte order for Unsafe-streams. It would make them (a bit) slower, but could work. But there is a catch: bulk operations (see the optimization above and previous message about array processing optimization) can only be done effectively if native byte-order is used. What is the best in this situation then? Should we state that Unsafe streams always use native byte order and users should be simply aware of it. Or should we add option to set the byte-order, but warn that it will kill many of benefits delivered by Unsafe-streams?
From romixlev on August 31, 2012 02:00:28
I added a few more Input/Output streams that support serialization into/from ByteBuffers, including off-heap buffers and then Unsafe-based streams on top of it. As a result, one can now very fast serialize directly into JVM's off-heap memory. There is virtually no difference compared to serialization using Unsafe into byte[].
I think the whole thing is almost feature complete now. I'm polishing the JavaDocs and adding more unit tests. Hopefully I'll be able to provide a patch rather soon, may be next week.
From nathan.s...@gmail.com on August 31, 2012 02:20:10
- more efficient varInt implementation for Unsafe case. It tries to minimize the number of memory writes. It first tries to collect as many bytes as possible into a 4-byte integer value and then writes it into memory. This seems to cut times by 2 in many cases, compared to current streams. But it is still 2-3 times slower than unsafe-based streams, which do not use varInt encoding and write 4-bytes using Unsafe directly.
Interesting. Is collecting an 8 byte long faster?
an interesting optimization in Unsafe-based FieldSerializer for classes that contain a lot of fields of primitive types, i.e. data classes. I did the following observation: The layout that JVM internally uses for fields in the class is done in such a way, that all fields of primitive types are grouped together (plus some padding and alignment).
Also interesting. Did you implement this? We could change the sorting of fields in FieldSerializer to be primitives first, alpha second. This would break serialized data, but might be worthwhile (says the guy without terabytes of data).
I was thinking about adding an option for selecting a byte order for Unsafe- streams.
I think I read it was a 20% performance hit when byte order needed adjustment (eg Long.reverseBytes). Is Unsafe still work it in that case? If not, we don't need the option and users can avoid Unsafe when byte order is important. Otherwise we can either have an option. We could have boolean checks everywhere. We could have two sets of streams, where the necessary set is chosen based on native byte order, and we could allow the byte order to use to be specified so people could optimize for a particular native order. The byte order feature could always be added at a later time.
I think the whole thing is almost feature complete now. I'm polishing the JavaDocs and adding more unit tests. Hopefully I'll be able to provide a patch rather soon, may be next week.
Looking forward to seeing your stuff! :) But take your time, I've been super busy and haven't gotten to play with Kryo for quite a while. It isn't relenting any time soon either. :( Must actually finish my app if I want to keep eating!
From romixlev on August 31, 2012 03:48:00
- more efficient varInt implementation for Unsafe case. It tries to minimize the
number of memory writes. It first tries to collect as many bytes as possible into a 4-byte integer value and then writes it into memory. This seems to cut times by 2 in many cases, compared to current streams. But it is still 2-3 times slower than unsafe-based streams, which do not use varInt encoding and write 4-bytes using Unsafe directly.
Interesting. Is collecting an 8 byte long faster?
I'll check it.
such a way, that all fields of primitive types are grouped together (plus some padding and alignment).
Also interesting. Did you implement this?
Yes. It is implemented. See below.
We could change the sorting of fields in FieldSerializer to be primitives first, alpha second. This would break serialized > data, but might be worthwhile (says the guy without terabytes of data).
The layout of fields in memory is already "primitives first, everything else afterwards", at least in Oracle's JVM. But my implementation (which is done in the FieldSerializer) is not rely on Oracle that much. It uses Unsafe to get the offset of each field in the object's memory. Then it sorts by offsets and tries to see the longest regions with adjacent primitive fields. Such regions are then written/read using bulk operations instead of processing them field-by-field.
Regarding the byte-order, the issue is that using non-native one is slower (I haven't measured by how much, but may be 20% looks like a good guess) and most important, it does not effectively support bulk operations using Unsafe. I'll probably try to add byte-order options to ByteBuffer-based streams. But all Unsafe-based streams would support only native byte-order. And after all, it is not the most exciting feature, as you say. If there will be any interest, it could be adjusted later.
Looking forward to seeing your stuff! :) But take your time, I've been super busy and haven't gotten to play with Kryo for quite a while. It isn't relenting any time soon either. :( Must actually finish my app if I want to keep eating!
I see your point ;-) I'm also not in a hurry :-)
From nathan.s...@gmail.com on August 31, 2012 11:22:43
The layout of fields in memory is already "primitives first, everything else afterwards", at least in Oracle's JVM. But my implementation (which is done in the FieldSerializer) is not rely on Oracle that much. It uses Unsafe to get the offset of each field in the object's memory. Then it sorts by offsets and tries to see the longest regions with adjacent primitive fields. Such regions are then written/read using bulk operations instead of processing them field-by-field.
This sounds ok as long as the order of the fields is identical when deserialized. I wonder how consistent this is across even Oracle JVMs.
I'll probably try to add byte-order options to ByteBuffer-based streams.
I'd like to make sure ByteBuffer streams are faster than buffering with a byte[] as we do now.
From romixlev on August 31, 2012 11:35:28
This sounds ok as long as the order of the fields is identical when deserialized. I wonder how consistent this is across even Oracle JVMs.
I think it is consistent among Oracle JVMs. This ordering has to do with alignment issues. I'm not sure though if this is required by standard.
From nathan.s...@gmail.com on August 31, 2012 11:57:42
I doubt it is in the spec. The ordering of fields on classes definitely isn't.
From valtih1...@gmail.com on September 05, 2012 02:38:14
Does it help to solve the ***ing "no-args" constructor issue?
From romixlev on September 05, 2012 04:07:00
Hmm. I haven't looked into it. But in principle Unsafe-based approach could help, as java.misc.Unsafe has a method to create instances without invoking any constructors. It is called allocateInstance.
From nathan.s...@gmail.com on September 05, 2012 16:02:26
Objenesis already supplies a way to construct using Unsafe. Objenesis doesn't really do very much though, might be nice to fork it and clean up that part of the API a bit.
From romixlev on September 12, 2012 11:37:24
Hi Nate,
I finally found time to clean-up a bit and create a patch against trunk. Please find the patch attached to this message. It includes besides changes also a few additional unit tests for new stream classes. I can build the trunk with these changes and all tests are still green.
Please let me know what you think about it and what needs to be changed/improved before you would consider including it into trunk.
What is included into this patch:
I hope I managed to cover all major aspects of the patch in this description. Let me know if something is missing and needs to be explained.
Cheers, Leo (which is a pseudonym, as you can realize by looking at the @author annotation of some source files. I often use it for mailing lists and open-source projects :-)
Attachment: Support-for-Unsafe-based-serialization.patch
From nathan.s...@gmail.com on September 12, 2012 15:12:30
Great! It will be some time before I can look at it. Long story short, my dog was killed by the negligence of the people watching her while I was gone, I packed all my shit and drove away. So I'm on an impromptu road trip without much of a goal.
Status: Accepted
From romixlev on September 13, 2012 00:49:35
Long story short, my dog was killed by the negligence of the people watching her while I was gone, I packed all my shit and drove away. So I'm on an impromptu road trip without much of a goal.
Oh. My condolences. I'm very sorry for your dog! It is really very difficult to get over it and it will certainly take some time for you to recover after this.
It will be some time before I can look at it.
No hurry. Take your time.
From romixlev on September 20, 2012 10:32:19
Hi Nate,
This is a small ping. I see that you are starting to answer mails on the mailing list and wanted to check the status of my patch. Have you had any time already to review it or to get an impression of it?
From romixlev on December 11, 2012 04:08:30
Hi,
BTW, there is a blog from GridGain where they claim to have a much better serialization:
http://gridgain.blogspot.de/2012/12/java-serialization-good-fast-and-faster.html I think that their presented tests are flawed, because they (re)create too many streams inside loops, etc. So, I tried to play with it to check if Kryo can perform better on those tests. First I tried to move as much as possible outside of loops. I also pre-registered all required classes. All this resulted in a small performance improvement, but far away from their results yet. Then I tried to use my Unsafe-based classes instead of standard Input/Output from Kryo. As a result, I got times below 1 ms per iteration! In particular, my optimized array handling really pays off here. @Nate: This is a usual reminder about reviewing this patch. Any chances that you'll have a time for it soon?
From romixlev on March 05, 2013 09:47:38
In fact, it most likely also fixes issues 83 and 92, but I'd like someone to confirm it.
Also, some feedback on the new Unsafe-based features would be interesting. E.g. whether it really improves performance, breaks any existing code, etc.
From romixlev on June 28, 2012 08:37:32
It could be interesting to provide sun.misc.Unsafe based implementation of FieldSerializer. It would be able to read object's memory, write into memory buffers directly using readXYZ/writeXYZ methods of the Unsafe class.
One potential advantage could be speed.
More over, and probably more important, it could be seen as an alternative to using ASM, because some people may want to reduce external dependencies to a minimum.
The implementation should pretty easy and straight forward. Many frameworks use a similar trick already (protostuff, etc).
Original issue: http://code.google.com/p/kryo/issues/detail?id=75