eishay / jvm-serializers

Benchmark comparing serialization libraries on the JVM
http://groups.google.com/group/java-serialization-benchmarking

Add fury serializer benchmark #89

Closed chaokunyang closed 12 months ago

chaokunyang commented 12 months ago

Hi, I'm the author of the Fury serialization framework. It's a fast multi-language serialization framework powered by JIT and zero-copy. A complete introduction is published as a blog post on Medium.

I just integrated Fury serializers into jvm-serializers, modeled on serializers.kryo.Kryo. Five Fury serialization configurations are added:

name              class registration   references support   int/long/string compression
fury              ✅ yes               no                   ✅ yes
fury-auto-flat    no                   no                   ✅ yes
fury-auto         no                   ✅ yes               ✅ yes
fury-registered   ✅ yes               ✅ yes               ✅ yes
fury-fastest      ✅ yes               ✅ yes               no

No manually optimized serializers are added, since Fury's JIT-generated code is already highly optimized and more performant.

Benchmark results will be uploaded later

pascaldekloe commented 12 months ago

Well done! 👍

pascaldekloe commented 12 months ago

Perhaps do https://github.com/alecthomas/go_serialization_benchmarks as well @chaokunyang ?

chaokunyang commented 12 months ago

Good suggestion, thanks. I'll add Fury Go later, once it's production-ready.

chaokunyang commented 12 months ago

@pascaldekloe Will the wiki be updated later? I tested some serializers earlier today, and Fury seems to be the fastest serialization framework in the list:

lib                                create     ser   deser   total   size  +dfl
fury                                   59     313     296     609    257   163
colfer                                 47     427     757    1184    241   152
protostuff-graph                       83     861     767    1628    242   153
kryo-registered-flat                  115     879     816    1695    218   140
protostuff-graph-runtime               59    1045     765    1809    244   154
protobuf                              353    1562     638    2199    242   152
protostuff                            102     837    1378    2214    242   153
flatbuffers                            64    2027    2021    4048    424   234
hessian                                67    4415    6385   10799    504   319

I only benchmarked the fastest serializers; a complete run took too long to finish.

pascaldekloe commented 12 months ago

It's a wiki so feel free to update as you please. I suspect something's off with those numbers though. Are you caching in some form @chaokunyang? 😉

chaokunyang commented 12 months ago

No cache. I just did many complicated optimizations using JIT and sun.misc.Unsafe, so the performance is great. FYI, I posted the generated code as gists: Media, MediaContent, Image, which may show why it's so fast.

chaokunyang commented 12 months ago

Would the benchmark results be more convincing if they were updated by someone other than me? I'm not sure which is better. I'd like to upload the benchmark results if that's OK.

pascaldekloe commented 12 months ago

Interesting to see what Java did in recent years. From what I can tell, the Fury parse is more of a memory mapping (like FlatBuffers intended) rather than an actual read of the data. Buffer changes will modify content, including the Strings, right? 😬 Do you do the same in Go with unsafe.String?

I don't see anything wrong with you publishing the new scores. I do think you should either explicitly document the difference in your approach vs. all others, or allocate a new buffer for each Object in the benchmark, as the buffer data now belongs to/with the Object.

chaokunyang commented 12 months ago

Thanks for your reply. It's not a memory mapping. The integration in this PR uses the Fury stream format, which needs to read the binary in serial order and copy the data where necessary. For example, when deserializing a string, array, etc., it copies from the buffer to create the string/array object.

Fury does have a memory-mapping format, which I call the row format: https://github.com/alipay/fury/blob/main/docs/guide/row_format_guide.md, https://github.com/alipay/fury/tree/main/java/fury-format. It's inspired by Arrow/Spark/FlatBuffers, but it's not compatible with JDK serialization and thus not integrated into this PR.

chaokunyang commented 12 months ago

Strings are copied in Fury Go too, to avoid the memory GC issue.

pascaldekloe commented 12 months ago

Zero-copy code does reflect the input buffer as a rule, correct? Otherwise it is not zero copy, but a new copy (malloc).

https://github.com/alipay/fury/blob/f802e5d034fe92553fd37cbf465980d3ea636316/java/fury-core/src/main/java/io/fury/serializer/StringSerializer.java#L525

That is, in a zero-copy scenario you read objects with their dedicated buffer and then pass the entire pack on to processing. Otherwise, you use a single read buffer, and pass the read copies on to processing. Zero-copy will be faster as it needs only one buffer/malloc. Non-zero-copy is simpler and more predictable.

chaokunyang commented 12 months ago

The passed char[] is already copied from the serialization data buffer. We just avoid the memory copy in the String constructor. We can't avoid that first copy, since the buffer is a byte array but the String constructor needs a char array and doesn't support passing an offset.
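The two copies being discussed can be sketched with public APIs only. This is a minimal illustration, not Fury's code: `StringCopySketch` and `widenLatin1` are hypothetical names, and the payload is assumed to be Latin-1 bytes. The first copy widens bytes from the data buffer into a fresh char[]; the second is the defensive copy that the public `new String(char[])` constructor always makes, which is the one the comment above says Fury skips.

```java
final class StringCopySketch {
    /** Copy 1: widen Latin-1 bytes from the data buffer into a fresh char[]. */
    static char[] widenLatin1(byte[] buf, int offset, int length) {
        char[] chars = new char[length];
        for (int i = 0; i < length; i++) {
            chars[i] = (char) (buf[offset + i] & 0xFF); // mask to avoid sign extension
        }
        return chars;
    }

    public static void main(String[] args) {
        byte[] buf = {0, 99, 97, 102, (byte) 0xE9}; // one header byte, then "café" in Latin-1
        char[] chars = widenLatin1(buf, 1, 4);
        String s = new String(chars); // copy 2: the public constructor copies the array again
        chars[0] = 'X';               // mutating the source array does not affect the String
        System.out.println(s);
    }
}
```

Because the public constructor copies, the String stays intact after the char[] is modified; installing the array directly trades that safety for one less allocation and copy.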

pascaldekloe commented 12 months ago

Well yeah, that is what zero-copy means: use the bytes "from serialization data buffer".

So you are saying you make one big copy/malloc, and then instantiate all data from that copy? That'd be single-copy which is still good, obviously. 😁

I'll read into it this weekend and get back to you. Great to learn this way. Thanks. 👍

chaokunyang commented 12 months ago

That's a good idea, but it still introduces GC overhead. If only a small part of the data buffer is held in memory, it prevents the whole big buffer from being garbage-collected. Some Go libraries do support that.
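The retention concern can be sketched with plain `java.nio` types. This is an illustration under assumed names (`BufferRetentionSketch`, `view`, `copy`), not code from Fury or Colfer: a zero-copy view shares the backing array, so the whole array stays reachable and later buffer changes show through, while a copy is independent.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class BufferRetentionSketch {
    /** Zero-copy view: shares the backing array, so the whole array stays reachable. */
    static ByteBuffer view(byte[] big, int offset, int length) {
        return ByteBuffer.wrap(big, offset, length).slice();
    }

    /** Copying read: the result is independent of the source buffer. */
    static byte[] copy(byte[] big, int offset, int length) {
        return Arrays.copyOfRange(big, offset, offset + length);
    }

    public static void main(String[] args) {
        byte[] big = new byte[1 << 20];
        ByteBuffer v = view(big, 16, 4);
        byte[] c = copy(big, 16, 4);
        big[16] = 7; // a later write to the shared buffer
        System.out.println(v.array() == big); // the 4-byte view pins the 1 MiB array
        System.out.println(v.get(0));         // the view observes the write
        System.out.println(c[0]);             // the copy does not
    }
}
```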

But that's not what Fury does. Fury makes every copy needed. We do have zero-copy support, which can avoid copying data when deserializing ByteBuffer/Arrow buffers, but it's not enabled in this benchmark.

An example will make this clearer. Consider the following:

    Integer integer = 1;
    String str = "str";
    byte[] bytearray = new byte[3];
    Object[] arr = new Object[] {integer, str, bytearray};
    byte[] bytes = fury.serializeJavaObject(arr);
    Object[] newObj = fury.deserializeJavaObject(bytes, Object[].class);

When deserializing, fury will:

pascaldekloe commented 12 months ago

The io.fury.Fury#deserializeJavaObject calls io.fury.Fury#deserializeFromStream, which has a special optimization for java.io.ByteArrayInputStream. The underlying byte array of the stream is accessed directly with reflection by io.fury.memory.MemoryUtils#wrap.

Generated data.media.MediaFuryCodec#read uses this io.fury.memory.MemoryBuffer to map Strings with io.fury.serializer.StringSerializer#readJava8CompressedString. Their input is copied into a new char array and installed into a String directly.

I love it. 👏👏👏😀 You managed to eliminate a copy in String construction in Java with reflection. It has been 27 years since Java 1, so little hope for a decent UTF-8 API in a future release.

The bounds-check elimination with sun.misc.Unsafe is nice too. No checks at all seems dangerous. 😬 Why not check for the worst case first? You could do unsafe varints only when there are more than 5 or 9 bytes remaining. One check on a common case, especially with the String payloads at the end.
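The "check the worst case first" idea can be sketched as follows. This is a hedged illustration, not Fury's or Colfer's decoder: `VarintSketch` and its methods are hypothetical, the format is assumed to be a little-endian base-128 varint of at most 5 bytes, and plain array reads stand in for the Unsafe reads (the JVM still bounds-checks them).

```java
final class VarintSketch {
    /** Reads a varint32 at pos[0] and advances pos[0] past it. */
    static int readVarUint32(byte[] buf, int[] pos) {
        // One up-front check covers the 5-byte worst case for the fast path.
        return buf.length - pos[0] >= 5 ? readFast(buf, pos) : readChecked(buf, pos);
    }

    /** Fast path: no per-byte length check; the caller guaranteed 5 readable bytes. */
    private static int readFast(byte[] buf, int[] pos) {
        int p = pos[0];
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = buf[p++];
            result |= (b & 0x7F) << shift;
            if (b >= 0) { // high bit clear: last byte of the varint
                pos[0] = p;
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }

    /** Slow path near the end of the buffer: check the length before every byte. */
    private static int readChecked(byte[] buf, int[] pos) {
        int p = pos[0];
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            if (p >= buf.length) throw new IndexOutOfBoundsException("truncated varint");
            int b = buf[p++];
            result |= (b & 0x7F) << shift;
            if (b >= 0) {
                pos[0] = p;
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }
}
```

With String payloads following the varints, most reads have at least 5 bytes remaining and take the fast path; only the tail of the buffer pays for per-byte checks.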

If you find a way to slice a char array (for multiple Strings) then you've got full speed in Java.

The ByteArrayInputStream optimization is a bit cheeky, as such a scenario will mostly benefit benchmarks and not real-world I/O. Other than that the results are solid. Please publish them on the Wiki and show the world. 🙏

pascaldekloe commented 12 months ago

Our data formats are quite similar. With the "latin switch" applied in the reserved flag one could argue nearly identical. I will borrow some of your methods in the Java part with full credits in the README @chaokunyang.

chaokunyang commented 12 months ago

Glad to hear that. It's a pleasure that fury can help improve the performance of colfer.

pascaldekloe commented 12 months ago

The impact is a bit more than just an improvement. Even with UTF-8 to char [UTF-16] conversion the speedup is over 20 % in total.

chaokunyang commented 12 months ago

On "No checks at all seems dangerous": Fury does have checks. It checks earlier in the caller, in cases where we can collapse multiple checks into one.

On the ByteArrayInputStream optimization: I disabled the copy optimization in #90, and the benchmark results still hold.

chaokunyang commented 12 months ago

Updated the macOS benchmarks on home/Wiki to JDK 11:

[two images]

and updated the Linux benchmarks on Newer-Results-on-Different-Hardware for JDK 8:

[two images]

pascaldekloe commented 11 months ago

“The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string.” — JEP 254

The LATIN1 coder is for the full 8-bit range—not just (io.fury.serializer.StringSerializer.)isAscii. You have to encode range U+0080–U+00FF as LATIN1 too @chaokunyang. Otherwise, users are subject to serious bugs such as strings which won't compare equal anymore. 😱
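The difference between the two range checks can be sketched in a few lines. This is an illustration with hypothetical names (`Latin1CheckSketch`, `isAscii`, `isLatin1`), not the body of Fury's StringSerializer: ASCII stops at U+007F, while the LATIN1 coder of JEP 254 covers everything up to U+00FF.

```java
final class Latin1CheckSketch {
    /** True when every char fits in 7 bits (U+0000 to U+007F). */
    static boolean isAscii(char[] chars) {
        for (char c : chars) {
            if (c > 0x7F) return false;
        }
        return true;
    }

    /** True when every char fits in 8 bits (U+0000 to U+00FF), the LATIN1 coder range. */
    static boolean isLatin1(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        char[] cafe = "caf\u00e9".toCharArray(); // U+00E9 is Latin-1 but not ASCII
        System.out.println(isAscii(cafe) + " " + isLatin1(cafe));
    }
}
```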

pascaldekloe commented 11 months ago

The same issue ☝️ applies to the COMPACT_STRINGS switch. Users can enable or disable this feature with a JVM setting. As such, we cannot hardcode the compressed-string option.

    // needs: import java.lang.reflect.Field; import sun.misc.Unsafe;

    /** ISO-8859-1/Latin-1 payloads in String flag. */
    private static final boolean _latin1Option = _resolveLatin1Option();

    /** Configuration is hidden in a package-private field. */
    private static boolean _resolveLatin1Option() {
        Unsafe unsafe = _getUnsafe(); // helper not shown here
        if (unsafe != null) try {
            Field f = String.class.getDeclaredField("COMPACT_STRINGS");
            // a static field is read relative to its static base, not an instance
            return unsafe.getBoolean(unsafe.staticFieldBase(f), unsafe.staticFieldOffset(f));
        } catch (Exception e) {
            e.printStackTrace();
        }
        return false;
    }

pascaldekloe commented 11 months ago

Last one, I swear. 🤞😁

The field name assertions plus coder values 0 and 1 are sealed with serialVersionUID. I think we must include the check to be future-proof.

    /** (Unsafe) field position in String or -1 when not found. */
    private static final long _stringCoderOffset = _resolveStringCoderOffset();
    /** (Unsafe) field position in String or -1 when not found. */
    private static final long _stringValueOffset = _resolveStringValueOffset();

    /** Unsafe lookup is on best-effort basis. */
    private static long _resolveStringCoderOffset() {
        // field structure is sealed with Serializable version
        long stringLayout = java.io.ObjectStreamClass.lookup(String.class).getSerialVersionUID();
        if (stringLayout != -6849794470754667710L) return -1;

        Unsafe unsafe = _getUnsafe();
        if (unsafe == null) return -1;
        try {
            return unsafe.objectFieldOffset(String.class.getDeclaredField("coder"));
        } catch (NoSuchFieldException e) {
            return -1; // e.g. JDK 8, where String has no coder field
        }
    }

chaokunyang commented 11 months ago

This is a naming issue. What Fury does in StringSerializer is a Latin-1 check, but we named it an ASCII check. We've renamed it to a Latin check in furyjs, but forgot to update it in Java. I planned to update it when Fury adds UTF-8 encoding or a Latin/UTF-8 hybrid encoding. Thanks for the heads-up; the naming should be updated, it's misleading.

chaokunyang commented 11 months ago

On _resolveLatin1Option:

Not exactly. If users disable compact strings on JDK 9+, which rarely happens, the coder we get will be UTF16 for a Latin string. This bloats the data, but correctness still holds.

chaokunyang commented 11 months ago

Good suggestion. Currently we assume the String internal structure for JDK 9+. We check the byte[] field:

[image]

But we didn't check the coder field; we should check it too. Thanks for the catch.

pascaldekloe commented 11 months ago

The Serialization UID seals field names, types and the codec value.

> Not exactly, if users disable compact string for jdk9+, which barely happen, the coder we get will be UTF16 for latin string , this will bloat the data but the correctness will still persists

The generated code calls readJava8CompressedString, which does readAsciiChars without knowledge of the Java setting, as far as I can tell? You shouldn't install a (UTF-16) char when a Latin-1 byte would fit. The String value may look fine but such Strings don't compare equal to their literals. 👻 Please do tell if I missed something there.

chaokunyang commented 11 months ago

readJava8CompressedString reads the coder first, then reads a Latin string if the coder is Latin. If the coder is UTF-16 but the serialized data is UTF-16-encoded Latin data, deserialization just reads that data into a char array and creates a String object from it. Actually, JDK 8 only supports char[] as the internal storage of a String object, so you always need to create UTF-16 chars from Latin-1 bytes.
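A coder-tagged layout of this kind can be sketched with public charset APIs. This is a simplified illustration, not Fury's wire format: `CoderTaggedStringSketch` is a hypothetical class, there is no length prefix (one String per array), and the tag values 0 and 1 are assumed to mirror the coder values mentioned earlier in the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CoderTaggedStringSketch {
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    /** Writes one tag byte followed by the payload in the encoding the tag names. */
    static byte[] write(String s) {
        boolean latin1 = true;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) { // full 8-bit range, not just ASCII
                latin1 = false;
                break;
            }
        }
        Charset cs = latin1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16LE;
        byte[] payload = s.getBytes(cs);
        byte[] out = new byte[1 + payload.length];
        out[0] = latin1 ? LATIN1 : UTF16;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Reads the tag first, then decodes the rest accordingly. */
    static String read(byte[] data) {
        Charset cs = data[0] == LATIN1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16LE;
        return new String(data, 1, data.length - 1, cs);
    }
}
```

Because the reader builds the String through the public constructor, the JVM picks the internal representation itself, so the result compares equal to the original regardless of the compact-strings setting. That guarantee is what has to be re-established by hand when the internal array is installed directly.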

pascaldekloe commented 11 months ago

Ahhh... I see now. So JDK8 means Oracle Java 1.8, and that one's different from the OpenJDK line. The generated code can target only one of the two: either char[] or byte[]. I'll try to figure out if we can get this information at runtime in a static constant.
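One way to get that information at runtime is plain reflection on the declared field types, with no Unsafe and no setAccessible. The sketch below is an assumption-laden illustration (`StringLayoutSketch`, `detect` and `uidMatches` are hypothetical names): it reports `char[]` for the JDK 8 style layout and `byte[]+coder` for the JEP 254 layout, and anything else is treated as unknown so a caller can fall back to the public String API.

```java
import java.io.ObjectStreamClass;
import java.lang.reflect.Field;

final class StringLayoutSketch {
    /** Resolved once at class initialization, usable as a static constant. */
    static final String LAYOUT = detect();

    static String detect() {
        try {
            Field value = String.class.getDeclaredField("value");
            if (value.getType() == char[].class) return "char[]";
            Field coder = String.class.getDeclaredField("coder");
            if (value.getType() == byte[].class && coder.getType() == byte.class) {
                return "byte[]+coder";
            }
        } catch (NoSuchFieldException | SecurityException e) {
            // unexpected layout: fall through
        }
        return "unknown";
    }

    /** The serialVersionUID guard from earlier in the thread, as a boolean. */
    static boolean uidMatches() {
        return ObjectStreamClass.lookup(String.class).getSerialVersionUID() == -6849794470754667710L;
    }

    public static void main(String[] args) {
        System.out.println(LAYOUT + " " + uidMatches());
    }
}
```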

chaokunyang commented 11 months ago

Fury writes the encoding into the data, which is different from other libs. Most frameworks support UTF-8 only, which is an inefficient encoding for many languages.

pascaldekloe commented 11 months ago

It's not the UTF-8 to UTF-16 conversion that makes Java slow. I just tried a direct conversion with Unsafe and the performance is fine. The issue lies in temporary array allocation and the additional copy cycle. You were right to bypass such bad design with direct insertion.

However, the Oracle vs. OpenJDK issue got me in doubt now. Can't have anything nice with different fields, different array types, runtime switches and native endianness all together. That is just too many loose ends with no guarantee for the future. 🤔

chaokunyang commented 11 months ago

The java.lang.String#coder field is package-level in java.lang.String, which makes it somewhat stable. But you are right that a different JDK vendor could change its data structure, though that hasn't happened so far. It didn't change across JDK 9 to 21 for all common JDKs. Maybe it's worth the complication, considering how commonly String is used. And if some JDK does change this internal structure, we can just add a patch for it without breaking backward compatibility.