Closed chaokunyang closed 12 months ago
Well done! 👍
Perhaps do https://github.com/alecthomas/go_serialization_benchmarks as well @chaokunyang ?
Good suggestions, thanks. I'll add fury go later when it's production ready.
@pascaldekloe Will the wiki be updated later? I tested some serializers earlier today; it seems Fury is the fastest serialization framework in the list:
```
lib                       create   ser  deser  total  size  +dfl
fury                          59   313    296    609   257   163
colfer                        47   427    757   1184   241   152
protostuff-graph              83   861    767   1628   242   153
kryo-registered-flat         115   879    816   1695   218   140
protostuff-graph-runtime      59  1045    765   1809   244   154
protobuf                     353  1562    638   2199   242   152
protostuff                   102   837   1378   2214   242   153
flatbuffers                   64  2027   2021   4048   424   234
hessian                       67  4415   6385  10799   504   319
```
I only benchmarked the fastest serializers; a complete run took too long to finish.
It's a wiki so feel free to update as you please. I suspect something's off with those numbers though. Are you caching in some form @chaokunyang? 😉
No cache. I just did many complicated optimizations using JIT and `sun.misc.Unsafe`, so the performance is great. FYI, I posted the generated code in gists: Media, MediaContent, Image, which may show why it's so fast.
Would the benchmark results be more convincing if they were updated by someone other than me? I'm not sure which would be better. I'd like to upload the benchmark results if that's OK.
Interesting to see what Java did in recent years. From what I can tell, the fury parse is more of a memory mapping (like FlatBuffers intended) rather than an actual read of the data. Buffer changes will modify content, including the `String`s, right? 😬 Do you do the same in Go with `unsafe.String`?
I don't see anything wrong with you publishing the new scores. I do think you should either explicitly document the difference in your approach vs. all others, or allocate a new buffer for each Object in the benchmark, as the buffer data now belongs to/with the Object.
Thanks for your reply. It's not a memory mapping; the integration in this PR uses the Fury stream format, which needs to read the binary in serial order and copy the data when necessary. For example, when deserializing a string/array/etc., it will copy from the buffer to create the string/array/etc. object.
Fury does have a memory-mapping format, which I call the row format: https://github.com/alipay/fury/blob/main/docs/guide/row_format_guide.md, https://github.com/alipay/fury/tree/main/java/fury-format. It's inspired by Arrow/Spark/FlatBuffers, but it's not compatible with JDK serialization and thus not integrated into this PR.
Strings are copied in Fury Go too, to avoid GC issues.
Zero-copy code does reflect the input buffer as a rule, correct? Otherwise it is not zero-copy, but a new copy (malloc).
That is, in a zero-copy scenario you read objects with their dedicated buffer and then pass the entire pack on to processing. Otherwise, you use a single read buffer, and pass the read copies on to processing. Zero-copy will be faster as it needs only one buffer/malloc. Non-zero-copy is simpler and more predictable.
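The distinction can be sketched in Java (my own illustration, not code from either library): a zero-copy view aliases the input buffer, so later writes to the buffer show through it, while a plain read copies into a fresh, independent allocation.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class CopyDemo {
    // Zero-copy: the view aliases the backing array; no allocation for the payload.
    static ByteBuffer zeroCopyView(byte[] buf, int off, int len) {
        return ByteBuffer.wrap(buf, off, len).slice();
    }

    // Non-zero-copy: a fresh allocation, independent of the input buffer.
    static byte[] copiedRead(byte[] buf, int off, int len) {
        return Arrays.copyOfRange(buf, off, off + len);
    }

    public static void main(String[] args) {
        byte[] buf = {1, 2, 3, 4};
        ByteBuffer view = zeroCopyView(buf, 1, 2);
        byte[] copy = copiedRead(buf, 1, 2);
        buf[1] = 99;                     // mutate the shared buffer afterwards
        System.out.println(view.get(0)); // 99 -- the view reflects the change
        System.out.println(copy[0]);     // 2  -- the copy does not
    }
}
```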
The passed `char[]` is already copied from the serialization data buffer. We just avoid the memory copy in the String constructor. We can't avoid that copy, since the buffer is a byte array but the String constructor needs a char array, and it doesn't support passing an offset.
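For illustration (a minimal sketch, not Fury code), the defensive copy made by the public `String(char[])` constructor is easy to observe: mutating the source array after construction leaves the String untouched, which proves the constructor copied the array.

```java
public class StringCopyDemo {
    public static void main(String[] args) {
        char[] chars = {'s', 't', 'r'};
        String s = new String(chars); // the constructor defensively copies the array
        chars[0] = 'X';               // mutating the source afterwards...
        System.out.println(s);        // ...does not affect the String: prints "str"
    }
}
```

Bypassing this copy requires installing the array into the String's private field directly, which is exactly the kind of trick under discussion.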
Well yeah, that is what zero-copy means: use the bytes "from serialization data buffer".
So you are saying you make one big copy/malloc, and then instantiate all data from that copy? That'd be single-copy which is still good, obviously. 😁
I'll read in to it this weekend and come back to you. Great to learn this way. Thanks. 👍
That's a good idea, but it still introduces GC overhead. If only a small part of the data buffer is held in memory, it will prevent the big buffer from being GC'd. Some Go libraries do support that.
But it's not what Fury does. Fury does every copy needed. We do have zero-copy support which can avoid copying data when deserializing `ByteBuffer`/Arrow `Buffer`s, but it's not enabled in this benchmark.
An example will make this clearer; consider the following:
```java
Integer integer = 1;
String str = "str";
byte[] bytearray = new byte[3];
Object[] arr = new Object[] {integer, str, bytearray};
byte[] bytes = fury.serializeJavaObject(arr);
Object[] newObj = fury.deserializeJavaObject(bytes, Object[].class);
```
When deserializing, Fury will:

1. Read the flag of `arr` first, which will be `not_null`; read the class `Object[]`, then create an `Object[]`: `newObj`.
2. Read `integer`: read class `Integer.class`, deserialize `integer`, add it to `newObj`.
3. Read `str`: read class `String.class`, read size `3`, read the bytes into a `char[]` (as you can see here, string deserialization will have a copy), create a String from the `char[]`, which is the method you referenced before: https://github.com/alipay/fury/blob/f802e5d034fe92553fd37cbf465980d3ea636316/java/fury-core/src/main/java/io/fury/serializer/StringSerializer.java#L525, then add it to `newObj`.
4. Read `bytearray`: read class `byte[]`, create a `byte[]`, read the bytes into it (a copy will happen here), add it to `newObj`.
5. Return `newObj`.
The io.fury.Fury#deserializeJavaObject calls io.fury.Fury#deserializeFromStream, which has a special optimization for java.io.ByteArrayInputStream. The underlying byte array of the stream is accessed directly with reflection by io.fury.memory.MemoryUtils#wrap.
Generated data.media.MediaFuryCodec#read uses this io.fury.memory.MemoryBuffer to map Strings with io.fury.serializer.StringSerializer#readJava8CompressedString. The input is copied into a new char array and installed into a String directly.
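A simplified stand-in for that trick might look like this (hypothetical sketch; `MemoryUtils#wrap` itself does more): obtain `sun.misc.Unsafe`, compute the offset of the stream's package-private `buf` field once, and read the backing array without a copy and without any `setAccessible` call.

```java
import java.io.ByteArrayInputStream;
import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class BufGrab {
    static final Unsafe UNSAFE;
    static final long BUF_OFFSET;
    static {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            UNSAFE = (Unsafe) theUnsafe.get(null);
            // Offset of the package-private "buf" field of ByteArrayInputStream.
            BUF_OFFSET = UNSAFE.objectFieldOffset(
                ByteArrayInputStream.class.getDeclaredField("buf"));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Return the stream's backing array directly, without copying it.
    static byte[] backingArray(ByteArrayInputStream in) {
        return (byte[]) UNSAFE.getObject(in, BUF_OFFSET);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        // Same array instance as the one passed to the stream, not a copy:
        System.out.println(backingArray(new ByteArrayInputStream(data)) == data);
    }
}
```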
I love it. 👏👏👏😀 You managed to eliminate a copy in String construction in Java with reflection. It has been 27 years since Java 1, so little hope for a decent UTF-8 API in a future release.
The bounds-check elimination with `sun.misc.Unsafe` is nice too. No checks at all seems dangerous though. 😬 Why not check for the worst case first? You could do unsafe varints only when there are more than 5 or 9 bytes remaining: one check for the common case, especially with the String payloads at the end.
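The suggested pattern could be sketched like this (my own illustration, with plain array reads standing in for `Unsafe` access): a 32-bit varint occupies at most 5 bytes, so one upfront remaining-bytes check covers the common case, and only reads near the buffer end take the fully checked path.

```java
public class Varint {
    // Decode an unsigned LEB128 varint32 starting at pos.
    static int readVarint32(byte[] buf, int pos) {
        if (buf.length - pos >= 5) {
            // Fast path: the worst case (5 bytes) fits, so no per-byte checks needed.
            int b = buf[pos++], result = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = buf[pos++];
                result |= (b & 0x7F) << shift;
            }
            return result;
        }
        // Slow path near the buffer end: each read is implicitly bounds-checked.
        int result = 0;
        for (int shift = 0; ; shift += 7) {
            int b = buf[pos++];
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
    }

    public static void main(String[] args) {
        System.out.println(readVarint32(new byte[]{(byte) 0xAC, 0x02}, 0)); // 300
        System.out.println(readVarint32(new byte[]{0x7F}, 0));              // 127
    }
}
```

A 64-bit varint needs at most 10 bytes, hence the "5 or 9" style thresholds in the comment above.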
If you find a way to slice a `char` array (for multiple Strings) then you've got full speed in Java.
The ByteArrayInputStream optimization is a bit cheeky, as such a scenario will mostly benefit benchmarks and not real-world I/O. Other than that the results are solid. Please publish them on the Wiki and show the world. 🙏
Our data formats are quite similar. With the "latin switch" applied in the reserved flag one could argue nearly identical. I will borrow some of your methods in the Java part with full credits in the README @chaokunyang.
Glad to hear that. It's a pleasure that fury can help improve the performance of colfer.
The impact is a bit more than just an improvement. Even with the UTF-8 to char [UTF-16] conversion, the speedup is over 20% in total.
For "No checks at all seems dangerous": Fury does have checks; it checks earlier, in the caller, for cases where we can collapse multiple checks into one.
For the "ByteArrayInputStream optimization": I disabled the copy optimization in #90, and the benchmark results still persist.
Updated macOS benchmarks for home/Wiki to JDK11, and updated Linux benchmarks for Newer-Results-on-Different-Hardware for JDK8.
“The new String class will store characters encoded either as ISO-8859-1/Latin-1 (one byte per character), or as UTF-16 (two bytes per character), based upon the contents of the string.” — JEP 254
The `LATIN1` `coder` is for the full 8-bit range, not just `io.fury.serializer.StringSerializer.isAscii`. You have to encode the range U+0080–U+00FF as LATIN1 too @chaokunyang. Otherwise, users are subject to serious bugs such as strings which won't compare equal anymore. 😱
The same issue ☝️ applies to the COMPACT_STRINGS switch. Users can enable or disable this feature with a JVM setting. As such, we cannot hardcode the compressed-string option.
```java
/** ISO-8859-1/Latin-1 payloads in String flag. */
private static final boolean _latin1Option = _resolveLatin1Option();

/** Configuration is hidden in a package-protected field. */
private static boolean _resolveLatin1Option() {
    Unsafe unsafe = _getUnsafe();
    if (unsafe != null) try {
        Field f = String.class.getDeclaredField("COMPACT_STRINGS");
        // static field: resolve its base object plus offset
        return unsafe.getBoolean(unsafe.staticFieldBase(f), unsafe.staticFieldOffset(f));
    } catch (Exception ignored) {
        ignored.printStackTrace();
    }
    return false;
}
```
Last one, I swear. 🤞😁
The field-name assertions plus the `coder` values 0 and 1 are sealed with `serialVersionUID`. I think we must include the check to be future-proof.
```java
/** (Unsafe) field position in String or -1 when not found. */
private static final long _stringCoderOffset = _resolveStringCoderOffset();
/** (Unsafe) field position in String or -1 when not found. */
private static final long _stringValueOffset = _resolveStringValueOffset();

/** Unsafe lookup is on best-effort basis. */
private static long _resolveStringCoderOffset() {
    // field structure is sealed with the Serializable version
    long stringLayout = java.io.ObjectStreamClass.lookup(String.class).getSerialVersionUID();
    if (stringLayout != -6849794470754667710L) return -1;
    Unsafe unsafe = _getUnsafe();
    if (unsafe == null) return -1;
    try {
        return unsafe.objectFieldOffset(String.class.getDeclaredField("coder"));
    } catch (NoSuchFieldException e) {
        return -1;
    }
}
```
This is a naming issue: what Fury does in `StringSerializer` is a Latin check, but we named it an ASCII check. We've updated it to a Latin check in furyjs, but forgot to update it in Java. I planned to update it when Fury adds UTF-8 encoding or a Latin/UTF-8 hybrid encoding. Thanks for the heads-up; the naming should be updated, it's misleading.
For `_resolveLatin1Option`: not exactly. If users disable compact strings on JDK9+, which rarely happens, the `coder` we get will be `UTF16` even for a Latin string. This will bloat the data, but correctness still persists.
Good suggestion. Currently we assume the String internal structure for JDK9+. We checked the `byte[]` field, but didn't check the `coder` field; we should check it too. Thanks for the catch.
The Serialization UID seals field names, types and the codec value.
Not exactly, if users disable compact string for jdk9+, which barely happen, the coder we get will be UTF16 for latin string , this will bloat the data but the correctness will still persists
The generated code calls `readJava8CompressedString`, which does `readAsciiChars` without knowledge of the Java setting, as far as I can tell? You shouldn't install a (UTF-16) char when a Latin-1 byte would fit. The String value may look fine, but such Strings don't compare equal to their literals. 👻 Please do tell if I missed something there.
`readJava8CompressedString` will read the coder first, then read a Latin string if the coder is Latin. If the coder is UTF16 but the serialized data is UTF-16-encoded Latin data, deserialization will just read it into a char array and create a String object from it. Actually, JDK8 only supports `char[]` as the internal storage of a String object; you always need to create UTF-16 chars from Latin-1 bytes.
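That widening step can be sketched as follows (illustrative only; `readJava8CompressedString` does more): each Latin-1 byte maps directly to one UTF-16 code unit, but a new `char[]` allocation is unavoidable on JDK8.

```java
public class Latin1Decode {
    // Widen Latin-1 bytes into the UTF-16 chars a JDK8 String stores internally.
    static char[] latin1ToChars(byte[] buf, int off, int len) {
        char[] out = new char[len];
        for (int i = 0; i < len; i++) {
            out[i] = (char) (buf[off + i] & 0xFF); // mask: bytes are signed in Java
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = {(byte) 0xE9, 'a'}; // 0xE9 = 'é' in Latin-1
        System.out.println(new String(latin1ToChars(payload, 0, 2)));
    }
}
```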
Ahhh... I see now. So JDK8 means Oracle Java 1.8, and that one's different from the OpenJDK line. The generated code can target only one of the two: either `char[]` or `byte[]`. I'll try to figure out if we can get this information at runtime in a static constant.
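One way to get that information into a static constant at runtime (my own sketch, not what either library currently does) is to inspect the declared type of the private `String.value` field once at class load; this only retrieves the `Field` object, so no `setAccessible` is required.

```java
public class StringLayout {
    // true when String stores byte[] (compact strings, JDK 9+); false for char[] (JDK 8).
    static final boolean VALUE_IS_BYTES = detect();

    private static boolean detect() {
        try {
            return String.class.getDeclaredField("value").getType() == byte[].class;
        } catch (NoSuchFieldException e) {
            return false; // unknown layout: a serializer should fall back to copying
        }
    }

    public static void main(String[] args) {
        System.out.println(VALUE_IS_BYTES);
    }
}
```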
Fury writes the encoding into the data; this is different from other libs. Most frameworks support only UTF-8, whose encoding is inefficient for many languages.
It's not the UTF-8 to UTF-16 conversion that makes Java slow. I just tried a direct conversion with Unsafe and the performance is fine. The issue lies in temporary array allocation and the additional copy cycle. You were right to bypass such bad design with direct insertion.
However, the Oracle vs. OpenJDK issue got me in doubt now. Can't have anything nice with different fields, different array types, runtime switches and native endianness all together. That is just too many loose ends with no guarantee for the future. 🤔
`java.lang.String#coder` is package-level API in `java.lang.String`, which is somewhat stable. But you are right that a different JDK vendor can change its data structure, though that hasn't happened so far; it didn't change across JDK 9–21 for any common JDK. Maybe it's worth the complication, considering how commonly String is used. And if some JDK does change this internal structure, we can just add a patch for such JDKs without breaking backward compatibility.
Hi, I'm the author of the Fury serialization framework, a fast multi-language serialization framework powered by JIT and zero-copy. The complete introduction is published as a blog on Medium.
The jvm-serializers integration is inspired by `serializers.kryo.Kryo`. 5 modes of Fury serialization config are added. No manually optimized serializer is added, since Fury's JIT-generated code is highly optimized and more performant.
Benchmark results will be uploaded later.