Open chaokunyang opened 6 months ago
Hi @chaokunyang, have you started implementing this feature? If it hasn't been implemented yet, I can take over and implement it.
@LiangliangSui I haven't, feel free to take it over.
Okay, I will do this.
@chaokunyang We currently use UTF8 for cross-language serialization, and only Java (not cross-language) uses Latin1/UTF16.
    public void writeString(MemoryBuffer buffer, String value) {
      if (isJava) {
        writeJavaString(buffer, value);
      } else {
        writeUTF8String(buffer, value);
      }
    }
Will we use UTF16 as the default cross-language String encoding in the future?
I see that the cross-language format currently specified in fury_xlang_serialization_spec still uses UTF8 as the default.
It depends on the language and the string. In golang, strings are already utf-8 encoded, so Fury go can write the data as a utf8 string with a single copy. But java/javascript/python may encode a string as latin1 or utf16 and send it to furygo, so we need to support utf16 too. And depending on the peer language, we may configure furygo to use latin1/utf16 by default too.
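To make the point above concrete, here is a minimal sketch of how a Go reader might dispatch on the string encoding it receives. Everything named here is hypothetical: `stringEncoding`, the `encLatin1`/`encUTF16`/`encUTF8` tags, and `decodeString` are illustrative and not taken from furygo, the real tag values are whatever the xlang spec defines, and little-endian UTF16 is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// stringEncoding is a hypothetical tag for the payload encoding.
// The actual values are defined by the Fury xlang spec, not by this sketch.
type stringEncoding uint8

const (
	encLatin1 stringEncoding = iota
	encUTF16
	encUTF8
)

// decodeString converts a payload in the given encoding into a Go string,
// which is always UTF-8 internally.
func decodeString(enc stringEncoding, payload []byte) (string, error) {
	switch enc {
	case encUTF8:
		// Go strings are UTF-8 already, so this is a plain copy.
		return string(payload), nil
	case encLatin1:
		// Every Latin1 byte maps to the code point with the same value.
		runes := make([]rune, len(payload))
		for i, b := range payload {
			runes[i] = rune(b)
		}
		return string(runes), nil
	case encUTF16:
		if len(payload)%2 != 0 {
			return "", fmt.Errorf("utf16 payload has odd length %d", len(payload))
		}
		// Assumption: code units are little-endian.
		units := make([]uint16, len(payload)/2)
		for i := range units {
			units[i] = binary.LittleEndian.Uint16(payload[2*i:])
		}
		// utf16.Decode replaces invalid surrogates with U+FFFD.
		return string(utf16.Decode(units)), nil
	default:
		return "", fmt.Errorf("unknown string encoding %d", enc)
	}
}
```

Only the UTF8 branch avoids transcoding, which is why the utf16 path matters for performance once other languages start sending latin1/utf16.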
But java/javascript/python may encode string as latin1 or utf16 and send to furygo.
Latin1/UTF16 is only used in Language.JAVA and will not be sent to furygo.
Okay, I got it.
In the future, java/javascript/python may all encode strings as latin1/utf16 and send them to furygo.
Is your feature request related to a problem? Please describe.
Currently Fury xlang serialization uses utf8 for string encoding, which is inefficient in many languages.
We introduced utf16 in https://fury.apache.org/docs/specification/fury_xlang_serialization_spec#string . But golang strings don't support utf16, so fury go should transcode utf16 encoded strings to utf8 strings during deserialization.
Describe the solution you'd like
Implement utf16 to utf8 transcoding in fury go. The implementation should use SIMD for better speed.
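As a starting point, a scalar baseline for the transcoding could look like the sketch below. It is not SIMD and not the fury go implementation: `utf16LEToString` is a hypothetical helper, little-endian input is an assumption, and replacing unpaired surrogates with U+FFFD is one possible policy. A SIMD version would mainly speed up the ASCII fast path by checking many code units at once.

```go
package main

import (
	"encoding/binary"
	"errors"
	"unicode/utf16"
	"unicode/utf8"
)

// utf16LEToString transcodes little-endian UTF-16 bytes into a Go (UTF-8) string.
// Hypothetical helper for illustration; unpaired surrogates become U+FFFD.
func utf16LEToString(data []byte) (string, error) {
	if len(data)%2 != 0 {
		return "", errors.New("utf16 payload has odd byte length")
	}
	// One UTF-16 code unit needs at most 3 UTF-8 bytes; a surrogate pair
	// (2 units) needs 4, so 3 bytes per unit is a safe upper bound.
	out := make([]byte, 0, len(data)/2*3)
	for i := 0; i < len(data); i += 2 {
		u := binary.LittleEndian.Uint16(data[i:])
		switch {
		case u < 0x80:
			// ASCII fast path: one byte in, one byte out.
			out = append(out, byte(u))
		case utf16.IsSurrogate(rune(u)):
			r := rune(utf8.RuneError)
			if i+3 < len(data) {
				lo := binary.LittleEndian.Uint16(data[i+2:])
				// DecodeRune returns U+FFFD unless (u, lo) is a valid pair.
				if dec := utf16.DecodeRune(rune(u), rune(lo)); dec != utf8.RuneError {
					r = dec
					i += 2 // consume the low surrogate as well
				}
			}
			out = utf8.AppendRune(out, r)
		default:
			out = utf8.AppendRune(out, rune(u))
		}
	}
	return string(out), nil
}
```

`utf8.AppendRune` requires Go 1.18 or later. The output buffer is sized once up front, so the loop itself does not reallocate; the final `string(out)` conversion is one more copy that a production version might want to avoid.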
Additional context
1413