Closed vladislav-sidorovich closed 2 years ago
This will be fixed in the next client release.
The code sample creates a malformed string. I think the best solution is to detect the malformed string in estimateSizeUtf8() and throw an exception.
From my point of view, an exception will be better than a corrupted document.
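As a rough illustration of what such a check could look like, here is a minimal sketch that scans a string for unpaired surrogates and throws if it finds one. The class and method names (SurrogateCheck, firstUnpairedSurrogate, requireWellFormed) are hypothetical and are not part of the Aerospike client; the actual fix in estimateSizeUtf8() may be implemented differently.

```java
public class SurrogateCheck {
    // Hypothetical helper, not Aerospike client code.
    // Returns the index of the first unpaired surrogate, or -1 if the
    // string is well-formed UTF-16.
    static int firstUnpairedSurrogate(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // valid pair: skip the low surrogate
                } else {
                    return i; // high surrogate without a following low surrogate
                }
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate without a preceding high surrogate
            }
        }
        return -1;
    }

    // Throws instead of letting a malformed string be written silently.
    static void requireWellFormed(String s) {
        int index = firstUnpairedSurrogate(s);
        if (index >= 0) {
            throw new IllegalArgumentException(
                "Malformed string: unpaired surrogate at index " + index);
        }
    }

    public static void main(String[] args) {
        requireWellFormed("plain text");    // passes
        requireWellFormed("\uD83D\uDE00");  // passes: a valid surrogate pair
        try {
            requireWellFormed("ab\uD83Dcd"); // lone high surrogate
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An application could run a check like this before calling the client, so that the error surfaces at the point where the malformed string is created.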
At the same time, java.lang.String doesn't throw an exception. Also, I can send/receive such a string via REST (HTTP).
So I can send/receive such strings and process them in my service code, but I can't store them in long-term storage (Aerospike). It is a bit confusing, isn't it?
If aerospike-client could process such strings, that would be the best option for me.
Java's getBytes(StandardCharsets.UTF_8) replaces each invalid (unpaired) surrogate in a malformed string with a "?" when converting to UTF-8. When the UTF-8 bytes are converted back into a string, the result no longer matches the original string. This will cause problems for applications that test these strings for equality. In the interest of safety, the client will throw an exception when malformed strings are encountered in estimateSizeUtf8().
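The round-trip mismatch described above can be reproduced with the JDK alone. The following is a small sketch; the class name MalformedRoundTrip and the sample string are made up for illustration and do not come from the original code sample.

```java
import java.nio.charset.StandardCharsets;

public class MalformedRoundTrip {
    // Encodes to UTF-8 and decodes back, as a write followed by a read would.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A lone high surrogate (U+D83D) with no low surrogate after it.
        String malformed = "ab\uD83Dcd";
        String restored = roundTrip(malformed);

        System.out.println(restored);                   // prints "ab?cd"
        System.out.println(malformed.equals(restored)); // prints "false"
    }
}
```

No exception is raised at any point, so the substitution goes unnoticed unless the application compares the stored value with the original.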
Java client 6.1.3 is released: https://download.aerospike.com/download/client/java/notes.html
The root cause of the issue: https://github.com/aerospike/aerospike-client-java/blob/6fcfb23f7946b078427a197d5b3f828d0ee7fe53/client/src/com/aerospike/client/command/Buffer.java#L163
The effect is here: https://github.com/aerospike/aerospike-client-java/blob/8251d673a6ec573e662541cd6f045241db164467/client/src/com/aerospike/client/util/Packer.java#L403,L410
Reference implementation: https://github.com/openjdk/jdk11u-dev/blob/c1411113b396f468963a1deacc3b57ed366e735a/src/java.base/share/classes/java/lang/StringCoding.java#L924-L950 or java.lang.String#encodeUTF8_UTF16 in Amazon Corretto 18
Notes: What are surrogates? https://unicode.org/faq/utf_bom.html#utf16-2