Open fzhinkin opened 5 months ago
The combination of `String::toByteArray` and `UnsafeBufferOperations::moveToTail` shows better performance on strings whose chars can all be encoded using byte sequences of the same length. However, the current implementation significantly outperforms the `String::toByteArray`-based approach on strings whose characters require byte sequences of varying lengths.
And, of course, `String::toByteArray` results in a higher allocation rate.
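To make the two strategies concrete, here is a minimal sketch that uses only the JDK and the Kotlin standard library. `ByteArrayOutputStream` is a stand-in for the kotlinx-io buffer, and `encodePerChar` / `encodeBulk` are hypothetical names chosen for illustration; this is not the actual kotlinx-io code.

```kotlin
import java.io.ByteArrayOutputStream

// Per-character path: inspect every char and emit 1 to 4 UTF-8 bytes for it.
fun encodePerChar(s: String, out: ByteArrayOutputStream) {
    var i = 0
    while (i < s.length) {
        val c = s[i].code
        when {
            c < 0x80 -> out.write(c)
            c < 0x800 -> {
                out.write(0xC0 or (c shr 6))
                out.write(0x80 or (c and 0x3F))
            }
            c !in 0xD800..0xDFFF -> {
                out.write(0xE0 or (c shr 12))
                out.write(0x80 or ((c shr 6) and 0x3F))
                out.write(0x80 or (c and 0x3F))
            }
            else -> {
                val low = if (i + 1 < s.length) s[i + 1].code else 0
                if (c > 0xDBFF || low !in 0xDC00..0xDFFF) {
                    out.write('?'.code) // unpaired surrogate, replaced like the JDK encoder does
                } else {
                    // Combine the surrogate pair into one code point, emit 4 bytes.
                    val cp = 0x10000 + ((c and 0x3FF) shl 10) + (low and 0x3FF)
                    out.write(0xF0 or (cp shr 18))
                    out.write(0x80 or ((cp shr 12) and 0x3F))
                    out.write(0x80 or ((cp shr 6) and 0x3F))
                    out.write(0x80 or (cp and 0x3F))
                    i++ // the low surrogate has been consumed as well
                }
            }
        }
        i++
    }
}

// Bulk path: let the JDK encode the whole string, then copy the result.
// The temporary array is the source of the higher allocation rate.
fun encodeBulk(s: String, out: ByteArrayOutputStream) {
    val bytes = s.toByteArray(Charsets.UTF_8)
    out.write(bytes, 0, bytes.size)
}
```

Both functions should produce identical bytes; they differ only in where the encoding work happens and in whether a temporary array is allocated.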
In serialization, we leverage the intrinsified `String::getChars` (pros: vectorized, much faster compact-string unpacking, no range checks) and also rely on the fact that our `CharArray`s are pooled, leading to no allocations.
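A rough sketch of that idea, assuming a single reusable scratch array in place of a real pool (the names `scratch` and `forEachCharChunk` are hypothetical, and the array is deliberately tiny so that chunking is visible). On JVM, `String.toCharArray(destination, ...)` is expected to delegate to `String::getChars`:

```kotlin
// Hypothetical stand-in for a pool: one reusable scratch array (not thread-safe).
private val scratch = CharArray(8)

// Copies the string chunk by chunk into the reused array and hands each
// chunk to `consume`, so no per-call CharArray is allocated.
fun forEachCharChunk(s: String, consume: (chars: CharArray, length: Int) -> Unit) {
    var start = 0
    while (start < s.length) {
        val end = minOf(start + scratch.size, s.length)
        // Bulk copy of s[start, end) into scratch[0, end - start).
        s.toCharArray(scratch, 0, start, end)
        consume(scratch, end - start)
        start = end
    }
}
```

A real encoder built on top of this would also have to handle a surrogate pair that is split across two chunks, which the sketch leaves out.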
For kotlinx-io, such an approach does not seem to provide any significant performance improvement on average:
https://github.com/Kotlin/kotlinx-io/blob/435acfb038ba6803692783b28e86b4148e0d5019/core/jvm/src/SinksJvm.kt#L147
https://jmh.morethan.io/?source=https://gist.githubusercontent.com/fzhinkin/a11a2ce595cadb8fba700cdbe18a6f4f/raw/fbb87909636731439aac80948fa023bcc10d4269/toCharArray-based-writeString.json
In some scenarios performance is better; in others it's worse.
On JVM, instead of reading each character separately and then encoding it to UTF-8 and writing to a buffer, it might be faster to:
For other libraries, namely `kotlinx.serialization`, some of these approaches performed better. While quick ad-hoc experiments didn't show any pros for `kotlinx-io`, it does make sense to investigate this thoroughly.