Kotlin / kotlinx-io

Kotlin multiplatform I/O library

Apache License 2.0

Consider alternative Sink.writeString implementations on JVM #316

Open fzhinkin opened 5 months ago

fzhinkin commented 5 months ago

On JVM, instead of reading each character separately, encoding it to UTF-8, and writing it to a buffer, it might be faster to:

- extract the chars to a CharArray and then iterate over it; or
- simply use toByteArray.

For other libraries, namely kotlinx.serialization, some of these approaches performed better. Quick ad-hoc experiments didn't show any benefit for kotlinx-io, but it still makes sense to investigate this thoroughly.
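To make the three candidates concrete, here is a minimal sketch that uses only the JVM standard library. It is not the kotlinx-io implementation: `ByteArrayOutputStream` stands in for the sink's buffer, and the function names (`writePerChar`, `writeViaCharArray`, `writeViaToByteArray`, `putCodePoint`) are hypothetical. Unpaired surrogates are assumed to be replaced with `?`, which matches what `String.toByteArray` does on JVM.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical helper: appends one Unicode code point to `out` as UTF-8.
private fun putCodePoint(out: ByteArrayOutputStream, cp: Int) {
    when {
        cp < 0x80 -> out.write(cp)
        cp < 0x800 -> {
            out.write(0xC0 or (cp shr 6))
            out.write(0x80 or (cp and 0x3F))
        }
        cp < 0x10000 -> {
            out.write(0xE0 or (cp shr 12))
            out.write(0x80 or ((cp shr 6) and 0x3F))
            out.write(0x80 or (cp and 0x3F))
        }
        else -> {
            out.write(0xF0 or (cp shr 18))
            out.write(0x80 or ((cp shr 12) and 0x3F))
            out.write(0x80 or ((cp shr 6) and 0x3F))
            out.write(0x80 or (cp and 0x3F))
        }
    }
}

// Baseline: read each char from the String, encode it, write it.
fun writePerChar(s: String): ByteArray {
    val out = ByteArrayOutputStream()
    var i = 0
    while (i < s.length) {
        val c = s[i++]
        val cp = when {
            c.isHighSurrogate() && i < s.length && s[i].isLowSurrogate() ->
                Character.toCodePoint(c, s[i++])
            c.isSurrogate() -> '?'.code // unpaired surrogate (assumed replacement)
            else -> c.code
        }
        putCodePoint(out, cp)
    }
    return out.toByteArray()
}

// Alternative 1: copy the chars out once, then iterate over the array.
fun writeViaCharArray(s: String): ByteArray {
    val chars = s.toCharArray() // allocates s.length chars
    val out = ByteArrayOutputStream()
    var i = 0
    while (i < chars.size) {
        val c = chars[i++]
        val cp = when {
            c.isHighSurrogate() && i < chars.size && chars[i].isLowSurrogate() ->
                Character.toCodePoint(c, chars[i++])
            c.isSurrogate() -> '?'.code
            else -> c.code
        }
        putCodePoint(out, cp)
    }
    return out.toByteArray()
}

// Alternative 2: let the JDK encode the whole string, then copy the bytes.
fun writeViaToByteArray(s: String): ByteArray {
    val bytes = s.toByteArray(Charsets.UTF_8) // allocates the encoded size
    val out = ByteArrayOutputStream()
    out.write(bytes, 0, bytes.size)
    return out.toByteArray()
}
```

For well-formed input all three should produce the same bytes; they differ only in how the chars are read and in what gets allocated along the way, which is what a benchmark would need to measure.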

fzhinkin commented 2 months ago

A combination of String::toByteArray and UnsafeBufferOperations::moveToTail shows better performance for strings whose chars can all be encoded using byte sequences of the same length. However, the current implementation significantly outperforms the String::toByteArray-based approach on strings whose characters require byte sequences of varying lengths. And, of course, String::toByteArray results in a higher allocation rate.

qwwdfsad commented 2 months ago

In serialization, we leverage the intrinsified String::getChars (pros: vectorized, much faster unpacking of compact strings, no range checks) and also rely on the fact that our CharArrays are pooled, which means no allocations.
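A rough sketch of that idea, assuming a caller-supplied scratch array in place of a real pool: the string is copied chunk by chunk with `String.toCharArray(destination, ...)`, which on JVM is expected to delegate to `String.getChars`, and each chunk is then encoded from the array. This is not the kotlinx.serialization code; `writeStringChunked` and `encodeUtf8` are hypothetical names and `ByteArrayOutputStream` stands in for the sink.

```kotlin
import java.io.ByteArrayOutputStream

// Hypothetical helper: encodes chars[0 until count] as UTF-8 into `out`.
private fun encodeUtf8(out: ByteArrayOutputStream, chars: CharArray, count: Int) {
    var i = 0
    while (i < count) {
        val c = chars[i++]
        val cp = when {
            c.isHighSurrogate() && i < count && chars[i].isLowSurrogate() ->
                Character.toCodePoint(c, chars[i++])
            c.isSurrogate() -> '?'.code // unpaired surrogate (assumed replacement)
            else -> c.code
        }
        when {
            cp < 0x80 -> out.write(cp)
            cp < 0x800 -> {
                out.write(0xC0 or (cp shr 6))
                out.write(0x80 or (cp and 0x3F))
            }
            cp < 0x10000 -> {
                out.write(0xE0 or (cp shr 12))
                out.write(0x80 or ((cp shr 6) and 0x3F))
                out.write(0x80 or (cp and 0x3F))
            }
            else -> {
                out.write(0xF0 or (cp shr 18))
                out.write(0x80 or ((cp shr 12) and 0x3F))
                out.write(0x80 or ((cp shr 6) and 0x3F))
                out.write(0x80 or (cp and 0x3F))
            }
        }
    }
}

// Copies `s` into `scratch` one chunk at a time and encodes each chunk.
// `scratch` plays the role of a pooled CharArray: the caller owns and reuses it,
// so this function allocates no char arrays of its own.
fun writeStringChunked(out: ByteArrayOutputStream, s: String, scratch: CharArray) {
    require(scratch.size >= 2) { "scratch must hold at least one surrogate pair" }
    var start = 0
    while (start < s.length) {
        var end = minOf(s.length, start + scratch.size)
        // Keep a surrogate pair together: defer a trailing high surrogate to the next chunk.
        if (end < s.length && s[end - 1].isHighSurrogate()) end--
        s.toCharArray(scratch, 0, start, end) // bulk copy of s[start until end]
        encodeUtf8(out, scratch, end - start)
        start = end
    }
}
```

The chunk size is a tuning knob: a larger scratch array means fewer bulk copies, while a smaller one is cheaper to keep in a pool.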

fzhinkin commented 2 months ago

For kotlinx-io, it seems like such an approach does not provide any significant performance improvements on average:

https://github.com/Kotlin/kotlinx-io/blob/435acfb038ba6803692783b28e86b4148e0d5019/core/jvm/src/SinksJvm.kt#L147
https://jmh.morethan.io/?source=https://gist.githubusercontent.com/fzhinkin/a11a2ce595cadb8fba700cdbe18a6f4f/raw/fbb87909636731439aac80948fa023bcc10d4269/toCharArray-based-writeString.json

In some scenarios performance is better; in others it is worse.