Open fvasco opened 7 months ago
Hi @qwwdfsad, I sent you the JFR recording privately via email for more details:
$ sha256sum recording.zip
c2224744906842a45f71dc588481100f74b1bf014fad02bd93d6af2b288a3c69 recording.zip
Thanks!
I'm on vacation right now, so expect a bit of radio silence from me; I've got what you sent and will return to it later
Thank you @qwwdfsad. I found some other details in another JFR, which I sent you for further reference:
$ sha256sum recording-0e09ce990f4cb86c2-regular.zip
e13570bcc543eb8803279089b330120c8b607211dea2dc6c164aeb3f84e9f338 recording-0e09ce990f4cb86c2-regular.zip
This time, the profiler points to the updateCollectorIndexLocked method.
Happy holidays!
I tried to create a reproducer.
The code is:
```kotlin
import java.text.NumberFormat
import kotlin.time.Duration.Companion.seconds
import kotlinx.coroutines.*
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.asSharedFlow
import kotlinx.coroutines.flow.produceIn

fun main() {
    runBlocking {
        val consumerCount = 1_000
        val messageCount = 1000
        repeat(50) {
            val mutableStateFlow = MutableSharedFlow<Int>()
            val sharedFlow = mutableStateFlow.asSharedFlow()
            val nanos: Long
            coroutineScope {
                // Subscribe all consumers before starting to emit.
                repeat(consumerCount) {
                    launch(start = CoroutineStart.UNDISPATCHED) {
                        val channel = sharedFlow.produceIn(this)
                        repeat(messageCount) { channel.receive() }
                        cancel()
                    }
                }
                delay(1.seconds)
                nanos = System.nanoTime()
                launch(Dispatchers.Default) {
                    repeat(messageCount) { mutableStateFlow.emit(it) }
                }
            }
            // Average nanoseconds per emitted value per collector.
            val delta = System.nanoTime() - nanos
            println(NumberFormat.getIntegerInstance().format(delta / consumerCount / messageCount))
        }
    }
}
```
I did some benchmarks with different counts. The printed value is delta / consumerCount / messageCount, i.e. the average nanoseconds per emitted value per collector (note that NumberFormat formats with a grouping separator here, so 11.109 means 11109 ns):
| consumerCount | messageCount | results (ns per value per collector) |
| --- | --- | --- |
| 1_000 | 100 | 824, 794, 819, 823 |
| 10_000 | 100 | 11.109, 11.145, 11.019 |
| 1_000 | 1_000 | 773, 784, 758 |
| 10_000 | 1_000 | 6.010, 10.349, 10.427, 10.556 |
| 10_000 | 10_000 | 10.623 |
I attach the JFR of the reproducer: reproducer.zip
Maybe a large number of subscribers is simply expected to cause this behavior, so these measurements may be normal. At the same time, performance changes greatly depending on the number of subscribers.
I confirm that our issue was caused by code similar to my reproducer; we updated our code to reduce the subscriber count.
If the above benchmarks are acceptable to you, feel free to close this issue.
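For illustration, here is a minimal, hypothetical sketch of one way to reduce the subscriber count (not our actual code, all names and counts are made up): a single collector on the SharedFlow fans values out to plain per-consumer channels, so the flow itself only ever has one subscriber.

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.flow.*

// Hypothetical sketch: keep a single SharedFlow subscriber and fan values out
// to per-consumer channels, so the per-collector bookkeeping stays constant.
fun main() = runBlocking {
    val source = MutableSharedFlow<Int>()
    val consumerCount = 1_000
    val messageCount = 100

    // One buffered channel per consumer instead of one SharedFlow collector per consumer.
    val channels = List(consumerCount) { Channel<Int>(Channel.BUFFERED) }

    // The only SharedFlow collector: re-broadcasts each value to every channel.
    val fanOut = launch {
        source.collect { value -> channels.forEach { it.send(value) } }
    }

    val consumers = channels.map { channel ->
        launch { repeat(messageCount) { channel.receive() } }
    }

    delay(100) // give the fan-out collector time to subscribe before emitting
    repeat(messageCount) { source.emit(it) }

    consumers.joinAll()
    fanOut.cancelAndJoin()
    channels.forEach { it.close() }
}
```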
Thanks for the self-contained reproducer and all the profiles; it made my investigation so much easier 🙇
You hit the weakest spot of the SharedFlow collector algorithm -- unfortunately, a single collect scales linearly with the number of existing collectors, which makes it quadratic for any reasonable use-case (each collector scales linearly -> the total CPU burnt is quadratic).
I have a draft idea of how to fix it -- for each unique update (value/index/version) we can fall back to concurrent helping for the linear part (which still might be quadratic if you are unlucky enough and all collectors get OS-scheduled at the same time), but it should be much better and eliminate the issue for single-threaded usage. Yet it requires proper investigation and thoughtful testing. I'll keep the issue open, as it's clearly a performance bottleneck.
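To illustrate why a single collect is linear in the number of collectors, here is a simplified toy model (not the actual SharedFlowImpl code; the class and method names are made up): every time a collector advances, the flow rescans all collector slots to recompute the minimum index before it can release buffered values, and that scan happens once per value per collector.

```kotlin
// Toy model of the bottleneck, NOT the real kotlinx.coroutines implementation:
// with N collectors each slot update is an O(N) scan, and the scan runs once per
// value per collector, hence O(N^2) total work per emitted value.
class ToySharedFlow(collectorCount: Int) {
    // Index of the next value each collector will read.
    private val collectorIndex = LongArray(collectorCount)
    private var minCollectorIndex = 0L

    // Rough analogue of updateCollectorIndexLocked.
    fun onCollectorAdvanced(collectorId: Int, newIndex: Long) {
        collectorIndex[collectorId] = newIndex
        var min = Long.MAX_VALUE
        for (index in collectorIndex) min = minOf(min, index) // O(N) scan
        minCollectorIndex = min
        // In the real flow, values below minCollectorIndex can now be dropped
        // from the buffer and suspended emitters can be resumed.
    }
}

fun main() {
    val collectors = 10_000
    val flow = ToySharedFlow(collectors)
    // One emitted value observed by every collector: N scans of N slots each.
    for (id in 0 until collectors) flow.onCollectorAdvanced(id, 1L)
}
```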
We detected high CPU usage in kotlinx.coroutines.flow.SharedFlowImpl using Java Flight Recorder on a 2 CPU machine. The CPU consumed there was two orders of magnitude higher than anything else, and no other code appears to be causing this CPU usage. JFR's thread dump:
a bit later:
Unfortunately I am not able to provide a reproducer; we have no idea what in our code can cause this issue. Moreover, this version of our server has been running since Jan 8, 2024 without issue.
Our code uses less than 1% of CPU in SharedFlowImpl.emit; should we check for some API usage that could cause this issue?