MartinDevi opened this issue 5 years ago
The plan is to eventually add a ready-to-use primitive for that purpose; it is tracked under issue #172.
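Until such a primitive lands, a rough sketch of what it could look like is a map that bounds concurrency with a `Semaphore` from recent kotlinx-coroutines versions (`mapBounded` and its parameters are illustrative names, not a real library API):

```kotlin
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Illustrative helper (not a library API): runs at most `concurrency`
// transforms at once, so a large batch cannot monopolize the dispatcher.
suspend fun <T, R> Iterable<T>.mapBounded(
    concurrency: Int,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val semaphore = Semaphore(concurrency)
    map { item ->
        async { semaphore.withPermit { transform(item) } }
    }.awaitAll()
}
```

Note that the coroutines themselves are still created eagerly; only the transforms are throttled, so this bounds CPU pressure but not allocation.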
I think the problem is that you don't want the progress indicator running in the same thread pool as the decryption workers, otherwise they will take all the resources.

For example, if you replace your progressActor with an AtomicInteger:
```kotlin
suspend fun <T, R> Iterable<T>.mapAsync(
    progressChannel: SendChannel<Int>,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val counter = AtomicInteger(0)
    val deferredList = map {
        async {
            val result = transform(it)
            progressChannel.send(counter.incrementAndGet())
            result
        }
    }
    val result = deferredList.awaitAll()
    result
}
```
You can then change the test by extracting the progressActor to the main context and reducing the thread pool to 3. This works very well for me.
```kotlin
val progressActor = actor<Int> {
    for (progress in this) {
        println("Progress $progress")
    }
}
withContext(newFixedThreadPoolContext(3, "transformer")) {
    (1..100).mapAsync(progressActor) {
        Thread.sleep(200)
        println("Processed item $it")
    }
}
progressActor.close()
```
Example output:

```text
Processed item 1
Processed item 2
Progress 1
Progress 2
Processed item 3
Progress 3
Processed item 4
Processed item 5
Progress 4
Progress 5
Processed item 6
Progress 6
Processed item 8
Processed item 7
Progress 7
Progress 8
Processed item 9
Progress 9
Processed item 11
Processed item 10
Progress 10
Progress 11
...
```
I hope this helps.
On Fri, 1 Mar 2019 at 16:01, Martin Devillers notifications@github.com wrote:
I think the coroutine library is missing a simple solution to use parallel decomposition on the contents of a large list without saturating the dispatcher queue, so that other coroutine work won't be suspended until the entire list has been processed.
I present the issue here and some solutions that I considered. If you know better solutions, please point them out.
It seems to me like a common enough scenario and a serious enough issue to be considered in the standard library.

Issue
I have a large list of items, and I need to perform CPU-intensive work on each of them (in case you're curious, deciphering) to obtain an output. I'd like to allow this work to be done in parallel on each item. My first instinct was to simply use async to start all the processes, then await that they all finish.
```kotlin
suspend fun <T, R> Iterable<T>.mapAsync(
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    map { async { transform(it) } }.awaitAll()
}
```

This works fine in a very simple scenario. However, I then wanted to display the progress, with the number of items that have been successfully processed. I introduced an actor to receive a signal whenever an item has been processed and count the progress, then forward progress updates to another channel.
```kotlin
suspend fun <T, R> Iterable<T>.mapAsync(
    progressChannel: SendChannel<Int>,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val progressActor = actor<Unit> {
        var index = 0
        for (unit in channel) {
            progressChannel.send(index++)
        }
    }
    val deferredList = map {
        async {
            val result = transform(it)
            progressActor.send(Unit)
            result
        }
    }
    val result = deferredList.awaitAll()
    progressActor.close()
    result
}
```

However, this implementation doesn't work. The progress updates are only received after all the items have been processed. To reproduce, I set up the following unit test.

```kotlin
@Test
fun testMapAsync() = runBlocking<Unit> {
    withContext(newFixedThreadPoolContext(4, "")) {
        val progressActor = actor<Int> {
            for (progress in this) {
                println("Progress $progress")
            }
        }
        (1..100).mapAsync(progressActor) {
            Thread.sleep(200)
            println("Processed item $it")
        }
        progressActor.close()
    }
}
```
Rather than printing the progress progressively, shortly after each item is processed, this test prints all the statements indicating that items were processed, then prints all the statements indicating that progress updates were received.
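The same starvation can be sketched without coroutines at all: a plain FIFO thread pool behaves the same way when a batch is submitted eagerly (an illustrative sketch, not from the issue):

```kotlin
import java.util.Collections
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Sketch: all six "work" tasks are enqueued first, so the later
// "progress" task cannot run until the whole batch has drained.
fun runDemo(): List<String> {
    val events = Collections.synchronizedList(mutableListOf<String>())
    val pool = Executors.newFixedThreadPool(2)
    repeat(6) { i ->
        pool.submit {
            Thread.sleep(50) // simulate CPU-bound work
            events += "Processed item $i"
        }
    }
    pool.submit { events += "Progress update" } // queued behind the batch
    pool.shutdown()
    pool.awaitTermination(5, TimeUnit.SECONDS)
    return events
}

fun main() {
    runDemo().forEach(::println) // "Progress update" is always printed last
}
```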
From what I understand, this is due to the fact that the dispatcher is "fair". Since all the worker coroutines are enqueued immediately, they will be executed before any other coroutines, including the one which updates the progress.

Existing Possible Solutions and Drawbacks

Dedicated Dispatcher
This method doesn't require sample code; it simply relies on having a dedicated dispatcher for the worker coroutines. The drawbacks are obvious. It's also possible to invert the solution, by having a dedicated dispatcher for the progress actor, but this doesn't fully solve the issue either.

Fan-Out
It's possible to use the "Fan-out" mechanism described in the coroutine guide, to manually parallelize the processing by pulling from a channel.
```kotlin
const val PARALLELISM = 4

suspend fun <T, R> Iterable<T>.mapAsync(
    progressChannel: SendChannel<Int>,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val result = mutableListOf<R>()
    val progressActor = actor<R> {
        var index = 0
        for (item in channel) {
            result += item
            progressChannel.send(index++)
        }
    }
    val source = produce {
        forEach { send(it) }
        close()
    }
    List(PARALLELISM) {
        launch {
            for (item in source) {
                progressActor.send(transform(item))
            }
        }
    }.joinAll()
    progressActor.close()
    result
}
```

When doing this, the initial order of the elements isn't maintained, and the parallelism has to be explicitly provided rather than relying on the default dispatcher to optimize for the number of cores (which could also be done manually, but seems like repetition).

Producer Channel
Rather than using map on a list to call async, use produce to create a channel. This artificially introduces suspension points so that the worker coroutines aren't immediately added to the executor queue, but instead are added regularly when the queue finishes some work. In other words, since adding workers is itself done in a coroutine, it will only be done at the rate at which the coroutine dispatcher becomes available.
```kotlin
suspend fun <T, R> Iterable<T>.mapAsync(
    progressChannel: SendChannel<Int>,
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    val progressActor = actor<Unit> {
        var index = 0
        for (unit in this) {
            progressChannel.send(index++)
        }
    }
    val deferredChannel = produce<Deferred<R>> {
        forEach {
            send(async {
                val result = transform(it)
                progressActor.send(Unit)
                result
            })
        }
    }
    val result = deferredChannel.toList().awaitAll()
    progressActor.close()
    result
}
```

This seems like a hack, and I'm unsure of how it behaves when the worker coroutine has multiple suspension points, causing the dispatcher queue to "roll" more frequently.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Kotlin/kotlinx.coroutines/issues/1022, or mute the thread https://github.com/notifications/unsubscribe-auth/AAdYDP-RuKCdmDydHCJLN99gRzH7TtPdks5vSU7VgaJpZM4bZUCY .
In my opinion the issue is not really addressed, and the new operators, as proposed, do not solve it.

@MartinDevi's solution works well for this particular use case, but launching multiple mapAsync calls concurrently can raise the same issue, and the same probably holds for the later proposals.
The coroutine scheduler is not fair (a multi-thread executor cannot be truly fair); in my opinion the point is that the scheduler is not preemptive, so some large batches can occupy it for an undefined time.
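The non-preemptive point can be seen directly: on a single worker thread, a batch coroutine only lets other coroutines run at explicit suspension points such as yield() (an illustrative sketch, not from the issue):

```kotlin
import kotlinx.coroutines.*

// Sketch: on one worker thread, the scheduler cannot take the thread
// away from a running coroutine; it only switches at suspension points.
fun runYieldDemo(): List<String> = runBlocking {
    val events = mutableListOf<String>()
    newSingleThreadContext("worker").use { ctx ->
        withContext(ctx) {
            val batch = launch {
                repeat(3) { i ->
                    events += "batch step $i"
                    // Without this yield(), "other work" would only run
                    // after the entire batch has finished.
                    yield()
                }
            }
            launch { events += "other work" }
            batch.join()
        }
    }
    events
}

fun main() = runYieldDemo().forEach(::println)
```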
Using only coroutines, I consider a solution for this issue to be a dedicated dispatcher for this kind of task, bounded like the Default dispatcher but using a different set of resources. This should fix this and similar issues.

This idea can probably be enhanced; a similar solution is, on an 8-CPU computer, to use 7 + 1 threads for the Default dispatcher and the same 7 plus another 1 threads for the background dispatcher.
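A minimal sketch of that shape (the sizing follows the 7 + 1 idea above on an 8-CPU machine; note that newFixedThreadPoolContext may require opting in to obsolete/delicate API markers depending on the library version, and `background` is an illustrative name):

```kotlin
import kotlinx.coroutines.*

// Sketch: a second bounded pool, sized like Default, so a large batch
// on one pool cannot starve coroutines scheduled on the other.
val backgroundDispatcher: CoroutineDispatcher by lazy {
    val cores = Runtime.getRuntime().availableProcessors() // e.g. 8
    // "7 + 1": (cores - 1) worker threads plus one extra, per the idea above
    newFixedThreadPoolContext((cores - 1).coerceAtLeast(1) + 1, "background")
}

suspend fun <T> background(block: suspend CoroutineScope.() -> T): T =
    withContext(backgroundDispatcher) { block() }
```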
Maybe I really missed something, but I am really curious to know what. Thank you.
Thanks for all your input.
IMO there are two main points to take from this issue, one of them being the `map { async { /* ... */ } }.awaitAll()` pattern, because implementing this pattern correctly is non-trivial. With that regard, while I agree that issue #172 would help to provide a solution, I think we need a dedicated function for this use-case. Ideally, I think it shouldn't require explicitly specifying the concurrency. I think standard solutions which involve adding dedicated dispatchers for each type of task would ultimately lead to more confusion, and, as @fvasco mentioned, they won't scale well.
Regarding the suggested point 2, this mechanism should be global, not local.

In your first example, on a 4-CPU machine, setting PARALLELISM = 4 works well to decipher one file and update the UI. In the future you may define a method to decipher all the files in a folder using mapAsync, and later define another method to decipher all folders on a disk. This last method launches 4 * 4 * 4 parallel tasks, so the supposed parallelism is not honored.
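One way to make the bound global rather than per-call is a single process-wide Semaphore that every transform passes through, so nested calls still share one overall limit (an illustrative sketch; `mapAsyncGlobal` and `globalCpuPermits` are made-up names):

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// One process-wide limit shared by every caller, nested or not.
val globalCpuPermits = Semaphore(Runtime.getRuntime().availableProcessors())

suspend fun <T, R> Iterable<T>.mapAsyncGlobal(
    transform: suspend (T) -> R
): List<R> = coroutineScope {
    map { item ->
        async { globalCpuPermits.withPermit { transform(item) } }
    }.awaitAll()
}
```

A caveat with this scheme: a transform that holds a permit while awaiting nested work that also needs permits can deadlock, so permits should only be held around leaf CPU-bound sections.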
Actually I think there is no way to have a smoothly refreshing update if it is not running in its own thread. Sharing the thread pool with greedy coroutines will always result in erratic updates. Happy to be proven wrong, but using coroutines for CPU-heavy tasks suggests this to me.
Uberto