Closed Yomguithereal closed 9 months ago
rayon
is generally more popular and powerful and support and does support for unsorted parallel streaming: https://docs.rs/rayon/latest/rayon/iter/trait.ParallelBridge.html#tymethod.par_bridge
The only reason pariter
was created was that rayon's model is clunky when you want the preserve the order. Given the extra complexity and a new axis of combinatory explosion, I don't think I'd want to add it.
Thank you for the answer @dpc. I hadn’t realized that rayon’s parallel bridge was able to work on lazy iterators without requiring to spend too much memory.
Do I understand correctly in your documentation that « handling backpressure » means that for ordered output some tasks may slow everything down (which is alright in my case) rather than leaking memory (which is not alright in my case)? Something rayon is incapable of, or at least clunkily, as you said?
In any case I understand your point and will close this issue. Thank you very much for your help.
Do I understand correctly in your documentation that « handling backpressure » means that for ordered output some tasks may slow everything down (which is alright in my case) rather than leaking memory (which is not alright in my case)? Something rayon is incapable of, or at least clunkily, as you said?
Yes,
https://github.com/dpc/pariter/blob/ea97cc651fd600fa5df6fc02319ad66fbd55ef0a/src/parallel_map.rs#L79
And item that takes outstandingly long to process comparing to other items will stall the processing to limit the memory use to (roughly) buffer_size
pending items. This allows trading memory use for processing time deviation amortization, like in a proper queue system.
pump_tx
is used to send the incoming items to the thredpool, only as existing elements were already returned.
Thank you for the answer @dpc :)
Hi @dpc, thank you very much for your library. I use it to perform parallel work over rows of very large CSV files without needing to buffer the whole file into memory or chunk it.
I understand that emitting the results in the same order as the consumed iterator is a sensible default, but it also requires a hashmap and some computation time. Also, sometimes, some result might lag behind (e.g. when performing network requests), and this can either, depending on the implementation, cost a lot of memory or slow everything down.
Would it be of any value to you if I contributed a PR adding a way to allow results to be emitted in arbitrary order, as soon as they are received? I guess this could be an option of the custom builders, rather than specific methods suffixed
_unordered
since there are already many suffixes already and this would increase the matrix of available choices (normal, custom, scoped etc.).I wish you a good day,