Hi @matthewghgriffiths! Thank you and great monkey patching job, happy to see that 😄!
At some point the .map operation had a concurrency_mode parameter, either THREAD or PROCESS. I removed it to simplify testing (what works with threads can lead to serialization errors with processes), but I am definitely with you in wanting to add it back!
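A minimal standard-library sketch of that serialization gap (this is not streamable's code). A thread pool calls the function object directly in shared memory, while a process pool has to pickle the function and its arguments to ship them to another process. Lambdas are pickled by name, and "<lambda>" cannot be looked up, so they fail. The helper is_picklable is a hypothetical name used only for this illustration.

```python
import math
import pickle
from concurrent.futures import ThreadPoolExecutor


def is_picklable(obj) -> bool:
    """Hypothetical helper: True if obj survives pickle.dumps, which process pools require."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Deliberately a lambda: it works with threads but cannot be pickled by reference.
halve = lambda x: x / 2

# Threads share memory, so nothing gets pickled and the lambda runs fine.
with ThreadPoolExecutor(max_workers=2) as pool:
    thread_results = list(pool.map(halve, [2, 4, 6]))

print(thread_results)           # [1.0, 2.0, 3.0]
print(is_picklable(halve))      # False: a process pool would reject it
print(is_picklable(math.sqrt))  # True: importable named functions pickle by reference
```

The same check applies to the mapped items themselves: anything sent to a worker process (or returned from it) must pickle too.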
Questions:
ordered param during your patching, or is it because you worked on the latest version from PyPI, which does not include it yet?
Also related: it would be interesting to check how the lib behaves under the free-threaded mode coming with Python 3.13, right?
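For that free-threaded check, the standard library exposes two signals: sysconfig.get_config_var("Py_GIL_DISABLED") tells you whether the interpreter was built with free-threading support, and sys._is_gil_enabled() (added in Python 3.13) tells you whether the GIL is actually disabled at runtime. A small sketch; the function name free_threading_status is hypothetical.

```python
import sys
import sysconfig


def free_threading_status() -> dict:
    """Hypothetical helper reporting whether this interpreter can and does run without the GIL."""
    # Truthy only on free-threaded ("t") builds; None or 0 on regular builds.
    built = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled exists from Python 3.13 on; older versions always have the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if is_gil_enabled is None else bool(is_gil_enabled())
    return {"free_threaded_build": built, "gil_enabled": gil_enabled}


print(free_threading_status())
```

A free-threaded build can still re-enable the GIL at runtime (for example when an extension module does not declare support), which is why both values are reported.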
I'm calling thread-unsafe libraries (e.g. pyensembl and sqlite), so I have to resort to process-based parallelism.
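One concrete form of this thread-affinity problem, shown with Python's sqlite3 module: by default (check_same_thread=True) a connection refuses to be used from any thread other than the one that created it, whereas a connection opened inside each worker works. Process pools give every worker that kind of isolation by construction. The function names query_shared and query_own are hypothetical, chosen for this sketch.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Created in the main thread; check_same_thread defaults to True.
shared_conn = sqlite3.connect(":memory:")


def query_shared(n: int):
    """Reuse the main thread's connection from a worker thread (rejected by sqlite3)."""
    try:
        return shared_conn.execute("SELECT ? * ?", (n, n)).fetchone()[0]
    except sqlite3.ProgrammingError:
        # "SQLite objects created in a thread can only be used in that same thread."
        return None


def query_own(n: int) -> int:
    """Open a connection inside the worker: one connection per worker is safe."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT ? * ?", (n, n)).fetchone()[0]
    finally:
        conn.close()


with ThreadPoolExecutor(max_workers=2) as pool:
    shared_results = list(pool.map(query_shared, [1, 2, 3]))
    own_results = list(pool.map(query_own, [1, 2, 3]))

print(shared_results)  # [None, None, None]
print(own_results)     # [1, 4, 9]
```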
I'm using the PyPI version, so there's no ordered keyword in it.
OK, makes perfect sense. Issue flagged as TODO. I will ping you on the future draft PR for review/co-authoring 🙏🏻
Heya @matthewghgriffiths, I fixed a few serialization issues and added a unit-tested experimental hook (commit):
from streamable.iters import OSConcurrentMappingIterable
from concurrent.futures import ProcessPoolExecutor
OSConcurrentMappingIterable.EXECUTOR_CLASS = ProcessPoolExecutor
It applies to .map and .foreach with concurrency > 1, for both ordered=True (first in, first out) and ordered=False (first done, first out).
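The two orderings can be sketched with plain concurrent.futures. concurrent_map below is a hypothetical helper, not streamable's implementation: it keeps at most concurrency calls in flight, yields from the head of a FIFO queue when ordered=True, and yields whichever call finishes first (via wait(..., return_when=FIRST_COMPLETED)) when ordered=False. Its executor_class parameter mirrors the idea of swapping the executor class in the hook above.

```python
import time
from collections import deque
from concurrent.futures import FIRST_COMPLETED, Executor, Future, ThreadPoolExecutor, wait
from typing import Callable, Deque, Iterable, Iterator, Set, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def concurrent_map(
    fn: Callable[[T], U],
    items: Iterable[T],
    concurrency: int = 2,
    ordered: bool = True,
    executor_class: Callable[..., Executor] = ThreadPoolExecutor,
) -> Iterator[U]:
    """Hypothetical helper: map fn over items with at most `concurrency` calls in flight."""
    with executor_class(max_workers=concurrency) as executor:
        if ordered:
            # First in, first out: always wait on the oldest submitted call.
            fifo: Deque["Future[U]"] = deque()
            for item in items:
                fifo.append(executor.submit(fn, item))
                if len(fifo) >= concurrency:
                    yield fifo.popleft().result()
            while fifo:
                yield fifo.popleft().result()
        else:
            # First done, first out: yield whichever pending call completes first.
            pending: Set["Future[U]"] = set()
            for item in items:
                pending.add(executor.submit(fn, item))
                if len(pending) >= concurrency:
                    done, pending = wait(pending, return_when=FIRST_COMPLETED)
                    for future in done:
                        yield future.result()
            while pending:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in done:
                    yield future.result()


def slow_identity(delay: float) -> float:
    time.sleep(delay)
    return delay


delays = [0.4, 0.0, 0.2]
ordered_out = list(concurrent_map(slow_identity, delays, concurrency=3, ordered=True))
unordered_out = list(concurrent_map(slow_identity, delays, concurrency=3, ordered=False))
print(ordered_out)    # [0.4, 0.0, 0.2]: input order
print(unordered_out)  # typically [0.0, 0.2, 0.4]: completion order
```

Passing executor_class=ProcessPoolExecutor should also work with this sketch, but then fn and every item must be picklable, and on spawn-based platforms the call belongs under an `if __name__ == "__main__":` guard.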
If it makes sense to you, I will add it to the next release so that you can try it out for us 🙏🏻
(We can then think about adding it properly to the Stream's interface.)
fyi @matthewghgriffiths you can try it out after pip install streamable==1.1.0-rc.1
fyi #25 adds the within_processes: bool param to .map/.foreach
Since 1.2.0: use via_processes.
btw @matthewghgriffiths if the implementation looks good to you, I would like to add you as a co-author on this!
Looks good to me, though I only made a small request!
You brought the idea and the motivation, explored the code to see how feasible it was, and reviewed: that's definitely co-authoring imo 🤝 If you agree, I just need an email associated with your GitHub account.
This is a cool library; the way you've set it up has made it fairly straightforward to add process-based parallelism via monkey patching (see code snippet below). But are there likely to be any issues if a pmap method was added to streamable.Stream?