cgarciae / pypeln

Concurrent data pipelines in Python >>>
https://cgarciae.github.io/pypeln
MIT License
1.55k stars, 98 forks

allow multiprocess dep instead of multiprocessing #94

Open lalo opened 2 years ago

lalo commented 2 years ago

The external multiprocess library has further benefits, such as using dill instead of pickle for serialization, which gives us more leeway in edge cases that the standard-library multiprocessing module cannot handle.

https://github.com/uqfoundation/multiprocess

from their readme:

multiprocess enables:

- objects to be transferred between processes using pipes or multi-producer/multi-consumer queues
- objects to be shared between processes using a server process or (for simple data) shared memory

multiprocess provides:

- equivalents of all the synchronization primitives in threading
- a Pool class to facilitate submitting tasks to worker processes
- enhanced serialization, using dill
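
To make the serialization point concrete, here is a minimal sketch of one edge case using only the standard library: pickle serializes functions by importable name, so it rejects a lambda, whereas dill is designed to serialize such objects by value. The helper `can_pickle` and the variable `square` are illustrative names, not part of pypeln or multiprocess.

```python
import pickle


def can_pickle(obj) -> bool:
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickle cannot store a reference to it.
square = lambda x: x * x

print(can_pickle(len))     # built-in function, pickled by reference
print(can_pickle(square))  # lambda, rejected by pickle
```

Any callable that fails this check cannot be sent to a worker through the standard multiprocessing queues, which is the kind of case the dill-based serialization in multiprocess is meant to cover.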

Let me know your thoughts on this type of change. Happy to iterate on it.

Thanks

Related: https://github.com/cgarciae/pypeln/issues/53

cgarciae commented 1 year ago

I wonder why CI didn't trigger.

cgarciae commented 1 year ago

Hey @lalo, sorry for being away so long. I like the change, but I'd prefer it to be behind a configuration flag. It could also be generalized to other implementations, e.g.:

from pypeln import config
import multiprocess

config.set_multiprocessing_impl(multiprocess)
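
One way such a flag could be wired up is sketched below. This is only an assumption about the design, not pypeln's actual code: `get_multiprocessing_impl`, the module-level `_mp_impl`, and the attribute check are all hypothetical, and only the name `set_multiprocessing_impl` comes from the suggestion above.

```python
import multiprocessing
from types import ModuleType

# Hypothetical module-level setting; defaults to the standard library.
_mp_impl: ModuleType = multiprocessing


def set_multiprocessing_impl(impl: ModuleType) -> None:
    """Register the module to use wherever process primitives are needed."""
    global _mp_impl
    # Check a few attributes that a drop-in replacement is expected to expose.
    for name in ("Process", "Queue", "Lock"):
        if not hasattr(impl, name):
            raise TypeError(f"implementation is missing attribute {name!r}")
    _mp_impl = impl


def get_multiprocessing_impl() -> ModuleType:
    """Return the currently configured implementation."""
    return _mp_impl
```

Internal code would then call `get_multiprocessing_impl().Process(...)` instead of importing multiprocessing directly, so any module with a compatible interface could be swapped in.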

lalo commented 1 year ago

I'll take a look