adrn / schwimmbad

A common interface to processing pools.
MIT License

Option to reuse a worker instance for different tasks #40

Closed mabruzzo closed 3 years ago

mabruzzo commented 3 years ago

I wanted to gauge interest in introducing the option to reuse a worker object for multiple tasks in the MPIPool (instead of unpacking a separate instance of the worker before executing every task). Doing this would allow a worker object to cache some intermediate results (that may be non-trivial to pickle) and reuse them for subsequent tasks.
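To illustrate the difference, here is a minimal sketch that uses only the standard library. `CachingWorker` and its lookup table are hypothetical, and `copy.deepcopy` merely stands in for the pool unpacking a fresh worker instance before each task; none of this is schwimmbad's actual code.

```python
import copy


class CachingWorker:
    """Hypothetical worker that lazily builds and caches an intermediate result."""

    setup_calls = 0  # class-level counter, for illustration only

    def __init__(self, scale):
        self.scale = scale
        self._table = None  # cache, filled in on the first call

    def _build_table(self):
        # Stand-in for an expensive computation whose result is hard to pickle.
        type(self).setup_calls += 1
        return [self.scale * i for i in range(10)]

    def __call__(self, task):
        if self._table is None:
            self._table = self._build_table()
        return self._table[task]


template = CachingWorker(scale=3)
tasks = [1, 2, 3, 4]

# Fresh copy of the worker for every task: the cache is rebuilt each time.
fresh_results = []
for t in tasks:
    fresh_results.append(copy.deepcopy(template)(t))
setups_fresh = CachingWorker.setup_calls

# One worker instance reused for all tasks: the cache is built once.
CachingWorker.setup_calls = 0
reused = copy.deepcopy(template)
reused_results = []
for t in tasks:
    reused_results.append(reused(t))
setups_reused = CachingWorker.setup_calls
```

Both loops produce the same results, but the setup runs once per task in the first case and only once overall in the second.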

In one of my applications, I'm using a workaround in which I store the cached data in a global variable. However, this feature would facilitate the garbage collection of the cached result after the pool is closed.

I'm more than happy to submit a PR for this myself, but I wanted to make sure that there isn't strong opposition to this feature before I start (especially since this is a bit of a departure from the interface of multiprocessing.Pool).

adrn commented 3 years ago

Cool, yea, this sounds useful! When I need to do this for my own applications, I use the (poorly documented) schwimmbad.utils.batch_tasks() function to create task batches, and then within a worker I loop over a list of tasks instead of a single item. I suppose it could make sense to have a .batch_map() method on MPIPool() to handle this all in one go. Or were you imagining something different?
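A rough sketch of the batching pattern, using only the standard library. `make_batches` is a hypothetical helper written for this example, not the actual `schwimmbad.utils.batch_tasks()` (whose signature is not shown here), and the built-in `map` stands in for a pool's `map` method.

```python
def make_batches(tasks, n_batches):
    """Hypothetical helper: split tasks into n_batches contiguous chunks."""
    base, extra = divmod(len(tasks), n_batches)
    batches = []
    start = 0
    for i in range(n_batches):
        # The first `extra` batches each take one additional task.
        stop = start + base + (1 if i < extra else 0)
        batches.append(tasks[start:stop])
        start = stop
    return batches


def batch_worker(batch):
    # The expensive setup runs once per batch instead of once per task.
    table = [3 * i for i in range(10)]
    return [table[t] for t in batch]


tasks = [0, 1, 2, 3, 4, 5, 6]
batches = make_batches(tasks, 3)

# Each call to the worker receives a whole batch; the per-batch result
# lists are flattened back into one list afterwards.
nested = list(map(batch_worker, batches))
flat = [r for chunk in nested for r in chunk]
```

Choosing the number of batches close to the number of worker processes means each process pays the setup cost roughly once.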

mabruzzo commented 3 years ago

Your solution is not really what I was imagining, but it definitely seems like the best approach for solving this type of problem in a general way.

If I'm not mistaken, I think this is already implemented: such a method is already defined on MPIPool's parent class (and it just calls the map method).