Closed mabruzzo closed 3 years ago
Cool, yea, this sounds useful! When I need to do this for my own applications, I use the (poorly documented) `schwimmbad.utils.batch_tasks()` function to create task batches, and then within a worker I loop over a list of tasks instead of a single item. I suppose it could make sense to have a `.batch_map()` method on `MPIPool` to handle this all in one go. Or were you imagining something different?
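For context, the batching pattern described above can be sketched in plain Python. This is a hedged illustration, not schwimmbad's actual code: `split_into_batches` is a hypothetical stand-in for `schwimmbad.utils.batch_tasks()`, and `process_batch` is a placeholder worker.

```python
# Sketch of the batching pattern: split the task list into chunks and
# have each worker process a whole chunk instead of a single item.
# `split_into_batches` is a hypothetical stand-in for the (poorly
# documented) schwimmbad.utils.batch_tasks() helper.

def split_into_batches(tasks, n_batches):
    """Split `tasks` into `n_batches` nearly equal-sized lists."""
    base, rem = divmod(len(tasks), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        size = base + (1 if i < rem else 0)
        batches.append(tasks[start:start + size])
        start += size
    return [b for b in batches if b]

def process_batch(batch):
    """Worker: loop over a list of tasks instead of a single item."""
    return [task ** 2 for task in batch]  # placeholder per-task work

tasks = list(range(10))
batches = split_into_batches(tasks, n_batches=3)
# With an MPIPool one would do something like:
#     results = pool.map(process_batch, batches)
results = [process_batch(b) for b in batches]
flat = [r for batch in results for r in batch]
```

The point is that each item handed to `map` is now a whole batch, so per-task dispatch overhead is paid once per batch instead of once per task.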
Your solution is not really what I was imagining, but it definitely seems like the best approach for solving this type of problem in a general way. If I'm not mistaken, I think it's effectively already implemented, since it is already defined for `MPIPool`'s parent class (and it just calls the `map` method).
I wanted to gauge interest in introducing the option to reuse a worker object for multiple tasks in the `MPIPool` (instead of unpacking a separate instance of the worker before executing every task). Doing this would allow a worker object to cache some intermediate results (that may be non-trivial to pickle) and reuse them for subsequent tasks.

In one of my applications, I'm using a workaround in which I store the cached data in a global variable. However, this feature would facilitate garbage collection of the cached result after the pool is closed.
I'm more than happy to submit a PR for this myself, but I wanted to make sure that there isn't strong opposition to this feature before I start (especially since this is a bit of a departure from the interface of `multiprocessing.Pool`).