Open amitmurthy opened 7 years ago
@tanmaykm Was there an example of the asyncmap leading to a performance degradation? If we have a concrete example, that would be good to drive this.
I do like the idea of having a batch execution function inside of pmap, which could use threads.
Another way to do the async case may be to allow an @async pmap(...)
syntax for the asynchronous case.
+1 for having the batch_function
keyword, with maybe asyncmap
as the default.
If there is performance degradation for compute bound jobs, I would rather have map
be the default and @async pmap(...)
the way to do the async case. It may also be the case that pmap
has too much functionality loaded into it.
@async pmap(...)
will not work as you intend it to - It will just execute the pmap
in a new task.
I am for a new keyword arg, batch_function=map
, i.e., with map
as the default.
With a correct batchsize
I don't think a local asyncmap
will have any adverse performance impact.
I was imagining @async
special casing on the pmap
argument - but that is perhaps too ugly. batch_function
is ok.
That was my original thinking. But now I realize it is difficult to predict how it will be used. for example folks are trying with batchsize of 1000.
pmap
in batch mode uses a local asyncmap to process a batch - https://github.com/JuliaLang/julia/blob/9e3318c9840e7a9e387582ba861408cefe5a4f75/base/distributed/pmap.jl#L198Considering that each computation in
pmap
is fairly large, and batch sizes small, an asyncmap would not have a major overhead and if the computation involves IO, quite beneficial.For example, if the input is a list of file names to be processed, it is efficient to interleave I/O and computation and hence a local
asyncmap
is a better fit.This issue is to take a decision whether to 1) Keep it as it is, i.e., no change - the batch is processed using
asyncmap
2) Change it to a local map. If the computation involves I/O the caller would have to explicitly partition the input and the mapping function in turn would need to perform an
asyncmap
and a flatten on the final output.3) Add another keyword arg to pmap,
batch_function=map
. To useasyncmap
, the caller would need to explicitly specifybatch_function=asyncmap
. A user defined function (for example one that uses@threads
) can also be specified.