dev-foa opened this issue 1 year ago
Thanks for raising this @dev-foa. I wonder if this is similar to what we are seeing in https://github.com/dask/dask/issues/10291
@dev-foa can you please provide a runnable example showing this problem? What you are seeing may very well depend on the details of what `map_function` is doing, what data you are generating, etc.
Also, the code you are providing seems not to be runnable. There appear to be a couple of syntax errors, e.g.

`df['a'] + ['b']`

(note how the second dataframe is missing), and

`agg({'a':'sum'},{'b':'mean'},{'c':'mean'})`

(the `agg` input should be a single dictionary, not multiple dictionaries).

I also suggest trying to reproduce this issue with a LocalCluster instead of a FargateCluster.
See also https://matthewrocklin.com/minimal-bug-reports for some suggestions of how to produce such a minimal example.
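For what it's worth, here is a rough sketch of the shape such a minimal reproducer could take, using a LocalCluster and a single-dictionary `agg` call. The `map_function` body, column names, item count, and cluster size are placeholders here, not taken from the original report:

```python
import time

import dask.bag as db
from dask.distributed import Client, LocalCluster


def map_function(i):
    # Stand-in for the reporter's per-item work (reportedly ~5-6 s per item);
    # shortened here so the example runs quickly.
    time.sleep(1)
    return {"a": i, "b": i * 2, "c": i % 7}


if __name__ == "__main__":
    # LocalCluster instead of FargateCluster, so anyone can run it locally.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)
    client = Client(cluster)

    # Build a dataframe from a map over the items, then aggregate.
    bag = db.from_sequence(range(32), npartitions=8)
    ddf = bag.map(map_function).to_dataframe()

    # agg takes a single dict mapping column -> aggregation function.
    result = ddf.agg({"a": "sum", "b": "mean", "c": "mean"}).compute()
    print(result)
```

With `threads_per_worker=1` and `n_workers` set to the machine's core count, the dashboard's task stream makes it easy to see whether all cores are actually busy.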
Describe the issue: Upon running a computation on a distributed cluster, dask doesn't seem to be using all the cores of a multi-core machine. This became clear upon running it with different configurations, as described in the table below.
The computation involves creating a dataframe from a map function and then running some aggregations on the resultant dataframe.
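As a quick check (not from the original report), the configured parallelism and live utilisation can be inspected from the client; the scheduler address below is a placeholder:

```python
from dask.distributed import Client

client = Client("tcp://scheduler-address:8786")  # placeholder scheduler address
print(client.nthreads())       # mapping of worker address -> number of threads
print(client.dashboard_link)   # dashboard shows per-worker CPU and the task stream
```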
Minimal Complete Verifiable Example:
Anything else we need to know?:
The `map_function` takes around 5-6 seconds for each item.
Environment: