What does this PR do?
Dask now uses n_jobs workers instead of all available cores. In dask.compute, the parameter is now set as follows: num_workers=self.n_jobs
Also removed the line self.dask_graphs_ = tmp_result_scores, since this attribute isn't used anywhere.
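To illustrate what the num_workers argument does, here is a minimal sketch that is independent of TPOT. It assumes dask's threaded scheduler is in use; tracked_task and run_with_cap are hypothetical helper names made up for this example, not part of TPOT or dask.

```python
import threading
import time

import dask
from dask import delayed

_lock = threading.Lock()
_active = 0  # tasks currently running
_peak = 0    # highest number of tasks seen running at once


def tracked_task(i):
    """Sleep briefly while recording how many tasks run at the same time."""
    global _active, _peak
    with _lock:
        _active += 1
        _peak = max(_peak, _active)
    time.sleep(0.05)
    with _lock:
        _active -= 1
    return i * i


def run_with_cap(n_jobs, n_tasks=8):
    """Compute n_tasks delayed tasks with num_workers=n_jobs (threaded scheduler)."""
    global _active, _peak
    _active, _peak = 0, 0
    tasks = [delayed(tracked_task)(i) for i in range(n_tasks)]
    # num_workers limits the size of the thread pool used for this call
    results = dask.compute(*tasks, scheduler="threads", num_workers=n_jobs)
    return list(results), _peak


if __name__ == "__main__":
    for n_jobs in (1, 4):
        _, peak = run_with_cap(n_jobs)
        print(f"n_jobs={n_jobs}: peak concurrent tasks = {peak}")
```

The observed peak should never exceed the requested n_jobs, which is the behavior this PR relies on when no distributed client is active.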
Where should the reviewer start?
How should this PR be tested?
Fit a classifier with each of the following settings:
est = tpot.TPOTClassifier(use_dask=True, n_jobs=1)
est = tpot.TPOTClassifier(use_dask=True, n_jobs=32)
and compare the CPU usage and time to completion.
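The timing half of this comparison could be scripted roughly as follows. This is only a sketch: time_fit and make_estimator are hypothetical names introduced here, and the helper works with any object that has a fit(X, y) method.

```python
import time


def time_fit(make_estimator, X, y, n_jobs_values=(1, 32)):
    """Fit one estimator per n_jobs value and return {n_jobs: seconds elapsed}.

    make_estimator is a hypothetical factory: it takes n_jobs and returns
    an unfitted estimator exposing fit(X, y).
    """
    timings = {}
    for n_jobs in n_jobs_values:
        est = make_estimator(n_jobs)
        start = time.perf_counter()
        est.fit(X, y)
        timings[n_jobs] = time.perf_counter() - start
    return timings
```

With TPOT, the factory could be lambda n: tpot.TPOTClassifier(use_dask=True, n_jobs=n). CPU usage still has to be watched separately, for example in a system monitor, while the fits run.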
Any background context you want to provide?
Note that within a dask Client context manager, the client's n_workers setting is used rather than the n_jobs passed to TPOT.
What are the relevant issues?
#1223
Screenshots (if appropriate)
Questions:
Do the docs need to be updated?
It might be helpful to demonstrate an example using LocalCluster and dask joblib. Within a Client context manager, the n_jobs value passed to dask.compute (as num_workers) is ignored in favor of the LocalCluster parameters. Since this behavior is not obvious, I think it would be helpful to include it in the docs.
For example:
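    import time
    import dask.distributed
    import tpot
    from dask.distributed import Client, LocalCluster

    # X, y: training data, assumed to be loaded beforehand
    with LocalCluster(threads_per_worker=32, n_workers=1, processes=False) as cluster:
        with Client(cluster) as client:
            # performance_report writes dask-report.html by default
            with dask.distributed.performance_report():
                start = time.time()
                est = tpot.TPOTClassifier(use_dask=True)
                est.fit(X, y)
                t1 = time.time() - start
                print(t1)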