dask / dask-tutorial

Dask tutorial
https://tutorial.dask.org
BSD 3-Clause "New" or "Revised" License

Compute times in Ch.3 -- Arrays, Weather means #134

Closed berkgercek closed 2 years ago

berkgercek commented 4 years ago

In chapter 3 of the tutorial (Arrays), in the section that asks the user to compute the mean of the 3D array of global temperature values, running the following code:

%time meantemp = x.mean(axis=0).compute()

fig = plt.figure(figsize=(16, 8))

plt.imshow(meantemp, cmap='RdBu_r')

returns a compute time of:

CPU times: user 2min, sys: 14min 16s, total: 16min 17s

Wall time: 42.7 s

Is this an expected amount of time for the computation, given the large sys CPU time of more than 14 minutes? For reference, x is a dask array of shape 31 x 5760 x 11520 with a chunk size of (500, 500), and the computation is running on a 12-core AMD processor.
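For a rough sense of scale, the figures quoted above can be turned into a chunk count and a CPU-to-wall-time ratio. This is only a back-of-the-envelope sketch: it assumes each of the 31 slices is split into 500 x 500 blocks (i.e. an effective chunk shape of (1, 500, 500), which is not stated explicitly above), and all variable names are illustrative.

```python
import math

# Figures quoted in the post above
shape = (31, 5760, 11520)
# Assumption: one slice per chunk along axis 0, 500 x 500 blocks within a slice
chunk = (1, 500, 500)

# Number of chunks along each axis (partial edge chunks count as one chunk)
chunks_per_axis = [math.ceil(s / c) for s, c in zip(shape, chunk)]
n_chunks = math.prod(chunks_per_axis)

# Timings reported by %time, converted to seconds
user_s = 2 * 60          # user 2min
sys_s = 14 * 60 + 16     # sys 14min 16s
wall_s = 42.7            # wall time

cpu_s = user_s + sys_s
busy_cores = cpu_s / wall_s      # average CPU-seconds spent per wall-clock second
sys_fraction = sys_s / cpu_s     # share of CPU time spent in the kernel

print(f"chunks per axis: {chunks_per_axis}, total chunks: {n_chunks}")
print(f"CPU-seconds per wall second: {busy_cores:.1f}")
print(f"fraction of CPU time in sys: {sys_fraction:.0%}")
```

Under these assumptions the array is split into several thousand small chunks, and most of the CPU time is system time rather than user time, which is the part of the result the question is about.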

berkgercek commented 4 years ago

I am aware that the .compute() method for x is not necessary, but the result does not change when it is omitted.

TomAugspurger commented 4 years ago

I think that looks correct.

JimCircadian commented 4 years ago

Hello. I think this might be related to this PR I've raised (linked below). I found that not using the multithreaded implementation massively improved processing time and caused no errors. (You can also select --small for the datasets when running prep.py.)

https://github.com/dask/dask-tutorial/pull/187
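For anyone who wants to compare threaded and non-threaded execution on their own machine, here is a minimal sketch of choosing the scheduler per call. It is not taken from the PR above and may not match what the PR changes. It uses a small stand-in array of ones rather than the tutorial's weather data, and the array shape and variable names are purely illustrative.

```python
import dask
import dask.array as da

# Small stand-in array (assumption: NOT the tutorial's weather dataset)
x = da.ones((4, 1000, 1000), chunks=(1, 500, 500))

# Default for dask arrays: the threaded scheduler
mean_threaded = x.mean(axis=0).compute(scheduler="threads")

# Single-threaded execution in the calling thread
mean_sync = x.mean(axis=0).compute(scheduler="synchronous")

# The scheduler can also be set for a block of code via the config
with dask.config.set(scheduler="synchronous"):
    mean_sync_cfg = x.mean(axis=0).compute()

print(mean_threaded.shape, mean_sync.shape, mean_sync_cfg.shape)
```

Wrapping each call in %time in a notebook would show whether the large sys time goes away without threads on a given machine.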

jsignell commented 2 years ago

I'm going to close this issue since it seems like it is resolved.