Open mrocklin opened 4 years ago
cc @quasiben @madsbk @pentschev
Whilst looking through old issues, this one seems to have gone without activity for some time. Is it still of interest given the changes in high-level-graphs since then? If "yes" this sounds like a cool self-contained project that we should specifically task someone with.
We currently collect a broad class of operations into high level graphs.
This includes elementwise operations (as above) and also transpose, the first bits of reductions, tensordot, and so on. The object
`layer.dsk`

is a computation to run in a single task. Normally it gets evaluated by `dask.get` (I think), a simple single-threaded scheduler, which calls these functions in a sensible order on the input chunks.

However, we could also choose to be more intelligent here and modify this sequence of functions. This is a good time to perform intelligent optimizations because we know that this one sub-graph is likely both small and likely to be run many times across all of our chunks. Two optimizations have come up in the past:
1. **In-place operations**: converting `a = a + 1` to `a += 1` when safe to do so.
2. **`fastmath=True`**: this came up briefly in https://github.com/dask/dask/issues/1964. @jcrist may also have an experiment lying around.
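As a sketch of the first idea, assuming nothing about dask's internals beyond the dict-of-tasks graph format: a toy single-threaded `get` like the one described above, plus a rewrite pass that switches an operation to its in-place variant when its input is an intermediate consumed exactly once (the names `simple_get` and `rewrite_inplace` are hypothetical, not dask APIs):

```python
import operator

def simple_get(dsk, key):
    """Toy single-threaded scheduler: recursively evaluate one key of a
    dict-style task graph, where tasks are (function, *args) tuples."""
    task = dsk[key]
    if isinstance(task, tuple) and callable(task[0]):
        func, *args = task
        return func(*(simple_get(dsk, a) if isinstance(a, str) and a in dsk else a
                      for a in args))
    return task  # plain data, not a task

# Out-of-place -> in-place equivalents (illustrative subset).
INPLACE = {operator.add: operator.iadd, operator.mul: operator.imul}

def rewrite_inplace(dsk):
    """Replace an operation with its in-place variant when its first
    argument is an intermediate produced by another task and consumed
    exactly once, so mutating it cannot be observed elsewhere."""
    consumers = {}
    for task in dsk.values():
        if isinstance(task, tuple):
            for arg in task[1:]:
                consumers[arg] = consumers.get(arg, 0) + 1
    out = {}
    for key, task in dsk.items():
        if (
            isinstance(task, tuple)
            and task[0] in INPLACE
            and task[1] in dsk                    # first arg is an intermediate
            and isinstance(dsk[task[1]], tuple)   # ... produced by another task
            and consumers.get(task[1], 0) == 1    # ... with no other consumers
        ):
            task = (INPLACE[task[0]],) + task[1:]
        out[key] = task
    return out

dsk = {
    "tmp": (operator.add, "x", "y"),    # tmp = x + y  (fresh object)
    "z":   (operator.add, "tmp", "w"),  # z = tmp + w  -> safe as tmp += w
    "x": [1], "y": [2], "w": [3],
}
opt = rewrite_inplace(dsk)
print(opt["z"][0])           # <built-in function iadd>
print(simple_get(opt, "z"))  # [1, 2, 3]
```

Note that `"tmp"` in the example is rewritten because it is a freshly allocated intermediate with a single consumer, while the inputs `"x"`, `"y"`, `"w"` are never mutated; a real pass would need a more careful safety analysis than this sketch.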
GPUs
Avoiding memory copies becomes more important when the underlying chunks are CuPy rather than NumPy arrays. An optimization of this sort may also want to know what the metadata of the input chunks is.
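A sketch of that metadata-driven choice (the function name and the returned labels are hypothetical; dask arrays do carry a `_meta` attribute holding a zero-length array of the same type as the chunks, which is the kind of metadata meant here):

```python
def choose_elementwise_impl(meta):
    """Hypothetical dispatch: inspect the chunk metadata (a zero-length
    array of the same type as the real chunks) and pick a code path."""
    module = type(meta).__module__.split(".")[0]
    if module == "cupy":
        # GPU-backed chunks: prefer fused, in-place variants to avoid
        # allocating and copying intermediate device arrays.
        return "fused-inplace-gpu"
    return "default"
```

An optimizer rewriting `layer.dsk` could consult a check like this, applying in-place or fused variants only when the chunk type is known to tolerate them.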