Open agoose77 opened 7 months ago
You mean specifically the class itertools.partial
? Which function classes are you thinking about in particular, like FromParquetFn ?
@martindurant yes, much of dask-awkward's curried calls to ak.XXX
operations are class-based, e.g. the structure module.
Microbenchmark
from functools import partial
kwargs = {"arg": 1}
class Callme:
def __init__(self, fn, kwargs):
self.fn = fn
self.kwargs = kwargs
def __call__(self, *args, **kwargs):
return self.fn(*args, **self.kwargs, **kwargs)
one = Callme(sum, kwargs)
two = partial(sum, **kwargs)
In [15]: %timeit dask.base.tokenize(callme.one)
1.66 µs ± 3.49 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
In [16]: %timeit dask.base.tokenize(callme.two)
1.71 µs ± 2.78 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
If I make kwargs bigger, I find that the class tokenises faster than partial
That's fascinating! I'm guessing you didn't run the code as above, because callme.two
isn't defined?
The first block was a module called "callme"
Looking at the internals of dask, it seems that features such as
tokenize
have fast-path logic for partial. If instead dask encounters class instances, it has to invoke a serialiser. We should probably just move topartial
(which will also reduce the sloc).