Closed: martindurant closed this 8 months ago
Yeah this is more or less what I did for the histograms in the end, so that makes sense. I guess I just don't see how to pull it to the end in the case of getitems.
"don't see how to pull it to the end"

What do you mean?
I am thinking that the with_field case is essentially identical, and instead of queueing a specific set of things (items to get) like this, we can have a small structure of pending work, with a couple of specific kinds of operation and, for each, a single method that says how to execute it. Execution happens as soon as we see an operation that doesn't map into the queue (or as soon as `._dask`/`._meta` get accessed).
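To make that concrete, here is a minimal sketch with hypothetical names (`LazyOps` is not dask-awkward API): operations accumulate in a queue, and the queue is flushed the moment something forces execution, such as a `._dask`/`._meta` access.

```python
class LazyOps:
    def __init__(self, data):
        self._data = data
        self._queue = []  # callables, applied in order at flush time

    def getitem(self, key):
        # Queueable operation: record it, defer execution.
        self._queue.append(lambda d: d[key])
        return self

    @property
    def meta(self):
        # Accessing meta (like ._meta/._dask) forces execution of the queue,
        # collapsing all queued operations into a single pass.
        for op in self._queue:
            self._data = op(self._data)
        self._queue.clear()
        return self._data


# Two chained getitems execute only when .meta is touched.
assert LazyOps({"a": {"b": [1, 2, 3]}}).getitem("a").getitem("b").meta == [1, 2, 3]
```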
Oh - as in - starting from that entry point I don't see how to get it to a functioning implementation because my brain is occupied with other tasks. :-)
I'm sure I could see the whole way through in a more quiet moment. The initial direction makes a lot of sense though.
A couple of failures here to wrap my head around, perhaps because of mutation somewhere; but here are the timings:

Post:

```python
In [1]: import dask_awkward as dak

In [2]: arr = dak.from_lists([{"a": {"b": [1, 2, 3]}}]*5)

In [3]: arr2 = arr.a.b

In [4]: %timeit arr2 = arr.a.b
85.9 µs ± 280 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```
Pre:

```python
In [1]: import dask_awkward as dak

In [2]: arr = dak.from_lists([{"a": {"b": [1, 2, 3]}}]*5)

In [3]: arr2 = arr.a.b

In [4]: %timeit arr2 = arr.a.b
215 µs ± 3.12 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
```
Notes:
Please ping me when this one is in a state to test out!
You can test it right now; I haven't had a chance to trace the failures, but they're probably subtle edge cases, given how few there are.
Yeah I went ahead and tried it - definitely a noticeable improvement!
It leaves fancy indexing as the last thing that's taking significant time.
However, this PR has some annoying rebasing issues with #477, so I can't compose it all together pleasantly. I can't quite see the full picture of what is left yet.
@lgray, merge done; back to the same three failures as before.
Let's go ahead and wrap this one up as far as tests are concerned. It has a rather beneficial impact.
OK, so the problem is that the cache also contains the output divisions, which depend on the input divisions at the time of the first call. If those divisions later become known, the result would be different. Interestingly, one of the few failing tests has this at the start:
```python
def test_single_int(daa: dak.Array, caa: ak.Array) -> None:
    daa = dak.copy(daa)
    daa.eager_compute_divisions()
```
because it wants known divisions, but doesn't want to mutate the object held by the fixture.
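A toy reproduction of the staleness (illustrative names, not dask-awkward internals): the cache key ignores the input divisions, so an entry built while they were unknown keeps being returned after `eager_compute_divisions()` makes them known.

```python
cache = {}


def getitem_result(name, divisions):
    # Output divisions are derived from the input divisions at first call,
    # then frozen into the cache.
    if name not in cache:
        cache[name] = {"graph": f"{name}-getitem", "divisions": divisions}
    return cache[name]


first = getitem_result("arr.a", (None, None))  # divisions unknown at first call
later = getitem_result("arr.a", (0, 5))        # now known, but the cache wins
assert later["divisions"] == (None, None)      # stale output divisions
```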
Shouldn't be too hard to keep track of the state of divisions somehow as well?
I should have translated: I know what's wrong, I can fix it.
Ah, notes rather than discussion, gotcha. No problem, and cool!
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 93.16%. Comparing base (8cb8994) to head (9e39611). Report is 29 commits behind head on main.
OK, that fixes it. @lgray, maybe a speed test would be nice. I got around the former problem by simply not caching the divisions, as I don't think that part was contributing much compared to making the meta and layers. I could instead make the cache more complex, if necessary.
Should I perhaps write a test looking at the contents of the cache? I'm not sure we need that if everything passes and the timings are clearly faster.
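Continuing the toy from above, the fix as described caches only the graph/meta and takes the divisions from the live input on every call (again, illustrative names only):

```python
cache = {}


def getitem_result(name, divisions):
    if name not in cache:
        cache[name] = {"graph": f"{name}-getitem"}  # graph/meta only, no divisions
    return {**cache[name], "divisions": divisions}  # divisions from the caller


assert getitem_result("arr.a", (None, None))["divisions"] == (None, None)
assert getitem_result("arr.a", (0, 5))["divisions"] == (0, 5)  # no staleness
```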
Yes, with the awkward `_new` patch + multifill + this, we're getting all the speedup we've seen thus far.
One nitpick here: `tokenize(fn, *args, meta is not None and meta.typestr, **kwargs)` appears to be noticeable only due to the `meta.typestr` call (OK, it's half a second, but that's not small when we are down to ~4 seconds). In particular, the `str(self.type)` over in awkward that this calls is costly when spammed. It would be good savings if we can get around that; it would bring our test case below 4 seconds of eval time.
The remaining place that may give us some time back after all these improvements appears to be `__getitem__`; I'm really not sure why it should take so much time for a typetracer.

@agoose77 @jpivarski FYI, more things to think on.
I thought `meta.typestr` was faster than `str(meta)`, which is what the previous version did. It sounds like it doesn't matter. So, the question: should `map_partitions` be expected to produce an identical result whether or not `meta` is provided? If yes, it doesn't need to be in this `tokenize` call at all. If no, then it does, and I don't know of a faster way to get a unique identifier for it.
Also, I reckon output_divisions should probably have been in the tokenize, since that does change the nature of the layer produced.
For building the layers themselves it doesn't matter. But I'd like to ruminate on it for a bit.
Yeah, I think my position is as follows:

- `_meta` only alters the outcome of evaluating typetracers, not graph structure
- the from-uproot io layer, as an example, does not change its output keys when its columns are projected/optimized

Therefore I agree with not tokenizing the meta.
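Assuming that conclusion, the change amounts to something like the following sketch. `dask.base.tokenize` is real; the wrapper function and its arguments are illustrative, not dask-awkward's actual signature.

```python
from dask.base import tokenize


def layer_token(fn, *args, meta=None, **kwargs):
    # Before: the token also folded in `meta is not None and meta.typestr`,
    # paying for awkward's str(self.type) on every call.
    # After: meta is dropped from the token, since it only affects typetracer
    # evaluation, not the structure or keys of the generated graph.
    return tokenize(fn, *args, **kwargs)
```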
Done. This is final once green, unless we can think of some test that might help.
I'm trying to think of some situations where we could confuse the user with this mechanism, but so long as the user doesn't try to manually overwrite keys, I think it's fine.
Furthermore if a user is trying to manually overwrite keys they'll probably have found the cache in the first place and can manipulate it as they need to.
Agreed, I think the mapping of collection name to graph/meta is natural. I don't even think there's any particular documentation that should go with this, except that maybe the cache size should be configurable? That doesn't need to happen yet.
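If a configurable size is ever wanted, the name-to-(graph, meta) mapping could be bounded with something like this sketch (a hypothetical `LayerCache`, not the current implementation):

```python
from collections import OrderedDict


class LayerCache:
    """LRU mapping of collection name -> (graph, meta), with a settable bound."""

    def __init__(self, maxsize=128):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def __setitem__(self, name, graph_and_meta):
        self._data[name] = graph_and_meta
        self._data.move_to_end(name)  # newest entries live at the end
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict the least recently used

    def __getitem__(self, name):
        self._data.move_to_end(name)  # a hit refreshes recency
        return self._data[name]
```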
I'd motion for going ahead and merging this today and getting a release out, then a bunch of wheels can turn on the coffea side of things.
@martindurant can I go ahead and merge/release? You tend to do the honors on these PRs, but I'm happy to turn the cranks.
Go ahead
@lgray , this doesn't actually work, but I thought you would appreciate a glimpse of what I had in mind.