Closed robertmaxton42 closed 6 years ago
The thing is, if the axis size is large enough, transposition makes a big difference in performance, and currently you cannot avoid it. But if the axis is only several elements long, it may be possible for non-sequential memory access to be faster than transposition + sequential memory access. Perhaps I should add a no_transpose
option or something like that, but that's a different issue.
(Sorry to keep bugging you when work is busy; just let me know when you have a moment)
I will actually be travelling until the middle of September, so my responses will be infrequent, but don't let that stop you, file an issue if you have a problem, I'll get to them eventually :)
The thing is, if the axis size is large enough, transposition makes a big difference in performance, and currently you cannot avoid it.
Yeah, I found that out the hard way recently :P. But no, even under that circumstance sequential Scans have an extra call to transpose - because rather than going from (a, b, c) -> (a, c, b) -> (a, b, c)
and then (a, b, c) -> (c, b, a) -> (a, b,c)
, you can go directly from (a, c, b) -> (b, c, a)
and then out from there.
... Well, assuming that all permutations of axes take equal time, which I just realized isn't necessarily the case. Whoops.
(Sorry to keep bugging you when work is busy; just let me know when you have a moment)
Scan
behaves not-intuitively relative to its documentation, in that it does not specify what is meant by scanning over multiple axes at once. I was expectingaxis=[0,1]
to be equivalent to aScan
over axis 0, followed by aScan
over axis 1, with the benefit of saving an intermediate, internal call toTranspose
; but upon a review of the code this is not the case, as aScan
instead transposes so that the listed axes are adjacent and innermost, flattens them, and scans over the new, merged "axis", which rather surprised me when I saw it in action. It would be helpful if this was mentioned explicitly in the documentation.Thanks for all the work!