johnmyleswhite opened this issue 10 years ago (status: open)
You don't have to memorize the subset now; nothing stops you from applying `mapslices` to `sum` if you want.
We can promote the `mapslices` interface as the Julian way to do things, while still retaining the old behavior in a few functions to ease the Matlab transition (as with `sum`) or for performance (as with `fft`).
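For concreteness, a small sketch of the two spellings being compared (written with the modern `dims` keyword; the thread itself predates that syntax):

```julia
A = [1 2 3; 4 5 6]

# Privileged shorthand: built-in reduction along dimension 1.
s1 = sum(A, dims=1)

# Generic spelling via mapslices.
s2 = mapslices(sum, A, dims=1)

@assert s1 == s2 == [5 7 9]
```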
Yes, that's certainly true. But we've previously made many decisions to ensure that there aren't two dialects of Julia that provide two mechanisms for doing the same thing.
I suspect I'll be outvoted on this one very quickly, but wanted to raise the issue for discussion.
I think using a `dim` keyword consistently would be cleanest, e.g. `min(a, dim=2)`.
@johnmyleswhite I appreciate the pursuit of consistency over function definitions. This is of course a very important aspect in language design.
However, I think what we should do here is to add support for computation along a specified dimension to other functions over time, instead of removing this capability from `sum`, `max`, etc.
Something like `sum(x, 1)` or `sum(x, 2)` has been used extensively in various packages, and it would be a substantial cost to change all that code. Also, `mapslices(sum, x, dim)` is less performant than `sum(x, dim)`, especially when `dim > 1`, since the memory access pattern of `mapslices` is not cache-friendly in that case. (However, I think we can add support for dimensions to `reduce` and `mapreduce`, which can be implemented in a cache-friendly way.)
That being said, I think we should consider how to improve the consistency of the interface across these functions (e.g. using a `dim` keyword, etc.).
I would be very happy with either the consistent use of a `dim`/`slice` keyword or any mechanism that ensures that most functions in the language support the `foo(A, dim)` interface. As @lindahua suggests, we can add this ability incrementally over time, but it's a non-trivial amount of work because every new function needs to implement this functionality. Because functions like `min` don't quite match the standard interface, we will end up creating a new interface for many functions. And because `findmin` doesn't return a scalar, we'll have to decide how to allocate the resulting tuples in an array.
As Dahua points out, the major challenge (besides time) is making sure that the resulting interface is still high performance.
I agree with @lindahua that I'd rather add this interface to the stats functions that don't have it rather than remove it from the ones that do. I use these in almost every data analysis script I write, and I find the compact syntax very convenient.
We do need a friendlier syntax for `max` and `min`. After N threads of discussion on that point, I think the conclusion was to use a keyword `dims` argument; it's just that no one has gotten around to doing it.
Should we perhaps have a `@mapslices` macro that constructs the keyword-argument version of these functions from the base case?
I'm very much in favor of `mapslices`, or rather, very much against reimplementing the same functionality in every function that could conceivably be applied across a matrix.
The performance difference just doesn't strike me as great enough to embrace such an inconsistency. But I also get that Julia has a strong Matlab background, and not having `dim` arguments in certain functions is jarring to many.
While it would be a fairly radical change, and despite my Matlab background, in principle I think this may be the right thing to do.
But not now. The performance of `mapslices` is nowhere close to where it needs to be to become the backbone for such common operations; not just because of the cache-unfriendliness issue @lindahua mentioned (although that is indeed very important), but also because `mapslices` still has quite a lot of overhead. Fixing this is yet another thing that relies on breaking the SubArray/cartesian iteration logjam (and I don't think we're ready to incorporate Cartesian.jl into base anytime soon).
Although Julia normally tries to have one way to do things, the cost/benefit ratio of keeping the Matlab-like syntax available (even if we promote `mapslices` as the "right" way to do it) seems to me to favor keeping it (costs: some redundant syntax; benefits: compatibility and ease of transition for Matlab users, for a few key functions that are used all over the place).
This is independent of whether we can eliminate the performance penalty of `mapslices`. Note also that for functions like `sum` we will always be able to do special-case optimizations that will be hard to obtain in a generic `mapslices` routine.
Philosophically I totally agree with John, but on the practical side Tim's and Steve's comments nail it.
I think we should only keep the dim argument for `sum` and `prod`, and maybe `any` and `all`, since these happen to have special names (unlike `max`, `min`, and other functions in general).
In addition to `mapslices`, we have `reducedim`, which is much more efficient when we know the mapped function is doing the equivalent of `reduce`. `reducedim` is a silly name since it can actually work on multiple dimensions. It would be nice to name these similarly; maybe `mapdims` and `reducedims`, or `mapslices` and `reduceslices` (that last one seems a bit long-winded).
When you say special names, you mean that `sum(1, 2)` is a special name for `1 + 2`?
I actually like `mapslices` and `reduceslices`, but mostly because I'm used to `mapslices`. I'd be up for `mapdims` and `reducedims` as well.
I mean that `sum` is a name for `x->reduce(+,x)`. In fact some of the reductions are defined this way:

```julia
all(A::AbstractArray{Bool}, region) = reducedim(&, A, region, true)
```

We can make `reduceslices` and `mapslices` the basic thing, but have such definitions for convenience where possible.
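A hypothetical sketch (made-up names, modern `dims` syntax) of what such convenience definitions on top of a generic reduction could look like:

```julia
# Named reductions as thin wrappers over a generic dimensional reduce,
# in the spirit of the `all` definition quoted above.
mysum(A::AbstractArray; dims) = reduce(+, A; dims=dims)
myall(A::AbstractArray{Bool}; dims) = reduce(&, A; dims=dims)

@assert mysum([1 2; 3 4], dims=1) == [4 6]
@assert myall([true false; true true], dims=1) == [true false]
```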
@JeffBezanson The problem is that `mean`, `std`, and `var` are not naturally written with `reducedim`, since they are not composable as binary operations. So we are back to the problem that `mapslices` has a large performance gap compared to the current dim-argument versions of each of these functions. For instance, `mapslices` of `mean` along the second dimension of a 1000x10x10 array is over 50x slower than directly calling `mean` with a dim argument. The gap is even wider if you compare to @lindahua's NumericExtensions version.
I saw #17266 in NEWS and it made me wonder whether the performance of `mapslices` is good enough now to reconsider this issue?
As much as it's been improved, we're still not there:

```julia
julia> @time mapslices(maximum, A, 2);
  0.001857 seconds (4.56 k allocations: 105.172 KB)

julia> @time maximum(A, 2);
  0.000269 seconds (11 allocations: 4.422 KB)
```

BenchmarkTools indicates a 6x difference. It's about a 3:5 ratio of time spent copying data into a temp array versus computing `maximum` on that temp array. I'm not quite sure why the `maximum` part is so slow; it might merit some investigation by an interested party (willing to try, @bramtayl?). The memory allocation seems high too, which might indicate that it's as simple as forcing Julia to specialize the reduction, changing `foo(f, ...)` to `foo{F}(f::F, ...)`.
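The specialization trick being referred to, in a minimal hypothetical sketch (the modern `f(g::G, ...) where {G}` form replaced the old `f{G}(g::G, ...)` syntax):

```julia
# By default Julia may decline to specialize on a function-typed argument
# in some situations; naming its type as a free type parameter guarantees
# a specialized method instance per function passed in.
apply_generic(f, x) = f(x)              # specialization left to heuristics
apply_forced(f::F, x) where {F} = f(x)  # specializes on typeof(f)

@assert apply_forced(maximum, [1, 5, 3]) == 5
@assert apply_generic(sum, [1, 2, 3]) == 6
```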
Looking at `@code_warntype` on `mapslices`, it is littered with type instabilities, so it's not strange that it's slow.
Another, more extreme example:

```julia
julia> @time maximum(a, 2);
  0.014243 seconds (14 allocations: 7.630 MB)

julia> @time mapslices(maximum, a, 2);
  2.958281 seconds (10.00 M allocations: 205.989 MB, 12.02% gc time)
```
That's definitely true, though it's still a bit surprising that it matters so much, given that `unsafe_getindex!` and `f(Aslice)` do all the real work (and that both of those are "protected" by a function barrier).
One thing that's really backwards is the fact that we accept `dims` as a `Tuple` but coerce it into an `AbstractVector`. Instead, we should be accepting an `AbstractVector` and coercing it into a `Tuple`. Then all the index manipulations could be made type stable. EDIT: no, even that's not true. We'd need to introduce a function barrier for the first part of `mapslices` and generate `idx` as a tuple, then pass that to the "real" function.
@timholy That's exactly what I experimented with for a bit, but there is a write to `idx` later in the code. I'm sure it can be worked around, though. I'm also a bit worried about overspecialization.
Sure, in the "preamble" just stuff `:` into every tuple slot you don't want to rewrite, and fill the ones you want to rewrite with 1. (As you surely know, this is the part that won't be type-stable, because `dims` is encoded as a tuple of values.) Then, in the loop, call out to a lispy function that substitutes the elements of the `CartesianIndex` into the non-`Colon` slots and returns a new full tuple. That part will be type-stable.
I'm not sure I understand enough to be able to help. But with the new Julia closures scheme, it should be possible to make specialized methods of `mapslices` for different functions, correct? I see two possibilities:
1) `mapslices` is just inefficient, in which case we should be able to fix it;
2) `mapslices` is inefficient because code specialized per function speeds things up, in which case we can just use specialized code for critical functions, which should all be written already.
P.S. If 2 is the case, then the same code-specialization strategy could work for a lot of other functional programming functions, like `map`, `reduce`, `broadcast`, etc.
P.P.S. We already have the fancy new dot syntax for broadcast. I wonder if there isn't the possibility of a similar fancy syntax for mapslices. Something like `f.(g.(h.(A, sliceby = 2)))` going to `mapslices(x -> f(g(h(x))), A, 2)`.
P.P.P.S. filter could also be worked into the syntax: `f.(g.(h.(A)), filter = true)` going to `filter(x -> f(g(h(x))), A)`.
All that would be required is adding keyword versions of `broadcast` and `broadcast!`.
How about this for a deal: I'll take the time to write up (in the next couple of days) a blog post on tricks I've learned for writing high performance manipulations of multidimensional indices using tuple magic, and someone else agrees to use the information in that post to make `mapslices` awesome.
It would actually be less work for me to just fix `mapslices`, but we can't keep going on this way forever.
I tried poking around a bit yesterday, but I had trouble wrapping my head around all the new unconventional index stuff, so I wasn't sure whether what I did was safe or when I needed to special-case to `indices` or `size`, etc.
It should already be safe for unconventional indices, because I updated it already. (See tests here that prove it's working.)
It's just down to stuffing things in the right slots now.
I know it is safe now, but I wasn't sure if my modifications were safe :). But yeah, you're right, I can just check with the tests. I was also a bit curious about the `OneTo` type, but there was no documentation about it; I guess it is just a way to dispatch on what would be `1:n` for normal arrays?
OneTo: http://docs.julialang.org/en/latest/devdocs/offset-arrays/#using-indices-for-bounds-checks-and-loop-iteration http://docs.julialang.org/en/latest/devdocs/offset-arrays/#custom-abstractunitrange-types
Maybe needs a docstring, even though it's not exported?
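For anyone else who goes looking: `Base.OneTo(n)` is just a range that the type system knows starts at 1, so code can dispatch on that guarantee:

```julia
r = Base.OneTo(5)
@assert r == 1:5                # same elements as an ordinary unit range
@assert first(r) == 1           # ...but the 1 is guaranteed by the type
@assert axes([10, 20, 30], 1) == Base.OneTo(3)
```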
I was looking with `?`, so a docstring would at least have let me find it :) Me being less lazy would probably also have worked.
That's very reasonable. Laziness is the key underlying most forms of progress :smile:.
@timholy did you ever make a deal and write a blog "on tricks I've learned for writing high performance manipulations of multidimensional indices using tuple magic" ?
I ask because I found a bug in `extrema(A, dim)`, and while fixing it I took the time to re-evaluate the mapslices, Cartesian, and CartesianRange variants. Cartesian is still the hands-down winner, usually by a factor of 2x. It uses an evil generated function, though, so I wanted to make sure there's nothing I'm missing in the CartesianRange version that could make it faster. See this gist.
LGTM. One difference between your Cartesian and CartesianRange variants is that you're checking bounds in the latter but not the former; that may explain a portion of the performance gap. Additionally (relevant to both variants), you might consider replacing the branch with an `ifelse` and then using `@inbounds @simd` around the `I` loop. For `CartesianRange`, `@simd` "pops" the inner dimension and replaces it with its own for loop, thus (partially) circumventing #9080 (in addition to any benefits from vectorization).
And no, I didn't write the blog post. Didn't seem to be all that much interest, possibly because a lot of people seem to already know this stuff.
Thanks Tim! `@inbounds` gave a modest but worthwhile improvement on the `CartesianRange` variant; combined with `@simd`, though, it's now as fast as `Cartesian`. This despite not seeing how to use `ifelse`, given that the two branches of the `if` clause test different inequalities. I'll modify my PR to use `CartesianRange` instead.
Forgot to mention that `@simd` threw an error with the `Cartesian` variant. Combined with `@nloops` it threw this error:

```julia
Base.SimdLoop.SimdError("for loop expected")
Stacktrace:
 [1] extrema_cartesian(::Array{Float64,3}, ::Int64) at /Users/arthurb/extrema.jl:16
```
I tried adding parens around the `@nloops` expression so that `@simd` would operate on the output, but no joy. Are they just not compatible?
You can chain your `ifelse`:

```julia
BJ = ifelse(AI < BJ[1], (AI, BJ[2]), BJ)
BJ = ifelse(AI > BJ[2], (BJ[1], AI), BJ)
B[J] = BJ
```
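A self-contained (hypothetical) illustration of that chained `ifelse` update maintaining a running (min, max) pair:

```julia
# Branch-free running (min, max): each ifelse evaluates both arms and
# selects one, avoiding the differing-inequality branches that block @simd.
function extrema_ifelse(a::AbstractVector)
    bj = (a[1], a[1])
    @inbounds @simd for i in eachindex(a)
        ai = a[i]
        bj = ifelse(ai < bj[1], (ai, bj[2]), bj)
        bj = ifelse(ai > bj[2], (bj[1], ai), bj)
    end
    return bj
end

@assert extrema_ifelse([3, -1, 7, 2]) == (-1, 7)
```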
> are they just not compatible?

Not sure. You might have to create a special call for the innermost loop.
This would still be a very valuable and welcome API revamp if anyone wants to tackle it. The first steps would be to figure out classes of functions with consistent APIs, and then see if we can refactor the ones that are like `mapslices` to be implemented in terms of `mapslices` or some `slices` mechanism (cc @simonbyrne). Then we can assess whether we can bring the other functions in line with the `mapslices` pattern.
While changing `mapslices` is a big (but good) change, the easy change of adding a `dim` keyword to augment or replace the `dim` positional arguments could still be done right now.
I don't think the keyword penalty would be too bad in practice, since the computational cost is going to be much more within `sum` than in the call overhead to `sum`.
I agree that a keyword is the way to go, though I don't think it should be `dim`. In fact we should have 2 different ones:
Any suggestions for names?
Wait, do you mean changing a positional argument to a keyword argument with a new name in tens of functions? That doesn't make sense to me at all... To maintain compatibility, it might make sense to have the positional dims argument for all of these functions deprecate to calling the function on a `Slices` iterator. The slices iterator could specialize on the function being sliced if necessary to maintain performance. As far as naming goes, something like `SliceOver(A, dims)`, where dims is a list of the dimensions to be iterated over (with colons in all the other slots).
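A minimal sketch of what such an iterator could look like (`SliceOver` is the hypothetical name from this comment; modern Julia later gained `eachslice` in this spirit), restricted to a single dimension for brevity:

```julia
# Hypothetical SliceOver: iterate views of an array along one dimension.
struct SliceOver{T<:AbstractArray}
    data::T
    dim::Int
end

Base.length(s::SliceOver) = size(s.data, s.dim)
Base.iterate(s::SliceOver, i=1) =
    i > length(s) ? nothing : (selectdim(s.data, s.dim, i), i + 1)

A = [1 2 3; 4 5 6]
@assert [sum(r) for r in SliceOver(A, 1)] == [6, 15]
```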
For better or worse, a lot of people use functions like `sum` with tiny arrays (also `SVector`s), and the keyword overhead worries me. I honestly don't see what problem is being solved here: when you have a function with only 2-3 arguments, positional arguments seem unobjectionable to me.
I wouldn't mind having the keyword arguments for consistency in addition to the positional ones for convenience. But that whole idea could be tabled until the keyword penalty is fixed.
If positional dims arguments were removed, then it would be easy to make predicate methods for `any` and `all` like those that already exist for `map`. See https://github.com/JuliaLang/julia/issues/20181.
We could keep the positional argument forms until we have fast keyword arguments.
On Thu, Aug 3, 2017, Steven G. Johnson wrote:
> For better or worse, a lot of people use functions like `sum` with tiny arrays (also `SVector`s), and the keyword overhead worries me.
IIRC, part of the reason `min` and `minimum` had to be separate functions was the ambiguity of `min(A, d)`: do you want to take the minimum of each slice of `A`, reducing the `d` dimension, or do you want to take the minimum of each element of `A` with `d`? If we used a `dim` keyword, then that wouldn't be an issue anymore. Not to reopen old wounds, but the `min` versus `minimum` thing still trips me up somewhat regularly. Of course, there's a semantic correctness argument to be made that `min` and `minimum` are simply different functions – the former the operator (like `+`) and the latter the reducer (like `sum`) – but I suspect I'm not alone here.
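The two readings of `min(A, d)`, spelled out in modern Julia (where they ended up in different functions):

```julia
A = [1 5; 4 2]

# Reduction reading: minimum over each slice along dimension 2.
@assert minimum(A, dims=2) == reshape([1, 2], 2, 1)

# Elementwise reading: min of each element of A with the scalar 3.
@assert min.(A, 3) == [1 3; 3 2]
```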
> there's a semantic correctness argument to be made that min and minimum are simply different functions – the former the operator (like +) and the latter the reducer (like sum)
This is the position I take. Maybe a renaming, like `minof`, `reducemin`, `minreduce`, ...?
There might be an argument for eliminating separate functions just for reductions of existing functions.
That argument would fail pretty badly once it got to `+` and `sum` – which is pretty much the starting gate, so I'm not sure that's got legs.
I propose moving this to 2.0. It doesn't look like we'll have time to redo the mapping-over-slices infrastructure.
Is there a reason this can't be in before 2.0, if it gets implemented?
I'd like to propose the radical, breaking change of removing all forms of implicit mapslices calls from Julia. I think they make the language much less consistent and create a situation in which one's expectations about the interface for functions are routinely violated. Some functions effectively implement implicit calls to `mapslices`; in contrast, other similar functions do not support the `foo(A, dim)` interface at all. I understand that this change would break a great deal of Matlab compatibility and make the language a little more verbose. But I think the gains in consistency would more than make up for that loss by making the language much less confusing. Removing this shorthand would mean that you wouldn't have to memorize which functions belong to a privileged subset that performs implicit mapslices.
As one example of the memorization required to use the `foo(A, dim)` interface, you need to memorize that empty tuples passed to `min` are required to trick `min` into creating slices. I'd much rather know that `mapslices` works the same way for all functions in the language.