JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Make use of mapslices consistent throughout Julia #3893

Open johnmyleswhite opened 11 years ago

johnmyleswhite commented 11 years ago

I'd like to propose the radical, breaking change of removing all forms of implicit mapslices calls from Julia. I think they make the language much less consistent and create a situation in which one's expectations about a function's interface are routinely violated. As an example, the list below shows some functions that effectively implement implicit calls to mapslices:

In contrast, the following similar functions do not support the foo(A, dim) interface at all:

I understand that this change would break a great deal of Matlab compatibility and make the language a little more verbose. But I think the gains in consistency would more than make up for that loss by making the language much less confusing. Removing this shorthand would mean that you wouldn't have to memorize which functions belong to a privileged subset that performs implicit mapslices.

As one example of the memorization required to use the foo(A, dim) interface, you need to memorize that empty tuples passed to min are required to trick min into creating slices. I'd much rather know that mapslices works the same way for all functions in the language.
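A minimal sketch of the uniform interface argued for here, written in current Julia 1.x syntax rather than the syntax of the time (since Julia 1.0 the built-in reductions take the dimension as a `dims` keyword, e.g. `maximum(A; dims=1)`). `mapslices` accepts any function of a slice, so a built-in reduction and a user-defined one (the hypothetical `spread`) are invoked the same way.

```julia
A = [4 1 7; 2 9 3]

# Built-in reduction along dimension 1 (Julia 1.x keyword form)
m1 = maximum(A; dims=1)

# The same result via mapslices, which works for any function of a vector
m2 = mapslices(maximum, A; dims=1)

# Hypothetical user-defined function: it gets per-column behavior for free,
# with no special dims method needed
spread(v) = maximum(v) - minimum(v)
m3 = mapslices(spread, A; dims=1)
```

Here `m1 == m2 == [4 9 7]` and `m3 == [2 8 4]`.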

bramtayl commented 6 years ago

I have a working prototype which is ready to go, but requires several pending PRs related to type-stable tuple operations.

bramtayl commented 6 years ago

However, my prototype is breaking: it has a julienne iterator which can be mapped over (and a combine function for combining the resulting pieces). It is sometimes an order of magnitude faster than the current implementation of mapslices.
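The split/apply/combine pattern described here can be roughly approximated with Base alone. This is not JuliennedArrays.jl's API: `julienne_columns` is a hypothetical helper that only slices along columns, using `eachcol` (Julia 1.1+) to split and `reduce(hcat, ...)` to combine vector-valued results.

```julia
# Hypothetical helper: split A into columns, apply f to each, glue results side by side
function julienne_columns(f, A::AbstractMatrix)
    pieces = map(f, eachcol(A))   # split + apply (eachcol needs Julia 1.1+)
    return reduce(hcat, pieces)   # combine: each result vector becomes a column
end

A = [1.0 3.0 5.0; 2.0 4.0 6.0]
normalize_col(c) = c ./ sum(c)    # example vector-valued slice function

B = julienne_columns(normalize_col, A)
```

For this input `B` matches `mapslices(normalize_col, A; dims=1)`; keeping split, apply, and combine as separate steps is what makes the iterator approach composable.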

mbauman commented 6 years ago

It's not clear to me what is breaking here. We're not talking about changing meanings, just adding features or removing specific methods.

Are there any other possible actions here that are breaking? The only one I can see is:

bramtayl commented 6 years ago

There is also the issue that I still haven't beaten the performance of mapreducedims. I got around this in my version with a special optimization for reducing functions (sum, prod, etc.), as well as a generalization of that optimization (you could pass Reduce(+) instead of sum to hook into mapreducedims). This kind of behavior could also lead to mapreducedims becoming an unexported optimization rather than an export.
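A rough sketch of the dispatch idea, assuming a hypothetical `Reduce` wrapper type (named after the comment above, not a Base API) and a hypothetical `apply_slices` entry point. A plain function falls back to `mapslices`, while a `Reduce` is routed to Base's dims-aware `reduce`, which in Julia 1.x covers what `mapreducedim` used to do.

```julia
# Hypothetical marker type: "this operator should be used as a reduction"
struct Reduce{F}
    op::F
end
# Make it callable on a single slice too, so it also works with mapslices
(r::Reduce)(v) = reduce(r.op, v)

# Hypothetical entry point: generic path applies f slice by slice
apply_slices(f, A::AbstractArray, dims) = mapslices(f, A; dims=dims)

# Fast path: a Reduce hooks into Base's reduction over dims
apply_slices(r::Reduce, A::AbstractArray, dims) = reduce(r.op, A; dims=dims)

A = [1 3 5; 2 4 6]
s_generic = apply_slices(sum, A, 1)      # goes through mapslices
s_fast = apply_slices(Reduce(+), A, 1)   # goes through reduce(+, A; dims=1)
```

Both calls return the same `1×3` matrix `[3 7 11]`; only the code path differs.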

simonbyrne commented 6 years ago

Now that I've had a bit of time to look over it, I really like the JuliennedArrays.jl approach. There is probably some bikeshedding to be done regarding naming and the tuple syntax, but overall it feels like the right direction.

mapreducedims being an unexported optimization rather than an export.

:+1:

JeffBezanson commented 6 years ago

Anything that's done in time and has enough support can go into 1.0, and, if non-breaking, into 1.x. The milestone is only needed if this is considered essential for 1.0.

StefanKarpinski commented 6 years ago

I think adding a composable way to express reductions with something like slices is a good piece of future design work, but deprecating sum(A, d) seems like something we really should not do.

bjarthur commented 6 years ago

the danger of not addressing the consistency of the mapslices interface now is that the sum(A,d) syntax will become entrenched and might never be changed.

why is there a rush to tag 1.0? i would much rather wait another minor release cycle or two (i.e. 0.7) for issues like this to get fixed, than to have to wait for a presumably much longer major release (i.e. 2.0).

timholy commented 6 years ago

I think we should aim to get a more efficient mapslices into Base for 1.0 (just so it's not embarrassing), but I don't think we should get rid of the sum(A, d) interface. Due to cache issues, mapslices will never match the performance of reductions that can be performed while visiting all array elements in storage order.
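To illustrate the storage-order point, here is a hand-written row sum (the hypothetical `rowsums_storage_order`) whose inner loop runs down each column, matching Julia's column-major layout. It is compared against `mapslices` and the built-in `sum(A; dims=2)` (the Julia 1.x spelling of `sum(A, 2)`). The sketch only checks that the three agree; it makes no timing claims.

```julia
# Row sums computed by walking A in column-major (storage) order:
# the inner loop over i touches consecutive memory locations.
function rowsums_storage_order(A::Matrix)
    out = zeros(eltype(A), size(A, 1))
    for j in 1:size(A, 2), i in 1:size(A, 1)   # i is the innermost loop
        out[i] += A[i, j]
    end
    return out
end

A = [1 3 5; 2 4 6]
r1 = rowsums_storage_order(A)
r2 = vec(mapslices(sum, A; dims=2))  # one non-contiguous slice A[i, :] at a time
r3 = vec(sum(A; dims=2))             # Base reduction over dims
```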

As for "why the rush?" that's pretty clear: many people are looking forward to not having their code break with each release. It's particularly a major disincentive for industrial adoption, but it's a drain even on us academics. If we wait too long, the world will move on to other technologies that may be less perfect but more predictable. It's time for 1.0.

mbauman commented 6 years ago

@bjarthur Could you expound a little bit on what you don't like about the sum(A, d) API? I think our reduction method tables need to be thoroughly examined, and that's on the slate as part of #20402. Specifically, I think they should be structured as:

const ReductionRegion = Union{Integer, Tuple{Vararg{Integer}}}
reduction(f, A::AbstractArray, d::ReductionRegion)
reduction(f, A::AbstractArray)
reduction(A::AbstractArray, d::ReductionRegion)
reduction(A::AbstractArray)
reduction(f, iter)
reduction(iter)

No ambiguities, and it gives obvious hooks for concrete types to specialize on.
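A minimal sketch of that method-table layout applied to a hypothetical reduction `mysum`, built on Base's `mapreduce` (which accepts a `dims` keyword in Julia 1.x). Because `ReductionRegion` never overlaps `AbstractArray`, `mysum(f, A)` and `mysum(A, d)` cannot be confused by dispatch.

```julia
const ReductionRegion = Union{Integer, Tuple{Vararg{Integer}}}

# Hypothetical reduction following the proposed layout
mysum(f, A::AbstractArray, d::ReductionRegion) = mapreduce(f, +, A; dims=d)
mysum(f, A::AbstractArray) = mapreduce(f, +, A)
mysum(A::AbstractArray, d::ReductionRegion) = mysum(identity, A, d)
mysum(A::AbstractArray) = mysum(identity, A)
mysum(f, iter) = mapreduce(f, +, iter)   # generic iterables
mysum(iter) = mysum(identity, iter)

A = [1 3 5; 2 4 6]
total = mysum(A)                # whole array
colsums = mysum(A, 1)           # dimension given as an Integer
sq_rows = mysum(abs2, A, 2)     # function plus region
```

Concrete array types would specialize the three-argument method; everything else funnels into it.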

Does that satisfy your desire here? Or are you looking for bigger changes?

bjarthur commented 6 years ago

@timholy i'll respectfully disagree with your motivations for rushing. i personally enjoy the improvements in each new release. the pace and scope of such improvements is greater because breaking changes are permitted. we've got Compat and femtocleaner to help out. moreover, the longer we wait, the more mature the tools and libraries will be, and so the less likely that the onslaught of newcomers drawn in by a 1.0 gala party will be unimpressed and leave.

@mbauman my desire to deprecate any(A,d) was to make any(f,A,B,C) easier to implement. if it's truly the case that a slicing operator as you describe above will never be as fast, then i agree it would not make sense.
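A minimal sketch of the kind of n-ary `any` being described, using the hypothetical name `anyn` so as not to extend Base. It applies `f` elementwise across several arrays via `zip`. In Julia 1.x the dimension form became the keyword `any(A; dims=d)`, so no positional argument is reserved for dimensions.

```julia
# Hypothetical n-ary any: true if f(a, b, ...) holds for some elementwise tuple
anyn(f, As::AbstractArray...) = any(t -> f(t...), zip(As...))

hit = anyn(>, [1, 2, 3], [3, 2, 1])   # 3 > 1 in the last position
miss = anyn(>, [1, 2], [3, 4])        # no position satisfies >
```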

bramtayl commented 6 years ago

Tada https://github.com/bramtayl/JuliennedArrays.jl is published. Sometimes an order of magnitude faster than mapslices and much more flexible.

tpapp commented 5 years ago

I think the introduction of eachslice by #29749 offers a potential solution to this issue, if it is extended to work over multiple dimensions (currently only one dimension is supported).
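For the one-dimensional case that `eachslice` supports, a `mapslices` call can be rewritten as a comprehension over slices (`eachslice` requires Julia 1.1 or later). The index conventions differ: `eachslice(A; dims=1)` iterates over `A[i, :]`, which corresponds to `mapslices(f, A; dims=2)`.

```julia
A = [1 3 5; 2 4 6]

# Iterate over row slices A[i, :] and reduce each one
viaeach = [sum(s) for s in eachslice(A; dims=1)]

# Equivalent mapslices call: the function sees A[i, :], i.e. dims=2
viamap = vec(mapslices(sum, A; dims=2))
```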