stevengj closed this issue 7 years ago
Regarding the third point, can we do `.=` as `broadcast!` without worrying overly about loop fusion? As in `A .= B` being `broadcast!(x->x, A, B)`, an element-wise copy. Of course, more complex expressions would require more complex loop fusion, but this gives some moderate gain. `.+=` and so on are very low-hanging fruit for removing at least one layer of allocation very simply.
(Hmm... that last one makes me wonder if `.sin=` is good syntax? As in `A .sin= B` being like Julia 0.4 `A[:] = sin(B[:])`. I'm being more than a bit silly here, but it works similarly to the `.` prefix version like `.sin(x)` rather than the suffix `sin.(x)`. It even kinda nests: `A .exp.sin= x`. Quite ugly though, and it messes with scoping/field syntax.)
@andyferris, without loop fusion, an in-place `.=` is pretty useless. An expression like `x .= 4y .+ 5y.^2 .- sin.(y)` will still allocate lots of temporary arrays. If you just want element-wise copy, you can already do `A[:] = B`.
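To make the tradeoff concrete, here is a minimal sketch (variable names are illustrative): an unfused `.=` would only buy an allocation-free element-wise copy, whereas fusing the whole right-hand side into one `broadcast!` call avoids all intermediate arrays.

```julia
A = zeros(3)
B = [1.0, 2.0, 3.0]
# unfused `.=`: just an element-wise copy, like A[:] = B
broadcast!(identity, A, B)

y = [0.1, 0.2, 0.3]
x = similar(y)
# fully fused form of x .= 4y .+ 5y.^2 .- sin.(y):
# a single loop over y, no temporaries for 4y, 5y.^2, or sin.(y)
broadcast!(yi -> 4yi + 5yi^2 - sin(yi), x, y)
```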
@stevengj I agree that it is (almost) useless, except as an alternative copy syntax. It's just "step zero": it's dirt simple and sets the precedent for "step one", doing the same for `.+=` and all the other `op=` operators with a `.` prefix. That then starts to be useful, e.g. for `x .+= 1` or similar to be non-allocating where `x` is an `Array`. In fact, the number of `op=` operators currently defined is relatively small, so it shouldn't be too hard to define them all.
"Step 2" would be the more complex loop fusion or similar, and perhaps generalizations to generic functions.
On the whole, I do have to support what Jeff said in #14544 (comment). Perhaps the cleaner approach is to have some neat syntax for `map`/`broadcast` (and hopefully where loop fusion is clear to the user or compiler). I dunno.
@andyferris, that discussion is out of date. We now do have a neat syntax for `broadcast`, and the possibility of loop fusion is now exposed to the compiler at the syntax level. And since that syntax is `.`, generalized dot operators now make a lot more sense.
@stevengj Sorry, I missed the last week or so of #8450 (it's really hard to keep up with all the threads!).
In any case, the progress seems very cool! The parser-based loop fusion (e.g. your #8450 (comment)) seems like a great option to me. Any more complex things can still be done with loops, comprehensions, `map`, and `broadcast` explicitly.
Couldn't a `@fuse` macro be safer than doing the loop fusion at the parser level?
What's the advantage? We already have that in Devectorize.jl, and I don't think it would be more flexible or give you more control.
I'm only worried about doing loop fusion everywhere. "The caller would be responsible for avoiding function calls with problematic side effects", but... how can the user use the dot notation and avoid the fusion at the same time? Also, I think that `@fuse` would be more explicit to write and to read in the code. `@fuse` can do the same proposed loop fusion, but it can also check the functions to be called (i.e. annotated pure functions). The latter isn't possible at the syntax level.
> The caller would be responsible for avoiding function calls with problematic side effects

I think this is not an issue for the majority of cases.

> How can the user use the dot notation and avoid the fusion at the same time?

Split the line.

> but it can also check the functions to be called (i.e. annotated pure functions).

This is impossible. Or at least it won't be better at doing that.
@diegozea, macros operate at the syntax level, without knowing types, so they can't check pureness. The same goes for the proposed `f.(args...)` fusing. (In practice, though, vectorized operations are virtually never used in cases with side effects, much less side effects that would be affected by fusion. And if `f.(args...)` is defined as always fusing, then you can reliably know what to expect.)
Think of `f.(args...)` as a much more compact and discoverable syntax for `@fuse`. (Once we deprecate e.g. `sin(x)` in favor of `sin.(x)` etcetera, people doing Matlab-like computations will end up using fused loops automatically in many cases due to the deprecation warning, whereas they might never learn about `@fuse`.)
Another way of saying it is that fusing the loops is a good idea in the vast majority of cases. Not wanting to fuse is the rare exception. That makes it sensible to make fusing the default, and in the rare cases where you don't want to fuse you can either split the line or just call `broadcast` explicitly.
(Note that this whole discussion was not even possible before the `.` notation. If you write `f(g(x))` in 0.4, there is no practical generic way to look "inside" `f` and `g` and discover that they are `broadcast` operations that ought to be fused, whereas `f.(g.(x))` makes the user's intent clear at the syntax level, enabling a syntax-level fusing transformation.)
Sorry, I meant parse level, not syntax level, in my last sentence. However, I believed that a macro could access a function's metadata.
> However I believed that a macro could access a function metadata.

No, it can't. For a start, it doesn't even have any idea about the binding. It is allowed to do `sin = cos; sin(1)` (or a much more reasonable form of this). `@pure` is also per method, and you can't check that without knowing all the type info.
@diegozea, macros are called at the parse level, which is what we mean by the "syntax" level here. i.e. at the point the macro is called, all that is known is the abstract syntax tree (AST); there is no information about types or bindings.
While fusion by default sounds like a good idea, I find it a bit unsettling that `y = g.(f.(x))` and `a = f.(x); y = g.(a)` might give different results. Is that just me?
Not different results, just different ways of returning the same result. That's fine IMHO. In the long-term, the compiler might become smarter and detect more complex cases.
@nalimilan You are assuming pure f and g.
FWIW, I suspect most functions people are vectorizing are pure.
Maybe we need to do some more exploration around function traits (purity, guaranteed return types, boolean, etc.) and how they could be incorporated into some of these murkier discussions (vectorization, return types, etc.). It seems like having some explicit declarations around function traits would avoid the need to rely on inference or the compiler.
I'm worried that overly complex rules for defining how loop fusion works will just become too confusing for users. The suggestion that `A .= B .+ C ...` becomes syntactic sugar for `map`/`broadcast` means that, once we are all used to it, we will easily be able to reason about code we see and know how to write a simple fused loop expression for vectors of data.
If it is a simple parsing-level rule, then the differences in @martinholters's comment will be obvious. If it is compile-time magic, we will spend a lot of time trying to figure out whether the compiler is really doing what we want it to do.
But coming back to nested `.` functions/operators, would we be able to avoid allocation when matrix multiplication is in the middle, e.g. `v1 .= v2 .+ m1*v3` (where `m1` is a matrix, and the `v*` are vectors)? Correct me if I'm wrong, but isn't this BLAS's `daxpy`? We would want that to be non-allocating, so hopefully whatever syntax we come up with would allow for this kind of thing (where the order of multiplications and additions for the matrix multiplication is cache-optimized as in BLAS, not direct as in `map` and `broadcast`).
@andyferris Yes, the hope is that the parser-level transformation can cover most use cases with well-defined semantics. It is in principle possible to add support for more complicated constructs (I very briefly talked about this with @andreasnoack), but I personally feel like it is hard to come up with a syntax that can cover all the cases beyond what can currently be achieved with `broadcast` (or a mutating version of it).
Thinking out loud, maybe it can be achieved by having a lazy array type (so the computation is done on the fly within `broadcast`)? That would be an orthogonal change, though. It would also be hard to recognize that and call `BLAS` functions.
@andyferris, the whole point is that the proposed loop fusion becomes a simple parsing-level guarantee, not compile-time at all. If you see `f.(g.(x))`, then you know that the loops are always fused into a single `broadcast`, regardless of the bindings of `f`, `g`, or `x`. This allows you to reason simply and consistently about the code. It is indeed just syntactic sugar.
This is very different from a compile-time optimization that may or may not occur.
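A minimal sketch of the guarantee (with throwaway definitions of `f` and `g`, arbitrary stand-ins not taken from the thread): whatever `f` and `g` happen to be bound to, `f.(g.(x))` lowers to a single broadcast over the composed function.

```julia
f(v) = v + 1
g(v) = 2v
x = [1, 2, 3]
# the parser guarantees f.(g.(x)) is one fused loop, equivalent to:
fused = broadcast(v -> f(g(v)), x)
@assert f.(g.(x)) == fused   # both are [3, 5, 7]
```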
@stevengj Yes, I understand and agree completely (my first two paragraphs were directed at @quinnj).

> the whole point is that the proposed loop fusion becomes a simple parsing-level guarantee

To my taste, the keyword is "guarantee". I really do like it.
I was thinking more along the lines of what @yuyichao was thinking "loudly". If matrix-vector multiplication were lazy (returning an iterable over `i` for `sum(M[i,:] .* v)` or similar), then it would "just work" with no temporaries, given a combination of your suggested changes to the parser and the introduction of the lazy multiplication type (and thus no "compile-time optimizations" would be necessary beyond what already exists in Julia). It would be interesting to compare performance to `daxpy`.
(Of course, when you say "compile-time" optimization, I interpret that as changes to compilation after lowering, not changes to definitions in `Base.LinAlg`.)
A similar thing for matrix-matrix multiplication is much, much harder. Although we could have arbitrarily clever iterators, they might or might not be the correct thing for reimplementing a somewhat efficient `dgemm` in native Julia, or for calling out to it "automagically". But this is definitely something worth considering for the future, and for the development of syntax, because superfluous temporaries after multiplying matrices (along with adding them, etc., which hopefully will be fixed in this roadmap) are IMHO currently one of Julia's numerical performance bottlenecks when working with very large matrices/tensors/etc. (On that note, does an interface for `dgemm!` currently exist for the memory-constrained users who want to extract the last bit of efficiency out of Julia?)
@stevengj: are you planning on tackling this or should we figure out who else can tackle it?
I'm planning on tackling the loop fusion. The other parts require someone to improve the type computation in `broadcast` (similar to #16622), and I was hoping someone else would tackle that. 😊
If we don't implement "syntactic fusing" for 0.5 (status, @stevengj?) and plan on implementing it in a future version, we should amply document that this will change in the future and note that people should only use nested broadcasting in cases where such a transformation would not change the meaning.
Otherwise the only change above that seems to be slated to make it into 0.5 is the improvement of the output type computation.
We will only do #4883 for 0.5. That issue is part of the milestone, so I'm moving this to 0.6.
Sorry, I've been traveling for multiple weeks; just got back yesterday.
We missed you at JuliaCon.
I wonder whether, rather than just deprecating `log(a::Array)` and friends (#17302), a better use of the fact that #17300 frees up this syntax wouldn't be to make it syntactic sugar for calling a vector math library, leveraging the unified call syntax from @rprechelt's vectorize.jl. Wouldn't it be a bit silly to have to write `@vectorize log.(a::Array)` to 're-vectorize' the call?
I suppose the main drawback would be that it would make the `log.(a::Array)` syntax less discoverable, but I submit that it's even less likely that a Matlab convert would discover the `@vectorize` macro on their own, leaving a ~7x speed boost on the table.
@s-broda If there are better (faster) implementations for certain operations, one can always add a specialized method, like `broadcast(::typeof(log), a::Array{Float64}) = ...`, to use that for `log.(A)`.
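Such a specialization might look like the sketch below, where `fast_vector_log` is a made-up stand-in for a call into Accelerate/Yeppp!/VML. Note this reflects the 0.5-era design, in which `log.(A)` lowered directly to `broadcast(log, A)`; later Julia versions reworked broadcasting via `Base.broadcasted`, so the dot syntax would no longer hit this method.

```julia
# Stand-in for a vendor vector-math call; here it just applies the
# scalar log element-wise via map (avoiding broadcast, so the
# specialized method below cannot recurse into itself).
fast_vector_log(a::Array{Float64}) = map(log, a)

# Specialized method, as suggested above. (Defining this outside Base
# on Base-owned types would be type piracy; shown only as a sketch.)
Base.broadcast(::typeof(log), a::Array{Float64}) = fast_vector_log(a)
```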
@martinholters Fair enough. Is there any advantage to special-casing it there versus defining a new method for `log(a::Array)` that I'm missing?
I'm not sure that this is simply a faster implementation, as there may be a 1 or 2 ULP difference between the current implementation and a vector math library. That's why I thought that having a special (convenient) syntax that reflects the difference in semantics might be useful.
@s-broda, the plan is for `log(a::Array)` to go away (or at least to be deprecated quasi-permanently for the benefit of users coming from other languages). Besides the advantage of a completely general, automatic vectorization syntax, it's extremely useful to expose the user's intention of an element-wise broadcast at the syntax level, with `log.(a)`, because that enables loop fusion, in-place operations, and other transformations that are very difficult to do at later compilation stages if the compiler has to figure things out on its own. And there is no real disadvantage to special-casing `broadcast` vs. special-casing `log`.
@stevengj I think there is a misunderstanding here. I couldn't agree more about the usefulness of the `sin.(a)` syntax, for precisely the reasons that you mention; indeed, it's the one feature that I've been crossing my fingers would make it into 0.5, and I couldn't be more thrilled that you tackled it so quickly. My point was rather the opposite: namely, that because this effectively frees up the `sin(a::Array)` syntax, the now-redundant `sin(a::Array)` could be made syntactic sugar for calling into Accelerate/Yeppp!/VML in cases where no loop fusion is required. This, too, could thus happen at the syntax level, rather than relying on the compiler or calling Accelerate/Yeppp!/VML explicitly.
I suppose I should have raised this on the mailing list rather than here, sorry.
@s-broda, that's not syntactic sugar, that is just an ordinary method, and it requires nothing new in the language; you'll still be able to define `foo(a::Array)` methods if you want, regardless of the `foo.(a)` support.
@s-broda And that could be done easily in a package.
I would like to argue that once we stop supporting `sin(a::Array)`, it should give an error, period (with an error message that tells you to use `sin.(a)`, if possible).
Just because we stop supporting it doesn't mean that people will stop trying to use it, and we want them to get the error message so they know that is not the intended way. Also, I really don't think that loading a package should change the behavior of code that doesn't use it.
Agreed, the standard "type piracy" guideline is that a package should not extend base methods except on types defined by that package.
Funnily enough, it turns out that the "more speculative" 0.6 proposals were actually easier to implement. Or more fun, at least, since they don't affect backwards-compatibility much.
Life is so breezy when you don't have to worry about breaking people's code. Those were the days!
I guess deprecating the old behaviours is something that should still wait.
@ViralBShah, yeah, it's way too late in the 0.5 cycle for a massive deprecation.
It's pretty easy to write hijackable versions of `broadcast`. This would allow using the loop-fusion mechanism to fuse loops for any function that would benefit from loop fusion, like `filter` or `mapslices`. I think all that would be required is using the two functions below instead of the regular versions when parsing the dot syntax.

```julia
function maybe_broadcast(args...; _f = broadcast, kwargs...)
    _f(args...; kwargs...)
end
function maybe_broadcast!(args...; _f = broadcast!, kwargs...)
    _f(args...; kwargs...)
end
```
How exactly would `filter(f, g.(args...))` be able to use loop fusion? It seems like the parser would have to know about the `filter` function.
Maybe I'm missing something. Wouldn't `a.(b.(c.(A)), _f = filter)` turn into `maybe_broadcast(x -> a(b(c(x))), A, _f = filter)`, which turns into `filter(x -> a(b(c(x))), A)`?
Edit: Is the issue that it would turn into `maybe_broadcast((x, f) -> a(b(c(x)), _f = filter), A, filter)`?
@bramtayl, no, it would turn into `maybe_broadcast(x -> a(b(c(x)); _f = filter), A)`. Keyword arguments get passed to the respective function, not to `broadcast`.
It seems like we'd need a new syntax, like `filter..(a.(b.(c(A))))`.
darn. well that syntax seems nice?
Don't take my offhand syntax suggestions too seriously! It would take a while and a lot of thought to hash out what the implications and semantics would be, and to decide whether a new syntax is really worth it vs. just typing `filter(x -> a(b(c(x))), A)`. See #8450 for how long it took to settle on `f.(args...)`.
I feel like maybe we should have a "vectorization" label to group issues and PRs related to this stuff?
Now that #15032 is merged, here are the main remaining steps discussed in #8450, roughly in the order that they should be implemented:

1. Improve `broadcast` to be at least as good as `map` (#4883). The trickiest part is the ongoing discussion of what to do in the empty-array case (see #11034).
2. Deprecate the `@vectorized` functions like `sin(x)` in favor of `sin.(x)`. #17302
3. Parse `a .+ b` as `broadcast(+, a, b)`. (See also https://github.com/JuliaLang/julia/pull/6929#issuecomment-44099346, #14544, and #17393.) Existing overloaded functions `.+(a, b) = ...` can be deprecated in favor of overloading `broadcast(::typeof(+), a, b)`. #17623

More speculative proposals, probably for the 0.6 timeframe (suggested by @yuyichao):

4. Treat `f.(args...)` calls as "fusing" broadcast operations at the syntax level. For example, `sin.(x .+ cos.(x .^ sum(x.^2)))` would turn (in `julia-syntax.scm`) into `broadcast((x, _s_) -> sin(x + cos(x^_s_)), x, sum(broadcast(^, x, 2)))`. Notice that the `sum` function (or any non-dot call) would be a "fusion boundary". The caller would be responsible for not using `f.(args...)` in cases where fusion would screw up side effects. #17300
5. Treat `x .= ...` as "fusing" calls to `broadcast!`, e.g. `x .= sin.(x .+ y)` would act in-place with a single loop. Again, this would occur at the syntax level, so the caller would be responsible for avoiding function calls with problematic side effects. (See #7052, but the lack of loop fusion at that time made `.+=` much less attractive.) #17510
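As a quick illustration of the `.=` fusion proposed in the last item (a minimal sketch; this is the behavior in Julia versions where #17510 landed): the right-hand side is evaluated element-wise into `x` in a single loop, with no temporary for `x .+ y`.

```julia
x = [0.0, 0.0]
y = [0.0, pi / 2]
# fused in-place update: equivalent to broadcast!((a, b) -> sin(a + b), x, x, y)
x .= sin.(x .+ y)
```

After this, `x` holds the element-wise `sin` of the original sums, and no intermediate array was allocated for `x .+ y`.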