JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.61k stars 5.48k forks source link

Matrix Multiplication API #23919

Closed jebej closed 5 years ago

jebej commented 7 years ago

Currently, there is the following line in the sparse matmult code:

https://github.com/JuliaLang/julia/blob/056b374919e11977d5a8d57b446ad1c72f3e6b1d/base/sparse/linalg.jl#L94-L95

I am assuming that this means we want to have the more general A_mul_B*!(α,A,B,β,C) = αAB + βC methods which overwrite C (the BLAS gemm API) for dense arrays. Is that still the case? (It also seems possible to keep both APIs, i.e. keep the A_mul_B*!(C,A,B) methods, which would simply be defined as A_mul_B*!(C,A,B) = A_mul_B*!(1,A,B,0,C).)

I would personally like to have the gemm API defined for all array types (this has been expressed elsewhere). Implementing these methods for dense arrays seems fairly simple, since they would just directly call gemm et al. The sparse case is already implemented. The only real modification would be to the pure julia generic matmult, which does not accept α and β arguments.

This would lead to generic code that works with any type of array/number. I currently have a simple implementation of expm (after having done the modification to _generic_matmatmult!) which works with bigfloats and sparse arrays.

Sacha0 commented 7 years ago

Ref. #9930, #20053, #23552. Best!

jebej commented 7 years ago

Thanks for the references. I suppose this issue has more to do with adding the gemm-style methods than an API revamp, but it can be closed if we think it still is too similar to #9930.

jebej commented 7 years ago

As a starting point, would there be support for having _generic_matmatmul! have the gemm API? It's a fairly simple change, and purely additive/nonbreaking, since the current method would simply be implemented by having α=1 and β=0. I can make the PR. I'd probably go similarly to this version (in that version I cut all the transpose stuff because I didn't need it, I wouldn't do that here).

andreasnoack commented 7 years ago

Yes. That would be a good start. We need to consider the ordering of the arguments though. Originally, I thought it was more natural to follow the BLAS ordering but we fairly consistent about having output arguments first which is also the case for the current three-argument A_mul_B!. Furthermore and as you have already pointed out, the three-argument version would correspond to the five-argument version with α=1 and β=0 and default value arguments are last. Of course, we don't necessarily have to use the default value syntax for this but it would make sense to use it here.

StefanKarpinski commented 7 years ago

Why not just introduce a generic gemm function?

jebej commented 7 years ago

Yes. That would be a good start. We need to consider the ordering of the arguments though. Originally, I thought it was more natural to follow the BLAS ordering but we fairly consistent about having output arguments first which is also the case for the current three-argument A_mul_B!. Furthermore and as you have already pointed out, the three-argument version would correspond to the five-argument version with α=1 and β=0 and default value arguments are last. Of course, we don't necessarily have to use the default value syntax for this but it would make sense to use it here.

Sounds good. We can continue the discussion about the actual argument ordering and method renaming in #9930. This is more about just having the five-argument version available, so I'll keep the current Ax_mul_Bx!(α,A,B,β,C) interface.

Why not just introduce a generic gemm function?

Are you suggesting renaming _generic_matmatmul! to gemm! in addition to the changes above?

jebej commented 7 years ago

To be clearer, I do think we should end up having a single method mul(C,A,B,α=1,β=0), along with lazy transpose/adjoint types to dispatch on, but that will be an other PR.

andreasnoack commented 7 years ago

Why not just introduce a generic gemm function?

I think gemm is a misnomer in Julia. In BLAS, the ge part indicates that the matrices are general, i.e. have no special structure, the first m is multiply and the list m is matrix. In Julia, the ge (general) part is encoded in the signature as is the last m (matrix) so I think we should just call it mul!.

dlfivefifty commented 6 years ago

Is the notation mul!(α, A, B, β, C) from SparseArrays.jl

https://github.com/JuliaLang/julia/blob/160a46704fd1b349b5425f104a4ac8b323ea85af/stdlib/SparseArrays/src/linalg.jl#L32

"finalised" as the official syntax? And this would be in addition to mul!(C, A, B), lmul!(A, B), and rmul!(A,B)?

I'm not too big a fan of having A, B, and C in different orders.

andreasnoack commented 6 years ago

Is the notation mul!(α, A, B, β, C) from SparseArrays.jl "finalised" as the official syntax?

I'd say no. Originally, I liked the idea of following BLAS (and the order also matched how this is usually written mathematically) but now I think it makes sense to just add the scaling arguments as optional fourth and fifth arguments.

dlfivefifty commented 6 years ago

So just to clarify, you'd like optional arguments in the sense

function mul!(C, A, B, α=1, β=0)
 ...
end

The other option would be optional keyword arguments

function mul!(C, A, B; α=1, β=0)
...
end

but I'm not sure people are too happy with unicode.

andreasnoack commented 6 years ago

I'm very happy with unicode but it is true that we try always have an ascii option and it would be possible here. The names α and β are also not super intuitive unless you know BLAS so I think using positional arguments the better solution here.

perrutquist commented 6 years ago

In my opinion, a more logical nomenclature would be to let muladd!(A, B, C, α=1, β=1) map to the various BLAS routines that do multiplication and addition. (gemm as above, but also e.g. axpy when A is a scalar.)

The mul! function could then have an interface like mul!(Y, A, B, ...) taking an arbitrary number of arguments (just like * does) as long as all intermediate results can be stored in Y. (An optional kwarg could specify the order of multiplication, with a reasonable default)

mul!(Y, A::AbstractVecOrMat, B:AbstractVecOrMat, α::Number) would have the default implementation muladd!(A, B, Y, α=α, β=0), as would the other permutations of two matrix/vectors and a scalar.

matthieugomez commented 6 years ago

Another vote to have mul!(C, A, B, α, β) defined for dense matrices. This would allow to write generic code for dense and sparse matrices. I wanted to define such a function in my non linear least squares package but I guess this is type piracy.

dmbates commented 6 years ago

I also have been tempted to write mul!(C, A, B, α, β) methods for the MixedModels package and engage in a bit of type piracy, but it would be much better if such methods were in the LinearAlgebra package. Having methods for a muladd! generic for this operation would be fine with me too.

simonbyrne commented 6 years ago

I'm in favour, though I think it should probably have a different name than just mul!. muladd! seems reasonable, but am certainly open to suggestions.

simonbyrne commented 6 years ago

Maybe mulinc! for multiply-increment?

YingboMa commented 6 years ago

Maybe we can have something like addmul!(C, A, B, α=1, β=1)?

StefanKarpinski commented 6 years ago

Isn’t this a form of muladd!? Or is the idea behind calling it addmul! that it mutates the add argument rather than the multiply argument? Would one ever mutate the multiply argument?

simonbyrne commented 6 years ago

Note that we do mutate non-first elements in some cases, e.g. lmul! and ldiv!, so we could do them in the usual "muladd" order (i.e. muladd!(A,B,C)). The question is what order should α and β go? One option would be to make the keyword arguments?

tkf commented 6 years ago

Wouldn't it be nice if you leave an option to the implementers to dispatch on the types of the scalars α and β? It's easy to add sugars for end-users.

andreasnoack commented 6 years ago

I thought we already settled on mul!(C, A, B, α, β) with default values for α, β. We are using this version in https://github.com/JuliaLang/julia/blob/b8ca1a499ff4044b9cb1ba3881d8c6fbb1f3c03b/stdlib/SparseArrays/src/linalg.jl#L32-L50. I think some packages are also using this form but I don't remember which on top of my head.

tkf commented 6 years ago

Thanks! It would be nice if that's documented.

simonbyrne commented 6 years ago

I thought we already settled on mul!(C, A, B, α, β) with default values for α, β.

SparseArrays uses it, but I don't recall it being discussed anywhere.

dmbates commented 6 years ago

In some ways the muladd! name is more natural because it is a multiplication followed by an addition. However, the default values of α and β, muladd!(C, A, B, α=1, β=0) (note that the default for β is zero, not one), turn it back into mul!(C, A, B).

It seems to be a case of pedantry vs consistency whether to call this mul! or muladd! and I think the case of the existing method in SparseArrays would argue for mul!. I find myself in the curious case of arguing for consistency despite my favorite Oscar Wilde quote, "Consistency is the last refuge of the unimaginative."

simonbyrne commented 6 years ago

In some ways the muladd! name is more natural because it is a multiplication followed by an addition. However, the default values of α and β, muladd!(C, A, B, α=1, β=0) (note that the default for β is zero, not one), turn it back into mul!(C, A, B).

There is an interesting exception to this when C contains Infs or NaNs: theoretically, if β==0, the result should still be NaNs. This doesn't happen in practice because BLAS and our sparse matrix code explicitly check for β==0 then replace it with zeros.

StefanKarpinski commented 6 years ago

You could consider that having defaults of α=true, β=false since true and false are "strong" 1 and 0 respectively, in the sense that true*x is always x and false*x is always zero(x).

dlfivefifty commented 6 years ago

lmul! should also have that exceptional behaviour: https://github.com/JuliaLang/julia/issues/28972

simonbyrne commented 6 years ago

true and false are "strong" 1 and 0 respectively, in the sense that true*x is always x and false*x is always zero(x).

I didn't know that!:

julia> false*NaN
0.0
dlfivefifty commented 6 years ago

FWIW, I'm pretty happy with the readability of LazyArrays.jl syntax for this operation:

y .= α .* Mul(A,x) .+ β .* y

Behind the scenes it lowers to mul!(y, A, x, α, β), for BLAS-compatible arrays (banded and strided).

StefanKarpinski commented 6 years ago

I didn't know that!

It's part of what makes im = Complex(false, true) work.

andreasnoack commented 6 years ago

SparseArrays uses it, but I don't recall it being discussed anywhere.

It was discussed above in https://github.com/JuliaLang/julia/issues/23919#issuecomment-365463941 and implemented in https://github.com/JuliaLang/julia/pull/26117 without any objections. We don't have the α,β versions in the dense case so the only place in this repo where a decision would have an immediate effect would be SparseArrays.

tkf commented 6 years ago

What about LinearAlgebra.BLAS.gemm!? Shouldn't it be wrapped as 5-ary mul!, too?

andreasnoack commented 6 years ago

It should but nobody has done it yet. There are many methods in matmul.jl.

simonbyrne commented 6 years ago

It was discussed above in #23919 (comment) and implemented in #26117 without any objections.

Well, consider this my objection. I would prefer a different name.

jebej commented 6 years ago

Why would it be a different name? Both in the dense and sparse case, the basic algorithm does both the multiplication and addition.

If we give those functions different names, we'll have mul!(C,A,B) = dgemm(C,A,B,1,0) and muladd!(C,A,B,α, β) = dgemm(C,A,B,α, β).

The only advantage I see is if we actually split the methods and save an if β==0 call in the C = A*B case.

tkf commented 6 years ago

FYI, I started working on it in #29634 to add the interface to matmul.jl. I'm hoping to finish it by the time the name and signature are decided :)

tkf commented 6 years ago

An advantage of muladd! would be that we can have ternary muladd!(A, B, C) (or muladd!(C, A, B)?) with the default α = β = true (as mentioned in the original suggestion https://github.com/JuliaLang/julia/issues/23919#issuecomment-402953987). The method muladd!(A, B, C) is similar to muladd for Numbers so I guess it's more natural name especially if you already know muladd.

@andreasnoack It seems your earlier discussion is about method signature and preferring positional arguments over keyword arguments, not about method name. Do you have objection for the name muladd!? (The existence of 5-ary mul! in SparseArrays might be one, but defining the backward-compatible wrapper is not hard.)

andreasnoack commented 6 years ago

Having both mul! and muladd! seems redundant when the former is just the latter with default values for α and β. Furthermore, the add part has been canonicalized by BLAS. If we could come up with a credible generic linear algebra application for muladd!, I'd like to hear about it but otherwise I'd prefer to avoid the redundancy.

Also, I'd strongly prefer that we keep the strong-zero property of false separate from the discussion of mul!. IMO any zero value of β should be strong as it is in BLAS and as it is in the current five argument mul! methods. I.e. this behavior should be a consequence of mul! and not the type of β. The alternative would diffucult to work with. E.g. mul!(Matrix{Float64}, Matrix{Float64}, Matrix{Float64}, 1.0, 0.0) ~could~couldn't use BLAS.

simonbyrne commented 6 years ago

We can't change what BLAS does, but requiring strong zero behaviour for floats does mean that every implementation will need a branch to check for zero.

tkf commented 6 years ago

If we could come up with a credible generic linear algebra application for muladd!

@andreasnoack By this, I suppose you mean "application for three-argument muladd!" since otherwise you wouldn't agree to include five-argument mul!?

But I still can come up with an example where muladd!(A, B, C) is useful. For example, if you want to construct a "small-world" network, it's useful to have a "lazy" summation of a banded matrix and a sparse matrix. You can then write something like:

A :: SparseMatrixCSC
B :: BandedMatrix
x :: Vector  # input
y :: Vector  # output

# Compute `y .= (A .+ B) * x` efficiently:
fill!(y, 0)
muladd!(x, A, y)  # y .+= A * x
muladd!(x, B, y)  # y .+= B * x

But I don't mind manually writing trues there since I can simply wrap it for my usage. Having the five-argument function as a stable documented API is the most important goal here.

Going back to the point:

Having both mul! and muladd! seems redundant when the former is just the latter with default values for α and β.

But we have some * implemented in terms of mul! with the "default value" of the output array initialized appropriately. I think there are may such "shortcut" examples in Base and standard libraries? I think it makes sense to have both mul! and muladd! even though mul! is just a shortcut of muladd!.

I'd strongly prefer that we keep the strong-zero property of false separate from the discussion of mul!

I agree that it would be constructive to focus discussing the name of five-argument version of the multiply-add first (mul! vs muladd!).

andreasnoack commented 6 years ago

I didn't do a good job when I asked for a generic use case where you needed muladd to work generically across matrices and numbers. The number version would be muladd without the exclamation mark so what I asked didn't really make sense.

Your example could just be written as

mul!(y, A, x, 1, 1)
mul!(y, B, x, 1, 1)

so I still don't see the need for muladd!. Is it just that you think this case is so common that writing 1, 1 is too verbose?

But we have some * implemented in terms of mul! with the "default value" of the output array initialized appropriately. I think there are may such "shortcut" examples in Base and standard libraries?

I don't get this one. Could you try to elaborate? What are the shortcuts you are talking about here?

simonbyrne commented 6 years ago

so I still don't see the need for muladd!. Is it just that you think this case is so common that writing 1, 1 is too verbose?

I think muladd! is also more descriptive as to what it actually does (though perhaps it should be addmul!).

andreasnoack commented 6 years ago

I don't have a problem with the name muladd!. Primarily, I just don't think we should have to functions for this and secondarily I don't think deprecating mul! in favor of muladd!/addmul! is worth it.

tkf commented 6 years ago

Is it just that you think this case is so common that writing 1, 1 is too verbose?

No. I'm totally fine with calling five argument function as long as it's a public API. I just tried to give an example where I only need three argument version (as I thought that was your request).

What are the shortcuts you are talking about here?

https://github.com/JuliaLang/julia/blob/f068f21d6099632bd5543ad065d5de96943c9181/stdlib/LinearAlgebra/src/matmul.jl#L140-L143

I think * defined here can be considered a shortcut of mul!. It's "just" mul! with a default value. So, why don't let mul! be a muladd!/addmul! with default values?

There are rmul! and lmul! defined as similar "shortcuts" as well:

https://github.com/JuliaLang/julia/blob/f068f21d6099632bd5543ad065d5de96943c9181/stdlib/LinearAlgebra/src/triangular.jl#L478-L479

deprecating mul!

I thought the discussion was about adding a new interface or not. If we need to deprecate mul! to add a new API, I don't think it's worth it.

simonbyrne commented 6 years ago

The main arguments I can think of are:

andreasnoack commented 6 years ago

I think * defined here can be considered a shortcut of mul!. It's "just" mul! with a default value. So, why don't let mul! be a muladd!/addmul! with default values?

Because * is the default way of multiplying matrices and how most users would do it. In comparison, muladd! wouldn't be anywhere close to * in usage. Furthermore, it's even an existing operator whereas muladd!/addmul! would be a new function.

In don't think rmul! and lmul! fit this pattern because they generally aren't default value versions of out-of-place mul! methods.

Simon summarizes the benefits nicely in the post right above. The question is if the benefits are large enough to justify either an extra function of a renaming (which means deprecation of mul!). It's where we disagree. I don't think it's worth it.

tkf commented 6 years ago

When you say that renaming isn't worth it, did you take it into account that the API is not completely public? By that, I mean it's not in Julia's documentation.

I know LazyArrays.jl (and other packages?) already uses it so blindly following the semver wouldn't be good. But still, it's not as public as other functions.

andreasnoack commented 6 years ago

mul! is exported from LinearAlgebra and broadly used so we'd definitely have to deprecate it at this point. It's a shame we didn't have this discussion when A_mul_B! became mul! or at least before 0.7 because it would have been a much better time to rename the function.

fredrikekre commented 6 years ago

How about using mul! for now and update the name for LinearAlgebra v2.0 when we can update stdlibs separately?