0.6: vec*mat throws "Cannot left-multiply a matrix by a vector" even when mat is 1 x n

dlfivefifty commented 7 years ago

The following throws an error, even though a column vector times a 1 x n matrix have consistent shapes.

julia> ones(5)*ones(1,5)
ERROR: DimensionMismatch("Cannot left-multiply a matrix by a vector")
Stacktrace:
 [1] *(::Array{Float64,1}, ::Array{Float64,2}) at ./linalg/rowvector.jl:157

dlfivefifty commented 7 years ago

See discussion https://discourse.julialang.org/t/vector-matrix-in-0-6/1838/6

fredrikekre commented 7 years ago

julia> ones(5) * ones(5)'
5×5 Array{Float64,2}:
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0

dlfivefifty commented 7 years ago

I know, but it doesn't change the fact that vector*ones(1,5) is well-defined mathematically

Sent from my iPhone

On 2 Feb 2017, at 22:23, Fredrik Ekre notifications@github.com wrote:

julia> ones(5) * ones(5)' 5×5 Array{Float64,2}: 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

stevengj commented 7 years ago

It looks like it is a bug introduce in #19670; cc @andyferris

andyferris commented 7 years ago

@stevengj this was intentional on my part, rather than a bug. Perhaps a misfeature - but I'm pretty confident that it isn't.

EDIT: Damn, I meant to stick this in the NEWS.md, but it seems I forgot.

andyferris commented 7 years ago

I'll copy my comments from Discourse:

To me, the question is in reverse - why did this method exist in v0.5?

The (only) reason is it was a way of obtaining the outer product of two vectors. You can now do vector * rowvector for this. In fact, this is still a (specialized) method of *(::AbstractVector, ::AbstractMatrix).

Otherwise vector * matrix doesn't make too much sense (outside of MATLAB/Householder notation, where matrices and vectors are of the same kind of object, which is just plain not true in Julia). You wouldn't impress a math lecturer with writing B = v.A in an assignment/exam, and it would be a misuse of Dirac notation to write |psi> H and expect an operator back. The correct things are B = v.wᵀ and O = |psi><phi|.

Believe me, I pained over this one, but really its not difficult to write ones(5) * ones(5)' for the outer product (in fact, it's one less character :) ). Since the *(A::AbstractVector, B::AbstractMatrix) method was an error if size(B, 1) was not 1, it is implied that the programmer knows for sure that B is a 1xN-shaped Array, which is guaranteed by B::RowVector.

stevengj commented 7 years ago

If we view a vector as a 1-column matrix, then vector * (1-row matrix) makes perfect sense in standard linear algebra..

(I agree that there is some tension between this viewpoint and viewing vectors as living in some kind of abstract finite-dimensional Hilbert space with matrices as linear operators on them. But this seems like a case where we might as well continue to support both viewpoints, since most users will expect it.)

stevengj commented 7 years ago

(This question seems in much the same vein as whether a RowVector should pretend to be a 1xN array. The utilitarian viewpoint—supporting as many operations as could possibly makes sense—rather than a more mathematically restrictive definition, seems to apply here.)

StefanKarpinski commented 7 years ago

This strikes me as a slightly fishy thing to do in the first place, but I guess I can't really see the harm in allowing it.

andyferris commented 7 years ago

This strikes me as a slightly fishy thing to do in the first place, but I guess I can't really see the harm in allowing it.

I worry that "slightly fishy" just leads to worse/less clear code, and prefer to have crystal clear semantics about things. I personally can't see that a user really wants this thing, where they are in this situation where "I have this matrix, but I know it's a single row but not a RowVector, and I can't be bothered typing .*...". The most likely case will be a * b', which works fine.

We don't treat Nx1 matrices as if they are vectors. Here's just one of many, many cases:

julia> diagm(ones(3,1))
ERROR: MethodError: no method matching diagm(::Array{Float64,2})
Closest candidates are:
  diagm(::Union{BitArray{1},BitArray{2}}) at linalg\bitarray.jl:90
  diagm{Tv,Ti}(::SparseMatrixCSC{Tv,Ti}) at sparse\sparsematrix.jl:3571
  diagm{T}(::AbstractArray{T,1}) at linalg\dense.jl:121
  ...

I was aiming for a degree of symmetry here, where we don't treat 1xN arrays as if they were RowVector.

dlfivefifty commented 7 years ago

It's not treating a 1 x N array like a row vector, it's treating a column vector like an N x 1 matrix.

andyferris commented 7 years ago

OK, but what I didn't get is why do this for * but not other linear algebra operations:

julia> svd([1,2,3])
ERROR: MethodError: no method matching svdfact!(::Array{Float64,1}; thin=true)
Closest candidates are:
  svdfact!(::Base.LinAlg.AbstractTriangular{T,S<:AbstractArray{T,2}}) at linalg\triangular.jl:1851 got unsupported keyword argument "thin"
...

julia> svd(reshape([1,2,3],(3,1)))
(
[0.267261; 0.534522; 0.801784],

[3.74166],
[1.0])

We could go and implement this, but where would it end? Following that path would be a development nightmare.

I'm not sure I care enough about this to hold out :) I don't even disagree that we can find a semantic of * such that the method makes perfect sense - I'm just trying to point out that not having this method is consistent with the rest of Julia.

stevengj commented 7 years ago

Because * is one of those cases where the usage is so well-established that it will be more confusing to explain to new users why they can't do it.

andyferris commented 7 years ago

@dlfivefifty if mathematicians would really find the missing method quite suspicious, that is a pretty powerful argument.

Because * is one of those cases where the usage is so well-established that it will be more confusing to explain to new users why they can't do it.

This is also a good argument. We want an easy-to-use language. (EDIT: a well written message would help with such an explanation... you see @dlfivefifty it's all part of my plan to secretly train the next generation of mathematicians in Dirac notation... :) ).

stevengj commented 7 years ago

(I used to like Dirac notation until I started teaching with it and found out how badly most students mis-use it. They mostly seem to think it is just a funny kind of parentheses, and happily write things like "Â|ψ⟩=|Âψ⟩" to the point where it seemed to cause more harm than good.)

mzaffalon commented 7 years ago

(@stevengj That does not seem a good argument to ditch Dirac notation. Maybe the students should start to use it properly. Once getting used, it certainly does help to keep track what a bra and what a ket are. But you probably did not imply that.)

stevengj commented 7 years ago

(I find that the benefits I get from it, in a non-quantum context, are outweighed by the the time it takes to teach people to use it properly, particularly when I'm dealing with physics students who should, in principle, have already learned it.)

stevengj commented 7 years ago

Sorry for the off-topic digression.

In principle, a deprecation message could try to explain this. But my feeling is that trying to explain abstract vector spaces and dual spaces, and how vectors differ from the vector space of 1-column matrices that they are isomorphic to, is too much to deal with in a deprecation message. It will be much less confusing to just allow the operation.

andyferris commented 7 years ago

Haha - that is quite a math lesson to learn from a short string. :)

I'm still a bit disquiet about this. The point was never Dirac notation (which is definitely a confusing set of brackets for newcomers), but the appreciation of vectors, their duals and linear operators. MATLAB always just called all of these arrays of numbers; I would like to take a more (relatively elementary) abstract linear algebra approach.

To disentangle the argument, rather than consider Dirac notation, the corresponding mathematical notation I was aiming for would be this: "Consider vectors v, w, etc and their duals vᵀ, wᵀ, etc, as well as linear operators A, B, etc. The inner product induced by the duals (returning a scalar) is written vᵀ.w and the linear mapping as A.v = w; the dual of this implies vᵀ.Aᵀ = wᵀ. We can compose linear operators as A.B, and we can show the dual map is (A.B)ᵀ = Bᵀ.Aᵀ. Composing scalar-vector multiplication with the inner product means we can define a linear map that takes v and emits u.(wᵀ.v) and we choose to write this linear map as A = u.wᵀ." In the universities I've been associated with, a good student at the end of first-year math should be comfortable with this (plus eigenvalues, inverses, etc).

To me, it seems exceeding inelegant to add to the end of this mathematical notation description "Oh, and by the way, in the special case that an operator A is a mapping from a 1-dimensional vector space to a n-dimensional vector space, we also allow you to write v.A." I feel this is associate one-dimensional vector spaces with scalars (in Julia, that is 1 == [1]). Take again the last sentence in my notation description above

Composing scalar-vector multiplication with the inner product means we can define a linear map that takes v and emits u.(wᵀ.v) and we choose to write this linear map as A = u.wᵀ."

If we replace wᵀ with a linear operator from a N-dimensional vector space to a 1-dimensional vector space (remember Julia says a 1D vector space is distinct from a scalar) then that implies u is an operator from 1-dimensional vector space to an M-dimensional vector space. And since Julia treats vectors and matrices as distinct, that implies that u must be a matrix. That is, we can multiply Mx1 matrices by 1xN matrices - which is fine and already exists, but it doesn't justify the method requested in the OP. IMO, all this follows directly from pre-exisiting Julia choices 1 != [1] (scalars and 1D vector spaces are distinct) and Vector != Matrix (operators and vectors are distinct).

To give a bit of context of where I'm coming from, over time, my views have been quite hardline about this kind of thing. For instance, IMO Base deals with map, broadcast (and thus dot operators), permutedims, reshape, etc, and LinAlg defines + (as opposed to .+), * (as opposed to .*), transpose/.' (as opposed to permutedims), conj(::AbstractArray) (as opposed to conj.(::AbstractArray)) and ctranspose/.' (which is Hermitian conjugation) since all of these derive directly from linear algebra concepts. For instance, this week I found bugs in years-old MATLAB code in production at work because the original programmer was using ' everywhere to reshape arrays of data and it was so unclear it wasn't obvious what was happening. A strong separation of these concepts means we can afford to make a linear algebra system which is crystal clear and makes sense, have data operations on arrays be crystal clear and make sense, and use AbstractArray to implement both. But it takes discipline.

I'm very sorry for my extremely long rant, especially the last paragraph - please don't take it the wrong way as I'm trying to admit here that my views are very strict and that some softening in an implementation which is to be used be very large number of users is quite acceptable in my book. :) But when I read e.g. this comment from Stefan I thought "well, it's only because it makes sense". I agree that this change is painful and it should have been a deprecation not a breaking change (sorry, my bad) but IMO long-term clarity is also important.

I will try to bow out and let you guys figure out what to do now. @alanedelman and @mbauman have also discussed dual vectors, etc amongst themselves and with me quite a bit, so I feel it would be valuable to have their opinions before moving on.

stevengj commented 7 years ago

I agree that any math major could comprehend this. But lots of users of linear algebra are not math majors, even at the first year level. As long as we don't stray into ambiguity or incorrectness, a certain flexibility doesn't seem to hurt.

I'm also a bit concerned that it will be difficult to write a clear deprecation message here. What you want to replace it with really seems to depend on where your 1-column matrix came from.

andyferris commented 7 years ago

Using .* will always work.

dlfivefifty commented 7 years ago

Your viewpoint is also not canonical for all mathematicians. To numerical analysts a column vector is an n x 1 matrix. So we would view Vector as a special type to represent n x 1 matrices. I suspect this is also how engineers think about things.

Abstract algebra is a generalisation of linear algebra. It's concepts do not necessary supersede or invalidate a simpler viewpoint. So I don't think your strict view is any more correct mathematically (just a different set of definitions), and would confuse large set of users.

dlfivefifty commented 7 years ago

This example is consistent with my point:

julia> ones(5,1)+ones(5)
5×1 Array{Float64,2}:
 2.0
 2.0
 2.0
 2.0
 2.0

mbauman commented 7 years ago

Yes, it's true there are multiple self-consistent interpretations here. The goal of #4774 was to decide what kind of universe that we want our multidimensional arrays to live in.

Across many issues, we seem to be moving away from Matlab's trailing singleton dimensions and moving towards a universe where dimensionality is intrinsically very important. This intimitely connects with #14770.

Personally, I like this restriction; to me it is both pedagogically and practically useful. I've had bugs that this would have caught. And I say that with full knowledge that it's in direct opposition with my viewpoint on #14770.

andyferris commented 7 years ago

I completely agree @dlfivefifty that there are multiple entirely consistent conventions for linear algebra, and my one is not the one true canonical viewpoint. I am wondering which are consistent with pre-existing facts and conventions in Julia like 1 != [1] and the fact that !(AbstractVector <: AbstractMatrix) (the latter precisely being Vector is a special case of Matrix).

andyferris commented 7 years ago

This example is consistent with my point:

Ha! I would have made a PR to deprecate that if I had of seen that already. :) (There is no harm in using .+ - in fact I thought the vectorization roadmap would suggest that we do.)

dlfivefifty commented 7 years ago

In any case, they should be consistent. so either also deprecate vec + mat or undeprecate vec*mat.

One way I would be happy with the current setting is if Vector denoted an operator on 0-dimensional arrays. But in that case, the following should work:

julia> v*ones()
ERROR: MethodError: no method matching *(::Array{Int64,1}, ::Array{Float64,0})
Closest candidates are:
  *(::Any, ::Any, ::Any, ::Any...) at operators.jl:281
  *{T}(::Number, ::AbstractArray{T,N} where N) at arraymath.jl:31
  *{T}(::AbstractArray{T,N} where N, ::Number) at arraymath.jl:34
  ...

andyferris commented 7 years ago

At some point in the future, I feel we might start to seriously consider the multi-dimensional (arbitrary tensor contraction) case with 0-dimensional being an obvious case. Among other things, we've been thinking about ways of representing or thinking about "upness" and "downness" of each dimension (vector spaces and their duals) and the method in which * would go about in connecting/contracting over these.

Across many issues, we seem to be moving away from Matlab's trailing singleton dimensions and moving towards a universe where dimensionality is intrinsically very important.

Right, yes in all of the arguments above I am probably asserting a stronger difference between vectors and matrices than in v0.5 and before, and more like where we are heading towards/aspiring for in the future. We might even see deprecating *(::AbstractVector, ::AbstractMatrix) in favor of .* as a case of automatic broadcasting that we don't need/desire anymore, similar to sin(::AbstractVector). For example. the reason we deprecate sin(::AbstractVector) but not conj(::AbstractVector) is because conjuagation of vectors is a valid concept from linear algebra (it just happens to coincide with the elementwise operation). The fact that we also have +, conj, etc for higher-dimensional arrays is (to me) a nod to the fact that such arrays might be seen as high-rank tensors, and while we don't yet support generic tensor contraction in Base, we do recognize that tensors are elements of a vector space.

(edit: curiously, this leads to an argument that, from the linear algebra point of view as well as many interfaces like linear indexing, AbstractMatrix <: AbstractVector`... a matrix more-or-less supports everything a vector can do, as well as more).

stevengj commented 7 years ago

You can can have a complex vector space without a multiplication algebra, so I don't think that defining +, conj, etcetera for higher-dimensional arrays implies that they are viewed as tensors. Not all arrays represent linear operators, e.g. they could be just unknowns on a grid.

It's not clear to me what we gain by deprecating vec*(one-row matrix). It is clear to me that by doing so we will confuse a lot of people who will not be thrilled by abstract algebraic arguments.

andyferris commented 7 years ago

Not all arrays represent linear operators, e.g. they could be just unknowns on a grid.

I interpretted the vectorization roadmap (knowing full well that this is your roadmap) as suggesting we don't automatically vectorize functions over "unknowns on a grid". In particular, that the user should indicate that specifically with a dot-call. (edited)

(by the way, I'm a huge fan of dot-call, both for the fusing and the semantic clarity it brings).

stevengj commented 7 years ago

@andyferris, "unknowns on a grid" can still form a vector space. e.g. the solutions to a discretized PDE are normally thought of as being in a Hilbert space. Just because a vector space has + and conj does not mean it has * (except by scalars).

andyferris commented 7 years ago

Sorry, I definitely misused the word "tensor" here - 100% agreed. It's good that all arrays are treated like a vector space for + and conj, but we definitely should not have to think of every array as a (multi) linear map.

andyferris commented 7 years ago

Just letting everyone know that there are now PRs for both options (restoring this functionality at #20423, or else turning the breaking change into a nice deprecation warning message at #20429).

StefanKarpinski commented 7 years ago

I'm a bit torn here but I have to say on the whole I'm with the scruffies (i.e. allow it):

The RowVector change wasn't really about type correctness, it was about making a set of linear algebraic operations/notations work the way that people expect them to work. The APL slicing change was about making trailing dimensions less special, on the other hand. But that seems quite unrelated to this change...
What else could this operation possibly produce besides an error? We certainly do refuse to perform operations where the reasonable result is ambiguous or just unclear, but that's not (afaict) the case here – the only possibilities are a matrix or an error.
What kind of errors do we hope to catch here? Situations where someone writes code that happens to work because some dimension is 1, but then breaks in practice when that dimension isn't 1 anymore? We definitely try to avoid corner cases of the other variety: code works for most values but for some particular gotcha value, it suddenly fails or does something else entirely. This is a big improvement over Matlab (and R, etc.). This is the opposite way: if you don't hit the one special value, there will be an error – that's the kind of mistake we tend to leave to testing to catch, reasonably enough. For example, broadcasting will "happen" to work if a dimension is 1 but otherwise but fails if there's any other size mismatch. Nobody seems troubled by this.
Consistency. We have many operations where trailing singleton dimensions are treated as if they were 1. We allow ones(5,1) + ones(5) as @dlfivefifty pointed out. There's a whole slew of other behaviors in the language like this: size(A, d) returns 1 for d ≥ ndims(A), etc. Conceptually, lower dimensional arrays are "embedded" into the space of higher dimensional arrays. The primary and most important case of this is that vectors are very much identified with columns of matrices. Even though they are not the same type, as much as possible, they behave similarly.

There are certainly correctness counterarguments to be made here, I think that if we're going to make that change it should be made far more comprehensively than disallowing this one operation. Taking one step in that direction from where we are now – which is generally considering lower dimensional arrays to be identified behaviorally with higher dimensional arrays with trailing singleton dimensions – is a step into inconsistency. An argument can be made for going all the way in the other direction, but I suspect that if we make all those behavioral changes, not only will we break a lot of code that works just fine right now, but we may risk making the language pretty unpleasant for linear algebra and related tasks.

MikaelSlevinsky commented 7 years ago

While ones(5,1) + ones(5), size(A, d) = 1 for d ≥ ndims(A) and the v0.5 version of ones(5)*ones(1,5) are examples where a more liberal syntax is and has been permitted, it begs the question on why d is a type parameter of AbstractArray.

If the vector ones(5) is truly one-dimensional, then it has no business multiplying a two-dimensional matrix such as ones(1,5). But if d was not a parameter, then an instance of the concrete type Vector{T} <: AbstractArray{T} could just conform lazily (but mathematically correctly!) to the size of adjacent arrays in multiplication, for instance, and size(v::Vector, d) = 1 for d ≥ 1 seems perfectly legitimate.

StefanKarpinski commented 7 years ago

The code you need to execute depends rather heavily on the value of n in Array{T,n}. The villagers might come for us with pitchforks if we take away the ability to easily write different methods for vectors and matrices, etc.

MikaelSlevinsky commented 7 years ago

I'm not honestly suggesting removing the type parameter, merely pointing out that it's kind of part of the inconsistency between a liberal/lazy approach and a more strict approach: is a Vector in R^{n} or in R^{n x 1 x 1 ....}.

StefanKarpinski commented 7 years ago

That's why I described the identification as conceptual: we can't actually identify vectors with column matrices, but we can make them behave as similarly, which is effectively behavioral identification – that's as good as we can do in computers. Your argument seems to be that because they are not actually the same, they shouldn't behave the same, which doesn't make sense to me.

andyferris commented 7 years ago

If we want to chalk this up as the same as trailing singleton dimensions, then that seems somewhat logical. Although I think code would be more readable if people didn't use this method, and it could potentially hide bugs, Stefan makes good points that in this case hiding bugs isn't actually that likely in this case, and it is somewhat consistent with some other parts of Julia.

(OTOH I have seen code in production with bugs that occur only when some dimension is 1 (which popped up in production in rare cases but wasn't tested for) or only when some dimension isn't 1 but which was also a rare case and untested for. MATLAB was particularly bad for this, which is why I favor moving away from that direction.)

To be clear, I'd rather use the type parameter n more strictly, and in the future disallow ones(5) + ones(5,1), trailing singleton indexing, generalized linear indexing, and so-on. I do make a conceptual distinction between vectors and matrices and I think this is just a generalization of the distinction between scalars and 1-element arrays. Currently, in some cases we are saying "a vector is conceptually a kind of row matrix" and in other cases we are saying these are disjoint. If both the interface and the semantics of arrays consistently took into account trailing singleton indexing, then it sounds like we need a type tree which looks like Array{T,n} <: Array{T, n+1}! 😛 On the other hand, given linear indexing and the fact that we treat higher dimensional array as members of a vector space (i.e. array + array and scalar*array), on an interface level we almost have this perverse thing that AbstactArray{T,n} <: AbstractVector{T} (please don't take these type tree suggestions seriously at all - it's just an observation, and as Stefan says there's a difference between conceptual vs implementation distinctions).

While one may make an argument saying that removing trailing singleton indices from Base (as well as convenient methods like the one discussed here) would make code harder for users to write code, I personally don't write code that uses these features, and I don't feel it requires overhead on my part and I feel the result has improved readability and isn't susceptible to a certain class of bugs. (In fact it's been the case that my types/code have been caught out by the fact that methods in Base such as show and broadcast require your type to support trailing singleton indexing, which surprised me somewhat).

Finally, there is a new semantic thing happening if we support matrix * rowvector for Nx1-sized matrix as the outer product. Here it is not an implied trailing singleton dimension, but the implied (or is it explicit?) first singleton dimension. I don't think this is a problem though - this fact that the first dimension is singleton already a property of the type, just thought this kind of singleton dimension needs to be on our radar since it is new.

dlfivefifty commented 7 years ago

One way to think of it is that Vector satisfies the Matrix interface: e.g., it implements getindex(::Vector,k::Int,j::Int)

JuliaLang / julia

0.6: vec*mat throws "Cannot left-multiply a matrix by a vector" even when mat is 1 x n #20389