JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.71k stars 5.48k forks source link

1/x not the same as inv(x) for matrix in julia 0.2.0 #5807

Closed freefrancisco closed 10 years ago

freefrancisco commented 10 years ago

I was trying to figure out how to invert a matrix in a 0.2.0 installation of julia, and I found out that 1/x gives me a different inverse than inv(x).
Here is my output

julia> x = rand(3, 3)
3x3 Array{Float64,2}:
 0.698453  0.500252  0.822943
 0.483441  0.078515  0.889188
 0.275469  0.341758  0.195685

julia> 1/x
3x3 Array{Float64,2}:
 1.43174   1.99899  1.21515
 2.06851  12.7364   1.12462
 3.63017   2.92605  5.11026

julia> inv(x)
Warning: Possible conflict in library symbol dtrtri_
Warning: Possible conflict in library symbol dgetri_
Warning: Possible conflict in library symbol dgetrf_
3x3 Array{Float64,2}:
  35.4312  -22.5164  -46.69
 -18.4623   11.0545   27.4109
 -17.6333   12.3904   22.9644

julia> versioninfo()
Julia Version 0.2.0
Commit 05c6461 (2013-11-16 23:44 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin12.5.0)
  WORD_SIZE: 64
  BLAS: libgfortblas
  LAPACK: liblapack
  LIBM: libopenlibm
andreasnoack commented 10 years ago

Because 1 is a number 1/x gives the reciprocal of each element of the matrix whereas inv is the matrix inverse.

It is better to ask a question like this to the julia-users list.

pao commented 10 years ago

You can find the julia-users list here.

JeffBezanson commented 10 years ago

I wonder if that behavior is still a good idea. After all a^-1 is also matrix inverse.

jiahao commented 10 years ago

1/A computing the elementwise reciprocal of A is inconsistent with A/B computing B\A. Why not make 1./A compute the elementwise reciprocal instead?

freefrancisco commented 10 years ago

Also there is at least one julia tutorial that expects 1/A == A^-1 http://www.statalgo.com/2012/04/15/statistics-with-julia-linear-algebra-with-lapack/

jiahao commented 10 years ago

It feels like 1.0 ./A should be the matrix reciprocal, eye()/A should be the matrix inverse and 1.0/A should just be an error.

jiahao commented 10 years ago

That tutorial is quite outdated.

andreasnoack commented 10 years ago

Agree. I was too fast here. There is also the consequence

julia> [1 1]\1
1x2 Array{Float64,2}:
 1.0  1.0

julia> [1 1]\[1]
2-element Array{Float64,1}:
 0.5
 0.5

Most linear algebra function accept Number interpreted as a Vector of length one or a 1x1 Matrix so I guess the right thing for \(Matrix,Number) is to check if the dimensions are correct.

StefanKarpinski commented 10 years ago

If we were to go down the road of making 1/A compute a matrix inverse, then we'd also want to change the behavior of a lot of other operations. For example, A + 1 should be equivalent to A + eye(size(A)) while A .+ 1 would be how you write the broadcasting sense. I've strongly considered this kind of change before, but it would be pretty disruptive. I think the only path to get there would be to deprecate the non-broadcasting forms for a fairly long time and then reintroduce the new "algebraically correct" behavior.

JeffBezanson commented 10 years ago

I would seriously consider doing that. The case of + is a bit odd since I don't see that very often, but / and \ are used with matrices routinely and the 1/A change seems like the right thing.

jiahao commented 10 years ago

A + 1 (or in more conventional notation, A + λ*I) is actually a very important use case: A + λ*I is used in level shifts for all sorts of matrix operations, particularly ill-conditioned matrix eigenvalue problems; (A + λI)⁻¹ is a resolvent kernel for inverting linear operators in the spectral domain; sqrt(A² + λ²I)⁻¹ and the like show up in regularization, to name just three use cases. Currently there isn't a more compact way to write A + λ*I other than A + λ*eye(A), and this notation is wasteful in that it initializes a second matrix only to do little other than to scale and add it to something else. So there are nice algebraic reasons why A + 1 should promote to A + eye(A).

I'm not convinced, however, that this has to be conflated with 1/A computing the matrix inverse vs. the elementwise reciprocal.

jiahao commented 10 years ago

A few thoughts on the specific issue of matrix inverse notation though:

  1. The usual admonition is that people should not compute the matrix inverse A⁻¹ explicitly. There's really no need to make it easier to compute matrix inverses.
  2. However, we do use A/B as a synonym for B/A (or at least, try to.). So if change 1/A to be the matrix inverse, then A/1 ought to be consistent with that as well.
  3. The real question is whether an expression containing both a matrix and 1 should have the 1 promote to eye(N) (the proposed behavior) or ones(N,N) with elementwise operations (essentially the current behavior).
JeffBezanson commented 10 years ago

I don't think scalar*matrix could do anything but scale the matrix, no matter how we interpret all the above arguments.

jiahao commented 10 years ago

Hm, maybe I wasn't being clear. My point is that there are plenty of contexts in which A + λ for a matrix A and scalar λ is a powerful synonym for A + λ*eye(A).

jiahao commented 10 years ago

as opposed to the current behavior of A + λ, which behaves as a synonym for A + λ*ones(A) (and in other such expressions involving matrices and scalars).

StefanKarpinski commented 10 years ago

I would be very much in favor of that change. I can't imagine a single mathematician who would not prefer for A + λ to be an efficient and expressive way to modify the diagonal. The reason that and the 1/A thing should be coupled is that they are both about how we treat mathematical operations on a scalar and a matrix – should it broadcast or should it do the algebraically correct thing? What we're doing now is mathematically sketchy but at least it's consistent. If we changed one and not the other then we'd be neither consistent nor correct. I think that changing both is the best thing – then we'll be both consistent and correct.

jiahao commented 10 years ago

This proposal, by the way, is very much as its heart about whether a Matrix is a linear algebraic object or a two-dimensional container (c.f. #987). The broadcasting behavior is consistent across all n-dimensional arrays, whereas this proposal would special-case behavior for n=2.

mbauman commented 10 years ago

I like the idea of making scalars behave like they're multiplied by I of the "correct size" when interacting with matrices through non-broadcasting operators. The fact that there's already operators designated to do element-wise broadcasting makes it a bit easier, too. Addition is well-defended above, and multiplication is free. Division is tricky, though: what's the correct size for I when dividing by non-square matrices?

If A is square, then eye(size(A)...)/A * A = I, no? My linear algebra is rusty, but isn't eye(size(A,2),size(A,1))/A the choice to make in general? I think that should behave like pinv(A).

I know I have tons of A + λ * eye(size(A)) littered through my Matlab code for state estimation and identification. I think this could be a great change.

kmsquire commented 10 years ago

As an aside, the size(A) isn't necessary for ones or eye:

julia> ones(A) == ones(size(A))
true

julia> eye(A) == eye(size(A))
ERROR: no method eye((Int64,Int64))

julia> eye(A) == eye(size(A)...)
true
jiahao commented 10 years ago

@mbauman Tempting as it is to define 1/A as pinv(A) for rectangular matrices, pseudoinverses are left inverses but not right inverses (whereas a true inverse is both a left inverse and a right inverse). So I don't think we can reason consistently about 1/A for nonsquare matrices in a way that neglects order of operations.

jiahao commented 10 years ago

I know I have tons of A + λ * eye(size(A)) littered through my Matlab code for state estimation

You could always define a custom +(A::Matrix, λ::Number) method in your code...

mbauman commented 10 years ago

Right, of course, since there is no true inverse for rectangular matrices, there is no universal solution here… perhaps the answer is to punt and throw an error for rectangular matrices? I still like the consistency of (effectively) transforming the scalar λ into λI across all basic arithmetic operations with matrices.

(Thanks for the tips, Kevin and Jiahao - I'll be happy to ditch those Matlabisms in my Julia code)

jiahao commented 10 years ago

@kmsquire thanks, I've edited my comments.

@mbauman we wouldn't have to do anything new at the level of redefining 1/A, since if it calls inv(A), that would do the dimension check already.

carlobaldassi commented 10 years ago

Also, speaking of customization, probably a special matrix type which just keeps a scalar and represents a matrix proportional to the identity of a variable size, together with a constant I associated to the value 1, wouldn't be too difficult to write. For example:

immutable IdentityMatrix{T<:Number}
    λ::T
end

const I = IdentityMatrix(1)

Base.(:*)(x::Number, M::IdentityMatrix) = IdentityMatrix(x*M.λ)
Base.(:+){T<:Number}(A::Matrix{T}, M::IdentityMatrix) = A + M.λ * eye(A)
Base.(:/){T<:Number}(M::IdentityMatrix, A::Matrix{T}) = M.λ * inv(A)

etc. (of course + should be written more efficiently). With these definitions alone one already has:

julia> A = rand(2,2)
2x2 Array{Float64,2}:
 0.184807  0.557457
 0.342099  0.889467

julia> A + 2I
2x2 Array{Float64,2}:
 2.18481   0.557457
 0.342099  2.88947 

julia> I/A
2x2 Array{Float64,2}:
 -33.787   21.1754
  12.9948  -7.02  
StefanKarpinski commented 10 years ago

Yes, that solution had occurred to me as well. It's elegant.

jiahao commented 10 years ago

@carlobaldassi the IdentityMatrix construct is actually a very strange object, since with a single instantiation I you could write things like (A+I)*(B+I)*(C+I) regardless of the dimensions of A, B, C, so long as they are such that A*B*C is sensible. It's not immediately obvious that it is wrong to define such a thing, but it is certainly not very standard linear algebra to have a single operator defined over multiple vector spaces.

StefanKarpinski commented 10 years ago

It is quite common as notation though ;-)

jiahao commented 10 years ago

don't be trolling now.

jiahao commented 10 years ago

Not to careen too far offpath here, but I've been thinking for awhile that IterativeSolvers.jl really needs a ShiftedMatrix type that wraps both the matrix A and the shift λ and serves as a functional synonym for A+λI, so that one need never actually compute A+λI explicitly. Instead, one is usually interested in expressions like (A+λI)\b, which can be worked out much more efficiently than an explicit construction which may break desirable symmetries or sparsity patterns in A.

StefanKarpinski commented 10 years ago

Also, it may make sense for A + I to be an error if A is non-square. In that case all of those I are conceptually of the same square shape. Of course, one can concoct other expressions where the I are not of the same size.

StefanKarpinski commented 10 years ago

The same comment applies to A + 1 and 1/A – if these were to do the algebraic thing, it might be best for them to require A to be square.

jiahao commented 10 years ago

1/A should only be defined where inv(A) is, i.e. square and nonsingular.

A+1 can be defined for rectangular A trivially by zero-padding the identity matrix with additional columns or rows of zeros as necessary.

carlobaldassi commented 10 years ago

It's not written explicitly, but in my sketch above I was actually thinking of only allowing I to represent a square matrix (inv does assume that of course). To me, the A + λI notation seems much more explicit than having a Number magically promote to a Matrix (BTW I forgot to declare the type as IdentityMatrix<:AbstractMatrix); if the latter was the choice though, I'd still support the restriction to square matrices only.

stevengj commented 10 years ago

I have to say that I strongly oppose the notation A + 2 as a synonym for A + 2*eye(A). For one thing, the only case where this makes any sense at all is for ndims(A) == 2, so it would mean that A + 2 means something completely different depending on the rank of A. For another thing, when you are just manipulating containers of data, adding a scalar to everything in the container is quite a common operation, and this would leave you with no nice, efficient notation for this operation (since A + 2*ones(A) is grossly inefficient).

For 1/A, my feelings aren't as strong, but I really don't see a strong motivation to allow this notation for matrix inverses. My tendency would be to just throw an error in this case, and require the user to either do 1 ./ A if they want the broadcasting operation or inv(A) if they want the inverse. Trying to guess the user's intention for 1/A seems like a recipe for bugs.

carlobaldassi commented 10 years ago

@stevengj you can always do A .+ 2.

jiahao commented 10 years ago

Currently, we have that ones(4,4) + 2 == ones(4,4) .+ 2. I'm in favor of retaining a set of operations that broadcast scalar updates. However, I do agree that the special-casing of two-dimensional Arrays is the much more pertinent issue. (I have a comment somewhere above to this effect.)

StefanKarpinski commented 10 years ago

We could just make adding a scalar mean to add along the diagonal in N dimensions. I'm just not sure if that's a particularly meaningful operation.

stevengj commented 10 years ago

@StefanKarpinski, I don't think that's a commonly useful operation, and it would be an invitation for bugs. If you go the route of A + 2 == A + 2eye(A), then +(::Array, ::Number) should probably not even be defined except for 2d arrays; everything else should be forced to use .+. (If you're going to break a lot of code, you should break it loudly.) But I have to say I would not be thrilled about this change.

jiahao commented 10 years ago

We should think much more carefully about what is meaningful behavior for N-dimensional tensors before proceeding with any breaking changes of this magnitude. (c.f. #987, #4774; is it sad that I've memorized these issue numbers?)

carlobaldassi commented 10 years ago

Small additional comment: if we allowed non-square matrices, we would have things like:

julia> a = [0 0 0]
1x3 Array{Int64,2}:
 0  0  0

julia> a + 1
1x3 Array{Int64,2}:
 1  0  0

which surely many people would find surprising (I among those).

StefanKarpinski commented 10 years ago

Yeah, that's pretty sketchy.

jiahao commented 10 years ago

[0 0 0] is not a row vector; it is a 1x3 Matrix, of rank 1, which mapping of ℝ³ onto ℝ¹, and for which the identity operator (from ℝ³ onto ℝ¹,) has the representation [1 0 0]. Not necessarily obvious, but entirely justifiable from an operator viewpoint.

(On a less facetious note, this is why #4774 is so thorny.)

JeffBezanson commented 10 years ago

Are the changes to / and + really so tightly coupled? Having 1/A != A^-1 seems like a more serious violation to me.

milktrader commented 10 years ago

https://github.com/JuliaStats/TimeSeries.jl/issues/80

I think @jiahao previously mentioned this but some mathematical operators treat for example an Int as int(myvalue(ones(my length))) but others don't. AFAICT, it applies to +, -, *, / but not to ^

JeffBezanson commented 10 years ago

I guess "+, -, *, / are elementwise when one argument is a scalar" is a reasonable and easy-to-remember rule.

milktrader commented 10 years ago

Let me elaborate a bit. Given foo = [1,2], it is equivalent to write:

julia> foo + 2
2-element Array{Int64,1}:
 3
 4

julia> foo .+ 2
2-element Array{Int64,1}:
 3
 4

But not

julia> foo .^ 2
2-element Array{Int64,1}:
 1
 4

julia> foo ^ 2
ERROR: no method *(Array{Int64,1}, Array{Int64,1})
 in power_by_squaring at intfuncs.jl:90
 in ^ at intfuncs.jl:113
milktrader commented 10 years ago

Treating the major four operators (+, -, *, /) as special cases where element-wise operations are equivalent to array operations is still is a bit confusing.

julia> foo = [1,2]
2-element Array{Int64,1}:
 1
 2

julia> sum(foo, foo)
1-element Array{Int64,1}:
 3

julia> foo + foo
2-element Array{Int64,1}:
 2
 4

julia> foo .+ foo
2-element Array{Int64,1}:
 2
 4
freefrancisco commented 10 years ago

To me it would make sense that for any mathematical object where a multiplicative identity (left and right) exists that A op a should mean A op aI where A is the mathematical object in question, a is a scalar, and op is an operator. When the identity doesn't exist this operation should be undefined instead of repeating the functionality of the .op dotted operators.

The element-wise operations are well defined already, and the dotted operators are already defined unambiguously to do this. Why repeat the functionality with the non-dotted operators and waste the opportunity to use them in a different way? In a way that corresponds to what they actually mean in math notation? It would also make for clearer code when people want to broadcast to actually use .op to show unambiguously that it was their intention.

lindahua commented 10 years ago

A + 2 means A + 2 * ones(size(A)) is a convention that has been well established in technical computing languages and numeric libraries.

Changing this behavior is going to be a huge surprise to everyone, no matter which background he/she comes from. It also probably breaks a huge part of existing codes.

The existence of .+ is not a strong enough reason to justify such a change.

I am very strongly against the idea of using A + 2 to express A + 2 I.

nalimilan commented 10 years ago

The A + 2I really sounds a good idea to me, as it's quite explicit while concise and essentially what you write in math formulas. I don't think any mathematician relies on the convention that A + 2 means A + 2I.