Closed freefrancisco closed 10 years ago
Because 1
is a number 1/x
gives the reciprocal of each element of the matrix whereas inv
is the matrix inverse.
It is better to ask a question like this to the julia-users list.
I wonder if that behavior is still a good idea. After all a^-1
is also matrix inverse.
1/A computing the elementwise reciprocal of A is inconsistent with A/B computing B\A. Why not make 1./A compute the elementwise reciprocal instead?
Also there is at least one julia tutorial that expects 1/A == A^-1 http://www.statalgo.com/2012/04/15/statistics-with-julia-linear-algebra-with-lapack/
It feels like 1.0 ./A should be the matrix reciprocal, eye()/A should be the matrix inverse and 1.0/A should just be an error.
That tutorial is quite outdated.
Agree. I was too fast here. There is also the consequence
julia> [1 1]\1
1x2 Array{Float64,2}:
1.0 1.0
julia> [1 1]\[1]
2-element Array{Float64,1}:
0.5
0.5
Most linear algebra function accept Number
interpreted as a Vector
of length one or a 1x1 Matrix
so I guess the right thing for \(Matrix,Number)
is to check if the dimensions are correct.
If we were to go down the road of making 1/A
compute a matrix inverse, then we'd also want to change the behavior of a lot of other operations. For example, A + 1
should be equivalent to A + eye(size(A))
while A .+ 1
would be how you write the broadcasting sense. I've strongly considered this kind of change before, but it would be pretty disruptive. I think the only path to get there would be to deprecate the non-broadcasting forms for a fairly long time and then reintroduce the new "algebraically correct" behavior.
I would seriously consider doing that. The case of +
is a bit odd since I don't see that very often, but /
and \
are used with matrices routinely and the 1/A
change seems like the right thing.
A + 1
(or in more conventional notation, A + λ*I
) is actually a very important use case: A + λ*I
is used in level shifts for all sorts of matrix operations, particularly ill-conditioned matrix eigenvalue problems; (A + λI)⁻¹
is a resolvent kernel for inverting linear operators in the spectral domain; sqrt(A² + λ²I)⁻¹
and the like show up in regularization, to name just three use cases. Currently there isn't a more compact way to write A + λ*I
other than A + λ*eye(A)
, and this notation is wasteful in that it initializes a second matrix only to do little other than to scale and add it to something else. So there are nice algebraic reasons why A + 1
should promote to A + eye(A)
.
I'm not convinced, however, that this has to be conflated with 1/A
computing the matrix inverse vs. the elementwise reciprocal.
A few thoughts on the specific issue of matrix inverse notation though:
A⁻¹
explicitly. There's really no need to make it easier to compute matrix inverses.A/B
as a synonym for B/A
(or at least, try to.). So if change 1/A
to be the matrix inverse, then A/1
ought to be consistent with that as well.1
should have the 1
promote to eye(N)
(the proposed behavior) or ones(N,N)
with elementwise operations (essentially the current behavior). I don't think scalar*matrix could do anything but scale the matrix, no matter how we interpret all the above arguments.
Hm, maybe I wasn't being clear. My point is that there are plenty of contexts in which A + λ
for a matrix A
and scalar λ
is a powerful synonym for A + λ*eye(A)
.
as opposed to the current behavior of A + λ
, which behaves as a synonym for A + λ*ones(A)
(and in other such expressions involving matrices and scalars).
I would be very much in favor of that change. I can't imagine a single mathematician who would not prefer for A + λ
to be an efficient and expressive way to modify the diagonal. The reason that and the 1/A
thing should be coupled is that they are both about how we treat mathematical operations on a scalar and a matrix – should it broadcast or should it do the algebraically correct thing? What we're doing now is mathematically sketchy but at least it's consistent. If we changed one and not the other then we'd be neither consistent nor correct. I think that changing both is the best thing – then we'll be both consistent and correct.
This proposal, by the way, is very much as its heart about whether a Matrix
is a linear algebraic object or a two-dimensional container (c.f. #987). The broadcasting behavior is consistent across all n-dimensional arrays, whereas this proposal would special-case behavior for n=2
.
I like the idea of making scalars behave like they're multiplied by I
of the "correct size" when interacting with matrices through non-broadcasting operators. The fact that there's already operators designated to do element-wise broadcasting makes it a bit easier, too. Addition is well-defended above, and multiplication is free. Division is tricky, though: what's the correct size for I
when dividing by non-square matrices?
If A
is square, then eye(size(A)...)/A * A = I
, no? My linear algebra is rusty, but isn't eye(size(A,2),size(A,1))/A
the choice to make in general? I think that should behave like pinv(A)
.
I know I have tons of A + λ * eye(size(A))
littered through my Matlab code for state estimation and identification. I think this could be a great change.
As an aside, the size(A)
isn't necessary for ones
or eye
:
julia> ones(A) == ones(size(A))
true
julia> eye(A) == eye(size(A))
ERROR: no method eye((Int64,Int64))
julia> eye(A) == eye(size(A)...)
true
@mbauman Tempting as it is to define 1/A
as pinv(A)
for rectangular matrices, pseudoinverses are left inverses but not right inverses (whereas a true inverse is both a left inverse and a right inverse). So I don't think we can reason consistently about 1/A
for nonsquare matrices in a way that neglects order of operations.
I know I have tons of A + λ * eye(size(A)) littered through my Matlab code for state estimation
You could always define a custom +(A::Matrix, λ::Number)
method in your code...
Right, of course, since there is no true inverse for rectangular matrices, there is no universal solution here… perhaps the answer is to punt and throw an error for rectangular matrices? I still like the consistency of (effectively) transforming the scalar λ
into λI
across all basic arithmetic operations with matrices.
(Thanks for the tips, Kevin and Jiahao - I'll be happy to ditch those Matlabisms in my Julia code)
@kmsquire thanks, I've edited my comments.
@mbauman we wouldn't have to do anything new at the level of redefining 1/A
, since if it calls inv(A)
, that would do the dimension check already.
Also, speaking of customization, probably a special matrix type which just keeps a scalar and represents a matrix proportional to the identity of a variable size, together with a constant I
associated to the value 1, wouldn't be too difficult to write. For example:
immutable IdentityMatrix{T<:Number}
λ::T
end
const I = IdentityMatrix(1)
Base.(:*)(x::Number, M::IdentityMatrix) = IdentityMatrix(x*M.λ)
Base.(:+){T<:Number}(A::Matrix{T}, M::IdentityMatrix) = A + M.λ * eye(A)
Base.(:/){T<:Number}(M::IdentityMatrix, A::Matrix{T}) = M.λ * inv(A)
etc. (of course +
should be written more efficiently).
With these definitions alone one already has:
julia> A = rand(2,2)
2x2 Array{Float64,2}:
0.184807 0.557457
0.342099 0.889467
julia> A + 2I
2x2 Array{Float64,2}:
2.18481 0.557457
0.342099 2.88947
julia> I/A
2x2 Array{Float64,2}:
-33.787 21.1754
12.9948 -7.02
Yes, that solution had occurred to me as well. It's elegant.
@carlobaldassi the IdentityMatrix
construct is actually a very strange object, since with a single instantiation I
you could write things like (A+I)*(B+I)*(C+I)
regardless of the dimensions of A
, B
, C
, so long as they are such that A*B*C
is sensible. It's not immediately obvious that it is wrong to define such a thing, but it is certainly not very standard linear algebra to have a single operator defined over multiple vector spaces.
It is quite common as notation though ;-)
don't be trolling now.
Not to careen too far offpath here, but I've been thinking for awhile that IterativeSolvers.jl really needs a ShiftedMatrix
type that wraps both the matrix A
and the shift λ
and serves as a functional synonym for A+λI
, so that one need never actually compute A+λI
explicitly. Instead, one is usually interested in expressions like (A+λI)\b
, which can be worked out much more efficiently than an explicit construction which may break desirable symmetries or sparsity patterns in A
.
Also, it may make sense for A + I
to be an error if A
is non-square. In that case all of those I
are conceptually of the same square shape. Of course, one can concoct other expressions where the I
are not of the same size.
The same comment applies to A + 1
and 1/A
– if these were to do the algebraic thing, it might be best for them to require A
to be square.
1/A
should only be defined where inv(A)
is, i.e. square and nonsingular.
A+1
can be defined for rectangular A
trivially by zero-padding the identity matrix with additional columns or rows of zeros as necessary.
It's not written explicitly, but in my sketch above I was actually thinking of only allowing I
to represent a square matrix (inv
does assume that of course). To me, the A + λI
notation seems much more explicit than having a Number
magically promote to a Matrix
(BTW I forgot to declare the type as IdentityMatrix<:AbstractMatrix
); if the latter was the choice though, I'd still support the restriction to square matrices only.
I have to say that I strongly oppose the notation A + 2
as a synonym for A + 2*eye(A)
. For one thing, the only case where this makes any sense at all is for ndims(A) == 2
, so it would mean that A + 2
means something completely different depending on the rank of A
. For another thing, when you are just manipulating containers of data, adding a scalar to everything in the container is quite a common operation, and this would leave you with no nice, efficient notation for this operation (since A + 2*ones(A)
is grossly inefficient).
For 1/A
, my feelings aren't as strong, but I really don't see a strong motivation to allow this notation for matrix inverses. My tendency would be to just throw an error in this case, and require the user to either do 1 ./ A
if they want the broadcasting operation or inv(A)
if they want the inverse. Trying to guess the user's intention for 1/A
seems like a recipe for bugs.
@stevengj you can always do A .+ 2
.
Currently, we have that ones(4,4) + 2 == ones(4,4) .+ 2
. I'm in favor of retaining a set of operations that broadcast scalar updates. However, I do agree that the special-casing of two-dimensional Array
s is the much more pertinent issue. (I have a comment somewhere above to this effect.)
We could just make adding a scalar mean to add along the diagonal in N dimensions. I'm just not sure if that's a particularly meaningful operation.
@StefanKarpinski, I don't think that's a commonly useful operation, and it would be an invitation for bugs. If you go the route of A + 2 == A + 2eye(A)
, then +(::Array, ::Number)
should probably not even be defined except for 2d arrays; everything else should be forced to use .+
. (If you're going to break a lot of code, you should break it loudly.) But I have to say I would not be thrilled about this change.
We should think much more carefully about what is meaningful behavior for N
-dimensional tensors before proceeding with any breaking changes of this magnitude. (c.f. #987, #4774; is it sad that I've memorized these issue numbers?)
Small additional comment: if we allowed non-square matrices, we would have things like:
julia> a = [0 0 0]
1x3 Array{Int64,2}:
0 0 0
julia> a + 1
1x3 Array{Int64,2}:
1 0 0
which surely many people would find surprising (I among those).
Yeah, that's pretty sketchy.
[0 0 0]
is not a row vector; it is a 1x3 Matrix
, of rank 1, which mapping of ℝ³ onto ℝ¹, and for which the identity operator (from ℝ³ onto ℝ¹,) has the representation [1 0 0]
. Not necessarily obvious, but entirely justifiable from an operator viewpoint.
(On a less facetious note, this is why #4774 is so thorny.)
Are the changes to /
and +
really so tightly coupled? Having 1/A != A^-1
seems like a more serious violation to me.
https://github.com/JuliaStats/TimeSeries.jl/issues/80
I think @jiahao previously mentioned this but some mathematical operators treat for example an Int
as int(myvalue(ones(my length)))
but others don't. AFAICT, it applies to +
, -
, *
, /
but not to ^
I guess "+, -, *, / are elementwise when one argument is a scalar" is a reasonable and easy-to-remember rule.
Let me elaborate a bit. Given foo = [1,2]
, it is equivalent to write:
julia> foo + 2
2-element Array{Int64,1}:
3
4
julia> foo .+ 2
2-element Array{Int64,1}:
3
4
But not
julia> foo .^ 2
2-element Array{Int64,1}:
1
4
julia> foo ^ 2
ERROR: no method *(Array{Int64,1}, Array{Int64,1})
in power_by_squaring at intfuncs.jl:90
in ^ at intfuncs.jl:113
Treating the major four operators (+
, -
, *
, /
) as special cases where element-wise operations are equivalent to array operations is still is a bit confusing.
julia> foo = [1,2]
2-element Array{Int64,1}:
1
2
julia> sum(foo, foo)
1-element Array{Int64,1}:
3
julia> foo + foo
2-element Array{Int64,1}:
2
4
julia> foo .+ foo
2-element Array{Int64,1}:
2
4
To me it would make sense that for any mathematical object where a multiplicative identity (left and right) exists that A op a
should mean A op aI
where A
is the mathematical object in question, a
is a scalar, and op
is an operator. When the identity doesn't exist this operation should be undefined instead of repeating the functionality of the .op
dotted operators.
The element-wise operations are well defined already, and the dotted operators are already defined unambiguously to do this. Why repeat the functionality with the non-dotted operators and waste the opportunity to use them in a different way? In a way that corresponds to what they actually mean in math notation? It would also make for clearer code when people want to broadcast to actually use .op
to show unambiguously that it was their intention.
A + 2 means A + 2 * ones(size(A)) is a convention that has been well established in technical computing languages and numeric libraries.
Changing this behavior is going to be a huge surprise to everyone, no matter which background he/she comes from. It also probably breaks a huge part of existing codes.
The existence of .+ is not a strong enough reason to justify such a change.
I am very strongly against the idea of using A + 2 to express A + 2 I.
The A + 2I
really sounds a good idea to me, as it's quite explicit while concise and essentially what you write in math formulas. I don't think any mathematician relies on the convention that A + 2
means A + 2I
.
I was trying to figure out how to invert a matrix in a 0.2.0 installation of julia, and I found out that 1/x gives me a different inverse than inv(x).
Here is my output