JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.71k stars 5.49k forks source link

make I(n) behave as `Diagonal` / `eye`? #23156

Closed StefanKarpinski closed 5 years ago

StefanKarpinski commented 7 years ago

It might be nice if we made I callable so that you can write I(10) and get a 10x10 Diagonal matrix object. Of course it's a bit weird to call one kind of object and get another, but we do that with functions all the time so maybe that's no problem. Possible definition:

(σ::UniformScaling)(n::Integer) = Diagonal(collect(Iterators.repeated(σ.λ, n)))

With this definition, we could potentially deprecate eye(n) (which is usually quite inefficient) in favor of I(n) when one doesn't need to mutate any off-diagonals or full(I(n)) in the cases where one does need to mutate an off-diagonal.

yuyichao commented 7 years ago

What advantage does this have? Functions are pretty special (sure they are objects but they are very special ones) so I don't think functions being callable is a good arguments for this. It's a good argument for not implementing algebra or other operations on functions if anything. The syntax will also be really confusing with juxtapose multiplication when you have both I(2) and 2I in the code.

eye is also a well known function name from python and Matlab so I don't think removing it is a good idea. We should certainly make more efficient things easier to do (x * I sounds simple enough though) but I don't think we should make doing less efficient but sometimes necessary things much harder. It's also unclear to me what would be the new syntax to get a eye(Int8, 3) or eye(Bool, 2)

JeffBezanson commented 7 years ago

You can do copy!(Matrix{Int8}(n,n), I). eye is a pun that I've never really liked that much.

lobingera commented 7 years ago

As in similar places: What problem is solved by this?

KlausC commented 7 years ago

When I heard eye the first time in the MATLAB context, I typed I. I thought somebody was kidding, when said: <Type "e", "why", "e">. So I am conditioned for 'I'.

StefanKarpinski commented 7 years ago

We currently have eye, I and Diagonal (not to mention speye), it would be nice to have one less function for making a linear algebra identity operator. There's almost never any reason to construct a dense one, so it's actively bad to have eye around since using I would be better in 99% of case. Since one want to use it in situations like A + I and [A I; I B] already, it seems much more natural to use I(n) for constructing a particular size of identity matrix instead of explaining "oh, there's this other thing with a really punny name to do that, but don't actually use it since it's not very efficient, instead do Diagonal(collect(Iterators.repeated(1, n)))". I'm proposing that we provide a more convenient spelling of that last expression – and I is a pretty natural name for it.

StefanKarpinski commented 7 years ago

I kind of think we should double down on the whole I thing and allow it to produce rectangular matrices and make Z a binding for 0I that lets us write things like [A Z; Z B]. Either that or allow a scalar to implicitly broadcast in concatenation expressions so you can write [A 0; 0 B]. Writing [A zeros(size(A,2), size(B,1)); zeros(size(B,1), size(A,2)) B] is such a drag.

tkelman commented 7 years ago

Diagonal(ones(n)) isn't that long. Also a different type than UniformScaling, odd to use callable I to construct something of a different type.

JeffBezanson commented 7 years ago

I'm also fine with Diagonal(ones(n)) and Array(Diagonal(ones(n))) (for a full eye replacement). It's pretty clear, and as Stefan said you don't really want to be constructing big mutable dense identity matrices if you can avoid it.

RalphAS commented 7 years ago

@StefanKarpinski "I kind of think we should double down … and make Z a binding for 0I " Please, please, don't do this. Exporting single-character variables from the base library is rarely wise. (In fact, it's almost always foolish, but I'll save that for another day.) Those of us who actually use linear algebra employ Z for lots of obvious purposes, among which your proposal does not figure.

TotalVerb commented 7 years ago

0I seems like a reasonable way to express 0I, so there's no need to spell zeros in full.

StefanKarpinski commented 7 years ago

This issue as stated is not breaking, so doesn't belong on the milestone. However, it has been brought up previously that we may want to get rid of eye since it is inefficient and never really the best alternative. Deprecating it would force people to consider better alternatives. So perhaps that should be the real question at hand here. Should we deprecate eye and if so, do we have sufficient facilities to cover all use cases of it?

JeffBezanson commented 7 years ago

I would deprecate eye to something like Array(Diagonal(ones(n))) --- is there any way to do that without allocating the vector of ones? I see copy!(zeros(3,3), I) works, but that depends on mutation so isn't ideal.

StefanKarpinski commented 7 years ago

Note that eye definitely belongs to the LinAlg module so could go in stdlib rather than Base.

StefanKarpinski commented 7 years ago

@Sacha0 has agreed to give replacing eye with better things a try and seeing what we need.

Sacha0 commented 7 years ago

@Sacha0 has agreed to give replacing eye with better things a try and seeing what we need.

To start I skimmed through all eye/speye calls in base, and then sketched a pull request deprecating speye (#24356). Observations from that deprecation sketch follow below, most of which also apply to eye.

speye takes the following forms: [sp]eye([T::Type], m::Integer[, n::Integer]) and speye(A::AbstractMatrix).

Where speye(m::Integer[, n::Integer]) cannot be replaced with something more clever (e.g. I alone), sparse(1.0I, m[, n]) works, as would SparseMatrixCSC{Float64}(I, m, n) (when that constructor exists). Fine so far, though having to specify the Float64 in SparseMatrixCSC{Float64} due to eltype(I) defaulting to Int may be slightly cumbersome.

Replacing speye(T, m[, n]) is trickier: Where typeof(one(T) * eltype(I)) == T, sparse(one(T)I, m[, n]) or some equivalent works reasonably well. But where typeof(one(T) * eltye(I)) != T, e.g. in speye(Bool, m), life is a bit tricker: Seemingly one must either write sparse(UniformScaling(one(T)), m[, n]), which is a bit cumbersome due to the UniformScaling(one(T)), or SparseMatrixCSC{T}(I, m, n) (when that constructor exists). A compact way to construct a UniformScaling with specified eltype might smooth that wrinkle out. Similarly, e.g. speye(Real, m) is tricky, as sparse(UniformScaling{Real}(1), m) yields a SparseMatrixCSC{Int} rather than SparseMatrixCSC{Real}. That either seems like a bug in sparse that should be fixed, or calls for a method to handle SparseMatrixCSC{Real}(I, m, n).

speye(A) is similarly tricky: The best rewrite I have so far is sparse(UniformScaling(one(eltype(A))), size(A)...), which is a bit cumbersome. Here again a compact way to construct a UniformScaling with specified eltype might be nice. Supporting array constructors that accept an iterable as their first argument, and another AbstractArray from which eltype and shape are taken as the second argument, might also smooth this bump out (e.g. SparseMatrixCSC(I, A)).

Methods for comparison against identity (foo == I, foo ≈ I) could be useful here and there too. Best!

fredrikekre commented 7 years ago

To start I skimmed through all eye/speye calls in base

I'm curious, how often did you encounter calls like speye([T, ]m, n) where m !== n? Since it does not create an identity matrix it is rather weird that is even supported? Perhaps we can just discontinue that construction? Would make this simpler I imagine.

Sacha0 commented 7 years ago

I'm curious, how often did you encounter calls like speye([T, ]m, n) where m !== n? Since it does not create an identity matrix it is rather weird that is even supported? Perhaps we can just discontinue that construction? Would make this simpler I imagine.

Fantastic question! In the speye deprecation and excluding tests specific to the corresponding methods, I found only one instance where m != n (and in that instance, the matrix's contents could just as well be zero or junk). In other words, at least the speye([T, ]m, n) (and so sparse(I, m, n)) methods are likely superfluous.

Similarly, I found only three uses of speye(A). Those three uses are essentially identical, occurring in definitions of arithmetic between SparseMatrixCSCs and UniformScalings. From the speye deprecation diff:

-(+)(A::SparseMatrixCSC, J::UniformScaling) = A + J.λ * speye(A)
-(-)(A::SparseMatrixCSC, J::UniformScaling) = A - J.λ * speye(A)
-(-)(J::UniformScaling, A::SparseMatrixCSC) = J.λ * speye(A) - A
+(+)(A::SparseMatrixCSC, J::UniformScaling) = A + sparse(J, size(A)...)
+(-)(A::SparseMatrixCSC, J::UniformScaling) = A - sparse(J, size(A)...)
+(-)(J::UniformScaling, A::SparseMatrixCSC) = sparse(J, size(A)...) - A

As shown, those uses are perhaps better written sparse(J, size(A)...). But these methods could be written more efficiently without the sparse(J, size(A)...)/speye(A) altogether. Moreover, the same arithmetic operations between AbstractMatrixs and UniformScalings throw if the AbstractMatrix is not square. So one could argue that the uses in the copied methods are incorrect. Altogether, the speye(A) methods might also be superfluous.

So perhaps deprecating sparse(I, m, n) (as opposed to sparse(I, n)) alongside speye is reasonable, which would simplify life a bit.

I will keep this question in mind while reviewing uses of eye again! Thank you! :)

Sacha0 commented 7 years ago

On second thought, with the future SparseMatrixCSC(I, m, n) around, perhaps sparse(I, m, n) should remain around as well? Thoughts? Best!

Sacha0 commented 7 years ago

Another observation:

About a third of eye invocations in test/ are in spirit either res == eye(...) or res ≈ eye(...). Where the intention of such tests is to check that the contents of res match identity, res == I and res ≈ I are superior rewrites. But such tests also implicitly test res's shape, and whether that shape check was intended / is an important part of the test is rarely (if ever) clear.

So while most of those test could be nicely rewritten as res == I or res ≈ I, strictly they should be (res = ...; res == I && size(res) == ...). On the one hand, where necessary the latter rewrite might sometimes be a bit verbose. On the other hand, the latter rewrite makes what the test actually tests explicit, which is of tremendous benefit to parties reading/modifying the tests. Best!

Sacha0 commented 7 years ago

Additional observations. Most eye calls serve one of the following purposes, ordered roughly by prevalence:

  1. Comparison (==, ) against identity, discussed in the preceding comment.
  2. Concise array construction, where frequently the contents do not much matter.
  3. Construction of a dense identity matrix for subsequent in-place multiplication.
  4. Construction of a dense identity for subsequent out-of-place multiplication (with some implicitly represented operator).

For purposes (3) and (4), Matrix(I, m, n) for eye(m[, n]) seems like a good rewrite. The catch: (3) requires eltype(Matrix(I, m, n)) <: AbstractFloat and (4) usually expects the same, while eltype(Matrix(I, m, n)) would beInt64 presently (or Bool with #24396) due to corresponding eltype(I). Matrix(1.0I, m, n) and Matrix{Float64}(I, m, n) do the trick but are somewhat less pleasant. Consequently, special-casing Matrix(I, m, n) to return a Matrix{Float64} might be worthwhile given that's likely the most useful result. (And likewise with future Array([zeros|ones], dims) methods, matching existing zeros/ones behavior.) Best!

andreasnoack commented 7 years ago

Consequently, special-casing Matrix(I, m, n) to return a Matrix{Float64} might be worthwhile given that's likely the most useful result.

For computations of eigen and singular vectors you'd like to be able to specify the element type. It could be Float32, BigFloat, or Complex{<:AbstractFloat} as well so you'd typically use the generic version anyway. Hence it might not be too important to avoid Int elements by default. I think Matrix{T}(I, n) would be fine in those applications.

Sacha0 commented 7 years ago

Sounds good on this end! :)

ViralBShah commented 7 years ago

Alternate issue title: An I for an eye.

Sacha0 commented 6 years ago

With #24438 and the other pull requests linked above in, this issue should be set :).

andreasnoack commented 6 years ago

I think we should consider having I(n) alongside Matrix(I, n, n). It would be a nice simplification for a common teaching situation where you'd like the identity to have a size and it shouldn't really get in the way of anybody. In practice I doubt that 2I vs I(2) will be a real concern.

alanedelman commented 6 years ago

I agree. (says the teacher of large linear algebra classes)

StefanKarpinski commented 6 years ago

It's a little weird that calling an instance of UniformScaling creates an instance of Matrix, but 🤷‍♂️. It does seem quite convenient and intuitive, so 👍.

StefanKarpinski commented 5 years ago

This keeps coming up—let's do it.

jebej commented 5 years ago

Also useful for tensor products kron(A,I(4)) where the size of the identity isn't determined from A.

tkf commented 5 years ago

Re: implementation, how about adding a lazy non-allocating representation of x * ones(n) (say, UniformVector; something similar to FillArrays.Ones), and define (λ * I)(n) via

(σ::UniformScaling)(n::Integer) = Diagonal(UniformVector(σ.λ, n))

UniformVector is useful in other places like nzval of sparse arrays.

jlapeyre commented 5 years ago

Some comments:

eye may be a bad pun, but it is an excellent mnemonic.

My guess is that having eye taken away without a clearly documented (anywhere documented ?), simple replacement is upsetting to many users. I have seen this anecdotally several times. I don't think they care whether the identity matrix is lazy or not, just that they have something that is not too much more difficult to type and to remember than eye, and don't have to think about it anymore. This may be related to Julia becoming less of an application and more of a language (ie moving away from MATLAB), and the need for a more application-like layer.

Allocation is not always slower than a lazily evaluated object (https://github.com/BBN-Q/QuantumInfo.jl/pull/3):

import FillArrays
import LinearAlgebra
using BenchmarkTools

eye(n::Integer) = Matrix{Float64}(LinearAlgebra.I, (n, n))

ikron_dense(n, mat) = kron(eye(n), mat)
ikron_lazy(n, mat) = kron(FillArrays.Eye(n), mat)

rikron_dense(n, mat) = kron(mat, eye(n))
rikron_lazy(n, mat) = kron(mat, FillArrays.Eye(n))

const nsmall = 2
const nbig = 100

const mat_small = rand(nsmall, nsmall)
const mat_big = rand(nbig, nbig);

macro sbtime(expr)
    quote
        println($(expr |> string))
        @btime $expr
    end
end

@sbtime ikron_dense(nbig, mat_big);
@sbtime ikron_lazy(nbig, mat_big);

println()

@sbtime rikron_dense(nbig, mat_big);
@sbtime rikron_lazy(nbig, mat_big);

println()

@sbtime ikron_dense(nsmall, mat_small);
@sbtime ikron_lazy(nsmall, mat_small);

println()

@sbtime ikron_dense(nsmall, mat_big);
@sbtime ikron_lazy(nsmall, mat_big);

println()

@sbtime rikron_dense(nsmall, mat_small);
@sbtime rikron_lazy(nsmall, mat_small);

println()

@sbtime rikron_dense(nsmall, mat_big);
@sbtime rikron_lazy(nsmall, mat_big);

println()

@sbtime ikron_dense(nbig, mat_small);
@sbtime ikron_lazy(nbig, mat_small);
ikron_dense(nbig, mat_big)
 157.041 ms (4 allocations: 763.02 MiB)
ikron_lazy(nbig, mat_big)
  209.153 ms (2 allocations: 762.94 MiB)

rikron_dense(nbig, mat_big)
  304.093 ms (4 allocations: 763.02 MiB)
rikron_lazy(nbig, mat_big)
  358.267 ms (2 allocations: 762.94 MiB)

ikron_dense(nsmall, mat_small)
  101.824 ns (2 allocations: 320 bytes)
ikron_lazy(nsmall, mat_small)
  61.975 ns (1 allocation: 208 bytes)

ikron_dense(nsmall, mat_big)
  41.494 μs (3 allocations: 312.69 KiB)
ikron_lazy(nsmall, mat_big)
  41.486 μs (2 allocations: 312.58 KiB)

rikron_dense(nsmall, mat_small)
  105.991 ns (2 allocations: 320 bytes)
rikron_lazy(nsmall, mat_small)
  68.820 ns (1 allocation: 208 bytes)

rikron_dense(nsmall, mat_big)
  60.869 μs (3 allocations: 312.69 KiB)
rikron_lazy(nsmall, mat_big)
  76.319 μs (2 allocations: 312.58 KiB)

ikron_dense(nbig, mat_small)
  64.835 μs (4 allocations: 390.78 KiB)
ikron_lazy(nbig, mat_small)
  59.712 μs (2 allocations: 312.58 KiB)
vtjnash commented 5 years ago

Stefan said above to do this, so I fixed the milestone.

JeffBezanson commented 5 years ago

Since there is no PR for this yet I think it needs to be moved to the next milestone.