JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Shape of transpose(x)' differs for real and complex x #35161

Open thisrod opened 4 years ago

thisrod commented 4 years ago

This is annoying if you run into it. I imagine that the best fix is a special size method for this complicated wrapper type around a complex array.

               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.0-rc2.0 (2020-02-24)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> x = randn(3);

julia> transpose(x)'
3-element Array{Float64,1}:
 -0.36345466121621794
  0.07867715859941832
  0.14644026210406283

julia> x = Complex.(x);

julia> transpose(x)'
3×1 LinearAlgebra.Adjoint{Complex{Float64},LinearAlgebra.Transpose{Complex{Float64},Array{Complex{Float64},1}}}:
 -0.36345466121621794 - 0.0im
  0.07867715859941832 - 0.0im
  0.14644026210406283 - 0.0im
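For reference, the discrepancy in the REPL session above reduces to two `size` calls:

```julia
using LinearAlgebra

x = randn(3)
size(transpose(x)')    # (3,) -- the adjoint collapses the lazy transpose

xc = complex.(x)
size(transpose(xc)')   # (3, 1) -- a nested lazy wrapper, matrix-shaped
```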
KristofferC commented 4 years ago

Could you explain a bit more what you think the problem is? What did you want to happen?

StefanKarpinski commented 4 years ago

I'd guess that the unexpected part is that adjoint "undoes" the transpose in the real case but not in the complex case. For real vectors transpose and adjoint are the same operation which is its own inverse. That's not the case for complex vectors. A possible different behavior would be for a composition of an adjoint and a transpose to produce an eagerly conjugated vector.
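The eager-conjugation behavior being suggested can be sketched as a standalone helper (`proposed_adjoint` is a hypothetical name, not a LinearAlgebra API; the actual fix would go into `adjoint` itself):

```julia
using LinearAlgebra

# Hypothetical helper showing the proposed semantics: taking the
# adjoint of a lazy transpose eagerly conjugates, so the result is a
# plain vector for any eltype.
proposed_adjoint(t::Transpose{<:Any,<:AbstractVector}) = conj(parent(t))

x = [1.0 + 2.0im, 3.0 - 1.0im]
proposed_adjoint(transpose(x))   # a 2-element Vector equal to conj(x)
```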

KristofferC commented 4 years ago

I'd guess that the unexpected part is that adjoint "undoes" the transpose in the real case but not in the complex case.

Maybe unexpected but then unexpectedly nice? If you want a vector all the time just collect it?

thisrod commented 4 years ago

What did you want to happen?

I want a way to deduce size(transpose(x)') from size(x), without having to know eltype(x). I think those sizes should always be the same, but I care less about what the answer is and more about it being consistent.

The context is that I tested some code for Array{Complex}, then it failed when I inadvertently gave it an Array{Real}. The transpose and the adjoint were many lines of code apart.

While writing this code, I'm starting to suspect that identifying adjoint vectors with row matrices is a mistake: dual vectors should be vectors, and if you want a vector to turn into a matrix you should have to explicitly cast or reshape it. E.g.:

julia> x = randn(5);

julia> x == Array(x)
true

julia> x' == Array(x')
true

julia> x'*x
2.9138488824321147

julia> Array(x') * Array(x)
1-element Array{Float64,1}:
 2.9138488824321147

I would prefer:

julia> x'
5-element DualVector{Float64}:
...

julia> Array(x')
5-element Array{Float64,1}:
...

julia> x' isa AbstractArray
# not sure about that

julia> x' == Array(x')
false

julia> x'*x
2.9138488824321147

julia> Array(x') * Array(x)
ERROR: MethodError: no method matching *(::Array{Float64,1}, ::Array{Float64,1})
...

julia> Matrix(x')
1×5 Array{Float64,2}:
...

julia> Matrix(x')*Array(x)
1-element Array{Float64,1}:
...

But that's a side track for the issue.
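Part of that wish list can be sketched in a few lines. All names here are illustrative, not an existing API; the sketch keeps `size(x') == size(x)` and makes contraction give a scalar, but it does not attempt the `x' != Array(x')` distinction, which would require opting out of `AbstractArray` equality:

```julia
# Hypothetical DualVector: a conjugated vector that stays 1-dimensional.
struct DualVector{T,V<:AbstractVector{T}} <: AbstractVector{T}
    parent::V
end

Base.size(v::DualVector) = size(v.parent)
Base.getindex(v::DualVector, i::Int) = conj(v.parent[i])

# Contraction with a vector yields a scalar, not a 1-element array.
Base.:*(d::DualVector, v::AbstractVector) = sum(d[i] * v[i] for i in eachindex(v))

# Matrix() is where the explicit cast to a row matrix would happen.
Base.Matrix(d::DualVector) = reshape(conj(d.parent), 1, :)
```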

mcabbott commented 4 years ago

I'm starting to suspect that identifying adjoint vectors with row matrices is a mistake

It certainly creates lots of edge cases... I wasn't around but most other options got discussed in https://github.com/JuliaLang/julia/issues/4774

However, I wonder whether there should be a lazy Conj wrapper, which does not change size; that would be the natural thing to return here.
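A minimal sketch of such a wrapper (hypothetical `LazyConj`/`lazyconj` names, chosen to avoid clashing with anything real; the hard part, as the thread notes, would be the LinearAlgebra/BLAS plumbing):

```julia
# Lazy conjugation wrapper: same size and shape as the parent.
struct LazyConj{T,N,P<:AbstractArray{T,N}} <: AbstractArray{T,N}
    parent::P
end

Base.size(C::LazyConj) = size(C.parent)            # size is unchanged
Base.getindex(C::LazyConj, I...) = conj(C.parent[I...])

lazyconj(A::AbstractArray) = LazyConj(A)
lazyconj(C::LazyConj) = C.parent                   # Conj ∘ Conj = identity
```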

StefanKarpinski commented 4 years ago

Why not just return an eagerly conjugated vector?

mcabbott commented 4 years ago

Someone could be relying on transpose(x)[1,1] = 0 updating x, I guess. Not sure I've ever knowingly done that.

mbauman commented 4 years ago

We did at one point have a lazy Conj wrapper in some of the initial sketch work... I don't recall all the considerations in deciding we didn't need it.

StefanKarpinski commented 4 years ago

I don't think being able to reliably mutate the original array through these kinds of mathy APIs is necessary. If you're doing these kinds of operations, you're really implicitly treating the data as immutable mathematical objects rather than mutable containers.

thisrod commented 4 years ago

most other options got discussed in #4774

Thanks for that link. As it happens, the code I'm working on was inspired by up and down tuples. (Basic idea: an (n,m)-tensor maps m-dimensional arrays to n-dimensional ones.) I'm going to post it as soon as I tidy up the README, and ask for pointers to whatever wheel I'm reinventing. That thread might have some pointers.

StefanKarpinski commented 4 years ago

Redesigning the way dual vectors work is off the table until 2.0, but the fact that the shape of transpose(x)' depends on the element-type of x is not ideal, let's focus on that. I would propose that in the complex case it be modified to produce a conjugated vector (eagerly).

mbauman commented 4 years ago

I feel like sometimes-aliasing behavior is worse than sometimes-column-sometimes-vector.

StefanKarpinski commented 4 years ago

Then the alternative would be to re-introduce the lazy Conj wrapper type and return that.

mbauman commented 4 years ago

@andyferris what would you think about bringing back the lazy Conj wrapper? Do you remember the iterations here?

StefanKarpinski commented 4 years ago

The nice thing about adding a lazy Conj wrapper is that identity, adjoint, transpose and conj form a group (ℤ₂×ℤ₂); currently we're missing one element from the group.

thisrod commented 4 years ago

To play devil's advocate, there is a general problem with adding things: the number of edge cases increases geometrically with the number of things you add.

If conj were lazy, someone would have to specify the shape of transpose(x)*conj.(y) and so on. As I illustrated above, there is no way to specify that consistently, because the current implementation is not consistent. There might not be a more consistent alternative to making transpose(x)' an eager conj.

This won't be a common issue. As far as I'm concerned, it's fine if you decide this isn't worth fixing, and I add some if eltype(x) <: Real ... elseif eltype(x) <: Complex type logic where needed.

thisrod commented 4 years ago

Hi Andy, too :-) Numerical quantum atom optics is obviously a small world.

andyferris commented 4 years ago

Yes, we shouldn't have container properties like size depend on the eltype; these must be orthogonal concerns.

As @mbauman alluded to, one of the early implementations of this had a Conj, which was precisely to make identity, adjoint, transpose and conj into a lazily evaluated group of structure ℤ₂×ℤ₂. I think it's going to be easiest to reason about if the lazy/eager structure follows that group structure - while making transpose(x)' an eager conj(x) might fix the size issue, it doesn't really make that much sense to me. (Though it might be a worthwhile bugfix for now.)

Bringing back Conj would be a good option in my opinion. Note that care needs to be taken to nest the lazy wrappers in the correct order and so on, which is challenging when they end up either inside or outside other data structures, and with more different wrapper types it seemed to make dispatch a little complicated. It also needs plumbing into all the specializations in LinAlg, BLAS, the sparse library, etc. - it was basically a bunch of work which I didn't quite complete (it was taking months and everything was shifting underneath); meanwhile, through superhuman effort, Matt got us over the line for v1 with the current implementation.

As an implementation note - I used to consider one of the ℤ₂ factors to be transpose and the other conj, and that was reflected in my implementation - but I think after "taking matrix transposes seriously" enough I was convinced (thanks Steven) that thinking of the dichotomy as adjoint × conj was more natural/productive, and it would be interesting to see if the implementation is simplified by having just Adjoint and Conj and Conj{<:Adjoint}. I suspect having the rule of always bringing these to the outside, and in that order, might make things easier for packages like static arrays, sparse arrays, etc., to correctly follow the dispatch logic and not make the mistake in the OP about size.

I can't see any other option that lets us use all the BLAS operations of adjoint/transpose/conjugation without extra copies. We might consider reducing laziness to fix size issues, but it feels like a regression. Would anyone consider adding Conj and some of the dispatch around that as a breaking change?

Hi Andy, too :-) Numerical quantum atom optics is obviously a small world.

Oh, so you're that Rod, @thisrod. :laughing: Hi! Indeed it is.

StefanKarpinski commented 4 years ago

Since Adjoint, Transpose and Conj are all of the non-identity elements of the group, I don't quite see what the ordering issue is—any composition of these can always be reduced to a single one. Having them all be first-class instead of parameterizing one on the other seems like it would be simple enough. What does Conj{<:Adjoint} buy?

andyferris commented 4 years ago

My proposal was to make the group structure explicit - Conj obviously cancels itself and is ℤ₂-like, as is Adjoint. Transpose is just "both" and could become a type alias rather than a struct. I.e. follow the principle of letting two simple things compose rather than worrying about one complex thing.

Ordering them in a specific way is just one way of ensuring two Adjoints always cancel, for example, rather than getting Adjoint{<:Conj{<:Adjoint}} type things to worry about, and you wouldn't want to write special dispatch rules for both Conj{<:Adjoint{<:MyArrayType}} as well as Adjoint{<:Conj{<:MyArrayType}}. Dealing with ordering of lazy wrappers can become interesting with custom array types that may themselves contain arrays, for example, and it's nice to share a convention such as always pushing Adjoint towards the outside of such wrapper containers - which is important in avoiding the issue in the OP since we don't have a trait for "covector".
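The two-generator scheme being described can be sketched in a few lines (toy types with hypothetical names, to avoid clashing with LinearAlgebra):

```julia
# Only Conj2 and Adjoint2 exist as structs; "transpose" is merely the
# composite of the two, expressible as a type alias.
struct Conj2{P};    parent::P; end
struct Adjoint2{P}; parent::P; end
const Transpose2{P} = Adjoint2{Conj2{P}}

lazy_transpose(x) = Adjoint2(Conj2(x))
```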

StefanKarpinski commented 4 years ago

Maybe I'm not making my point well enough, let me try to be more explicit. In addition to every element being its own inverse, the group has these relations:

adjoint = conj ∘ transpose = transpose ∘ conj
conj = adjoint ∘ transpose = transpose ∘ adjoint
transpose = adjoint ∘ conj = conj ∘ adjoint

There are no compositions of elements that don't reduce to one of the other elements. In particular, this means that you never need to nest these structures unless there's some other intervening abstraction that prevents collapsing them. The elements identity, Adjoint, Conj and Transpose aren't just the generators of the group, they are the whole group: there are no composite elements. In code, you can define

adjoint(a::Conj) = transpose(a.parent)
adjoint(a::Transpose) = conj(a.parent)
adjoint(a::Adjoint) = a.parent

conj(a::Conj) = a.parent
conj(a::Transpose) = adjoint(a.parent)
conj(a::Adjoint) = transpose(a.parent)

transpose(a::Conj) = adjoint(a.parent)
transpose(a::Transpose) = a.parent
transpose(a::Adjoint) = conj(a.parent)

There should never (unless you explicitly construct one) be a need for an Adjoint{<:Conj} array—that should become just Transpose. You can always collapse down to a single lazy transform. Perhaps you're arguing that instead of having Transpose, we should have Adjoint{<:Conj}. If so, that may be where we're disagreeing/disconnecting: it seems to me simpler, clearer, more symmetrical, more elegant, and to have far fewer (rather than more) corner cases, if all of the non-identity group elements have the same representation as a single lazy operation, rather than one arbitrarily chosen element of the group being represented as a composite.
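The claim that the four operations are closed under composition — so a single lazy layer always suffices — can be checked with a toy model (illustrative code, not LinearAlgebra):

```julia
# Encode each operation as a pair of bits (flips axes?, conjugates?),
# so that composition is bitwise xor. This makes the Klein four-group
# (ℤ₂×ℤ₂) structure explicit: every product of two operations lands
# back on one of the four.
ops = (:identity, :conj, :transpose, :adjoint)
enc = Dict(:identity  => (false, false), :conj    => (false, true),
           :transpose => (true,  false), :adjoint => (true,  true))
dec = Dict(v => k for (k, v) in enc)
compose(a, b) = dec[enc[a] .⊻ enc[b]]
```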

Dealing with ordering of lazy wrappers can become interesting with custom array types that may themselves contain arrays, for example, and it's nice to share a convention such as always pushing Adjoint towards the outside of such wrapper containers - which is important in avoiding the issue in the OP since we don't have a trait for "covector".

I don't get why ordering is a concern at all: the above shows that there's only ever a single lazy layer if we have all three lazy transforms. Yes, there's an ordering concern when there are intervening abstractions, but we can't see through that because they are, you know, abstract. Those array types themselves have to decide and define appropriate methods to move these lazy operations in or out. But they should always do so using the function forms, adjoint, conj and transpose, specifically so that the above definitions can kick in and reduce a stack of these lazy transforms down to a single layer.

andyferris commented 4 years ago

Perhaps you're arguing that instead of having Transpose, we should have Adjoint{<:Conj}.

Yes, concretely that was my suggestion. (We can still export a Transpose type alias to make it less breaking).

If so, that may be where we're disagreeing/disconnecting: it seems to me simpler, clearer, more symmetrical, more elegant, and to have far fewer (rather than more) corner cases, if all of the non-identity group elements have the same representation as a single lazy operation, rather than one arbitrarily chosen element of the group being represented as a composite.

Yes, this is the argument. :) After already attempting what you said during v0.7 development, I was musing that perhaps decomposing the group of 4 into two separate ℤ₂ wrappers would work out easier to implement and reason about, albeit with the major drawback that "one arbitrarily chosen element of the group" is "represented as a composite". Perhaps I am wrong, but Julia is typically great at composing abstractions and it felt worth exploring.

I don't get why ordering is a concern at all: the above shows that there's only ever a single lazy layer if we have all three lazy transforms.

Yes I was specifically addressing the case we remove Transpose; otherwise there is no ordering to consider.

Yes, there's an ordering concern when there are intervening abstractions, but we can't see through that because they are, you know, abstract. Those array types themselves have to decide and define appropriate methods to move these lazy operations in or out.

Yes, this is totally true - though from experience it becomes difficult to cover all the corner cases. The wrapper author might know all about the AbstractArray interface from Base but never encountered (or cared about) LinearAlgebra - is there anything we can do to make this work automatically or without too much work by the package author? Alternatively, would removing Transpose mean that the author only has to special-case two things instead of three? Or would it be more complex?

thisrod commented 4 years ago

While everyone is thinking about this, here's the package I'm working on. I feel like it's ready for other people to look at, and I'd welcome any feedback before I announce it on Discourse.

StefanKarpinski commented 4 years ago

Using composition is generally a winning approach when it offers a multiplicative effect: we get every combination of things for free instead of having to handle each possible combination. In this case, it gives us exactly one thing "for free". There's no win; it just makes things more complex for no advantage. That's why I keep harping on about how identity, Adjoint, Conj and Transpose are the entire group—because that means there is no multiplicative effect. It's way simpler and easier to just give that one thing its own name, especially since transposes are already a concept that has a name and which people are super familiar with!

As to corner cases, let's consider the two approaches with respect to that:

Add to that the fact that it's very common that Transpose should be handled specially anyway, and I just don't see what benefit expressing one of these three elements as a composition buys us. It seems to just add complexity with no upside.

thisrod commented 4 years ago

Has anyone considered this approach?

struct RestridedView{T,N} <: AbstractArray{T,N}
    parent :: AbstractArray{T}
    dims :: Dims{N}
    strides :: Dims{N}
    offset :: Int
    conjugated :: Bool
end

Base.size(A::RestridedView) = A.dims

# Index linearly into the parent array
function Base.getindex(A::RestridedView{T,N}, I::Vararg{Int,N}) where {T,N}
    x = A.parent[A.offset + sum(A.strides .* I)]
    A.conjugated ? conj(x) : x
end

RestridedView(A::AbstractArray) = RestridedView(A, dso(size(A))..., false)

# dims, strides, offset
function dso(d)
    s = cumprod([1, d...])
    s = s[1:end-1]
    d, Tuple(s), 1 - sum(s)
end

rv_conj(A::RestridedView) =
    RestridedView(A.parent, A.dims, A.strides, A.offset, !A.conjugated)
rv_conj(A) = rv_conj(RestridedView(A))

rv_permutedims(A::RestridedView, P) =
    RestridedView(A.parent, A.dims[P], A.strides[P], A.offset, A.conjugated)
rv_permutedims(A, P) = rv_permutedims(RestridedView(A), P)

# This only works for vectors, but it can be extended to the general case;
# the new stride and offset compose with the view's own.
Base.getindex(A::RestridedView{<:Any,1}, I::StepRange) =
    RestridedView(A.parent, (length(I),), (A.strides[1]*I.step,),
                  A.offset + A.strides[1]*(I.start - I.step), A.conjugated)

rv_view(A, I) = RestridedView(A)[I]

rv_reshape(A::AbstractArray, dims) = RestridedView(A, dso(dims)..., false)

# Quick check:
#   A = reshape(collect(1.0:12.0), 3, 4)
#   rv_permutedims(A, (2,1)) == permutedims(A, (2,1))   # true
#   rv_view(collect(1:10), 2:2:8) == [2, 4, 6, 8]       # true

I'd have to think more about reshape. It can be absorbed by RestridedView in the special case that strides[j+1] = strides[j]*dims[j] (i.e. when dimensions j and j+1 are contiguous), and no doubt in some other special cases.

I'm unsure how best to implement dual vectors.