JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.43k stars 5.45k forks source link

new syntax for transpose #21037

Closed StefanKarpinski closed 3 years ago

StefanKarpinski commented 7 years ago

Now that .op is generally the vectorized form of op, it's very confusing that .' means transpose rather than the vectorized form of ' (adjoint, aka ctranspose). This issue is for discussing alternative syntaxes for transpose and/or adjoint.

mbauman commented 6 years ago

I believe that as long as we have postfix ', folks are going to want to use it to broadcast f over a cartesian product of vectors with f.(v, w'). And folks are going to want to use it to reshape a vector of strings into a row-vector of headers for a table-like structure. So it's compelling to me to have a simple and easy-to-use replacement that we can direct them to.

Here's an option we haven't considered: A*' — a new bigraph. Typical math notation might interpret this as conj(A)', which is actually pretty close to what we want. It had been available on 0.6, but on 0.7 we allow using * to concatenate characters… still workable, though.

I don't believe postfix " and ` are available due to custom string literals parsing beyond the end of a line. Postfix * by itself is also unavailable for the same reason. Postfix prime A′ is probably one of the most commonly used unicode identifiers, so that's even more out than Aᵀ.

ChrisRackauckas commented 6 years ago

Honestly, after looking at my code, I don't use .' like at all, so transpose(a) is probably fine.

Note that I just grepped the registered packages and found 600+ usages of .', so it's not terribly rare.

Was this spot checked at all to see if .' wasn't used where ' would've been okay? I am starting to think that might be true more often than not. Otherwise, the only place where I have seen a legitimate use of .' was before Plots.jl labels would allow a vector (instead wanted a row vector of strings), but that's been changed. For codes where I really need this often, I think I'd start doing T = transpose locally, or throw a macro to change ' to transpose.

@transpose A = A'*A*B'*B*C'*C

would be fine with me for that rare case.

folks are going to want to use it to broadcast f over a cartesian product of vectors with f.(v, w'). And folks are going to want to use it to reshape a vector of strings into a row-vector of headers for a table-like structure. So it's compelling to me to have a simple and easy-to-use replacement that we can direct them to.

If it only shows up once in a statement, isn't it okay to just use transpose?

StefanKarpinski commented 6 years ago

The a*' syntax for conjugate-adjoint is pretty nice, although it doesn't really seem like that's the operation we need a better syntax for. &a will be available soon and suggests swapping things, although it's quite different from traditional notations for this.

mbauman commented 6 years ago

Maybe it's time for a straw poll?

How should we spell the structural transpose?

(roughly in order of proposal; no judgements on emoji names here)

andyferris commented 6 years ago

The LinAlg discussions did indeed talk about gifting .’ to non-recursive transpose usage, since conj(x’) is relatively uncommon. However is mathematical syntax and really should take the mathematical meaning (if anything).

ttparker commented 6 years ago

Very strongly oppose making tr(A) mean matrix transpose - everyone's going to think it means matrix trace: https://en.wikipedia.org/wiki/Trace_(linear_algebra)

Liso77 commented 6 years ago

If not deprecate superscripts as identifier (which probably has to be considered seriously before 1.0) then ᵀ(A) is possibility too.

Jutho commented 6 years ago

In relation to the suggestion (A)ᵀ, my apologies for slightly derailing this discussion with the following remark:

I have never cared very much for having available as unary operator, especially since you'll anyway end up typing √(...) as soon as you want to apply it to a variable that is more than a one or a few characters. Furthermore, I have always found the difference in functioning between and √a very artificial. It probably makes sense if you know about Unicode classes etc, but to anyone else this must seems absurd. Sure its useful to have as a valid variable name, but similarly √a could be a useful variable name to store the square root of a if you need to use it several times. Or more complicated expressions like a²b and its square root a√b, where the former is a valid identifier and the latter is not. Above all, I like consistency.

So for consistency, I like the proposal of having postfix operators when using parentheses (A)ᵀ, (a)², in combination with removing the Unicode unary operator (and its relatives) so that it can also be used in identifiers (while still accessible as normal function call √(a)).

StefanKarpinski commented 6 years ago

I agree 100% with what @Jutho said and have thought it on a number of occasions. Would you mind opening an issue, @Jutho? Proposal: allow in identifier names, require √(x) for calling as op.

Liso77 commented 6 years ago

next question -> what about 2 |> √ ?

StefanKarpinski commented 6 years ago

Let's have discussion of in another thread, but in short 2 |> √ means √(2).

stevengj commented 6 years ago

Another alternative, which would require no parser changes and be easy to type, would be A^T for transpose (by defining T to be a singleton type with a ^ method). … oh, I see that @mbauman had this idea too. It's a bit ugly, but no more so than A.'.

mahiki commented 6 years ago

I don't have expertise, but I'm very invested in the outcome of this discussion as who will be likely typing thousands of lines containing matrix expressions in their course of my work.

transpose(A) # with no special syntax is winning the vote above, but is painful to my eyes and fingers.

In python the common usage is probably with numpy and a lot of stuff that looks like the following and is not too bad:

import numpy as np
# define matrix X of n columns, with m rows of observations
error = X.dot(Theta.T) - Y
gradient = (1 / m) * (X.dot(Theta.T) - Y).T.dot(X)

I wouldn't want to have to do:

grad = 1/m * transpose(X * transpose(Theta) - Y)) * X

It totally changes the mental conception of transpose from the convention that mathematics notation has settled on, which is a postfix signifier, usually Aᵀ or Aᵗ.

Personally I'm very happy with A' which works in Julia v.0.6, before it got taken by adjoint. Is adjoint used very often?

Here are my comments in a table:

Aᵀ or Aᵗ    if the world won't accept unicode operators, let them use transpose(A)
A'          close to math notation, easy to type and *especially* easy to read
A^'         this could signal `^` not to be parsed as Exponentiation.
A.'         conflicts with dotted operator syntax, but at face value OK
A^T or A^t  these are pretty good, but what if variable `T` is meant to be an exponent? 
A.T         same as numpy, same dotted operator collision
t(A)        nesting reverses semantics, 3 keystrokes and two of them with shift key.
transpose(A) with no special syntax     # please don't do this.
ararslan commented 6 years ago

Personally I'm very happy with A' which works in Julia v.0.6, before it got taken by adjoint. Is adjoint used very often?

I don't understand, A' has always been the adjoint of A. We used to call the underlying function ctranspose for conjugate transpose, but we renamed it to the equivalent term adjoint with no change in functionality.

If you're doing linear algebra then you're much more likely to want a conjugate transpose anyway, so you'll be typing A' rather than transpose(A). The popularity of not defining a special syntax for non-conjugate transposes is (presumably) due in part to the fact that it's really not that common for most linear algebraic uses to want the non-conjugate transpose.

Liso77 commented 6 years ago

If you're doing linear algebra then ...

If your tool is hammer then ... :)

... you have to think about possibility that Julia could grow to general programming language.

Maybe not, maybe it will stay to be linear algebra argot - which is possibility which have to think about programmers like me. :)

StefanKarpinski commented 6 years ago

@mahiki, you're NumPy example:

import numpy as np
# define matrix X of n columns, with m rows of observations
error = X.dot(Theta.T) - Y
gradient = (1 / m) * (X.dot(Theta.T) - Y).T.dot(X)

would be written literally in Julia as:

error = X*Θ' - Y
gradient = (1/m) * (X*Θ' - Y)' * X

or assuming that vectors are rows in that NumPy example and would be columns in Julia:

error = X'Θ - Y
gradient = (1/m) * (X'Θ - Y) * X'

which seems about as clear and mathematical as it gets. If your data is real, then adjoint and transpose are the same operation, which may be why you're using transpose above – but mathematically adjoint is the right operation. As @ararslan said, X' has always meant adjoint in Julia (and in Matlab as well). It was previously called ctranspose short for "conjugate transpose" but that name was a misnomer since the defining property of the operator is that

dot(A*x, y) == dot(x, A'y)

which is the defining property of the Hermitian adjoint but happens to be satisfied by the conjugate transpose when A is a complex matrix. That's why "adjoint" is the right generic term for this operator.

All that said, I voted above for both transpose(a) and a.' since I think it would be fine for a.' to mean structural transpose. It would work as expected, and even though it would not be recursive and therefore not "mathematically correct" in some generic code, having it work as expected seems good enough. And telling people to consider using conj(a') in generic code seems like an education thing rather than something we really need to hit people over the head with.

ttparker commented 6 years ago

@mahiki If for some reason you really do need to use transpose instead of adjoint many times in your code, then you could define a shorter macro like @t that aliases transpose (although I know this solution isn't ideal, especially if you're writing your code with other people).

ararslan commented 6 years ago

you have to think about possibility that Julia could grow to general programming language.

@Liso77 It already is. As just one of many examples, Nanosoldier runs a web server that listens for GitHub events and runs performance benchmarks on demand, all in Julia. That's a digression though, and I don't want this thread to get off topic.

If you're transposing a matrix of some kind with non-numeric data—which is an entirely valid use case—mathematical transpose notation actually seems like a bad pun. In that case I feel that it would be better to be more explicit about what you're asking for, e.g. transpose (or even permutedims, depending on your specific needs).

stevengj commented 6 years ago

If you're transposing a matrix of some kind with non-numeric data—which is an entirely valid use case—mathematical transpose notation actually seems like a bad pun.

Since A.' is not really "mathematical transpose notation" in the usual sense, I don't see that this is an argument for or against.

ttparker commented 6 years ago

I think that @ararslan isn't arguing against the existing .' but rather against introducing a superscript-T syntax. I tend to agree - if you mean the linear-algebra concept of adjoint, then you should use ' (even if your matrix happens to be real). And if you have a matrix of non-numeric data, then it's of course perfectly legitimate to permute the two indices, but this operation isn't really the "transpose" as we usually think of it, and using mathematical superscript-T notation is probably more likely to confuse than to clarify. The only situation where a superscript-T notation would really be appropriate is if you have a numerical matrix whose indices you want to permute, but you really don't want the adjoint linear operator. Such situations certainly exist, but may be too rare to justify introducing new syntax.

Sacha0 commented 6 years ago

Ref. https://github.com/JuliaLang/julia/issues/20978#issuecomment-315902532. Best!

Liso77 commented 6 years ago

... but this operation isn't really the "transpose" as we usually think of it, ...

If this is so unusual why ararslan and many others vote for spelling structural transpose as transpose(A)?

mahiki commented 6 years ago

Thank you @StefanKarpinski @ararslan @ttparker. I had to go back to my linear algebra text and rediscover the adjoint, its in there all right. I took that before Complex Analysis, probably why I took no notice of it.

I love being able to do this gradient = (1/m) * (X'Θ - Y) * X'

My confusion stems from the widespread use of 'transpose' (as a superscript T) in reference docs, papers, textbooks, etc, for example Andrew Ng's stanford CS229 lecture notes, where the corresponding Julia code would use adjoint as in @StefanKarpinski 's clean-looking example above. This is because adjoint and transpose are equivalent in ℝ (right?). update: yup

Now my favored notation for transpose is simply whatever is logically consistent. Clearly .' is not because of conflict with dotted operator syntax, and I have no objection to transpose(A) with no special syntax, since no sane special syntax seems available, except for a unicode superscript.

I like @ttparker solution if I do find myself writing a lot of transpose, macro @t that aliases transpose.

Again, I was mistaken to state:

transpose(A) with no special syntax # please don't do this.

Thank you for taking my comments seriously despite my poor facility with graduate level mathematics.

perrutquist commented 6 years ago

(From discourse.)

I'd like ' to be a post-fix operator that maps f' to '(f) where Base.:'(x::AbstractMatrix) = adjoint(x) and the user is free to add other methods that have nothing to do with adjoints. (For example, some people might like f' to refer to df/dt.)

With the operator suffixes introduced in 0.7, it would then be natural for f'ᵃ to map to 'ᵃ(f), and so on, allowing the user to define his own postfix operators. This would make it possible to have Base.:'ᵀ(x::AbstractMatrix) = transpose(x) and Base.:'⁻¹(x::Union{AbstractMatrix,Number}) = inv(x), etc.

Writing A'ᵀ is perhaps not as clean as Aᵀ, but it wouldn't require deprecating variable names ending in .

mbauman commented 6 years ago

At first glance that looks like it could be a non-breaking feature. It's a very clever compromise. I like it.

StefanKarpinski commented 6 years ago

Seems reasonable to me. The hardest part is coming up with a name for the ' function—prefix syntax does not work in this case.

jrevels commented 6 years ago

The hardest part is coming up with a name for the ' function

apostrophe? Might be too literal...

mbauman commented 6 years ago

Is it at all possible to make prefix syntax work (e.g., with an explicit (')(A) syntax?)? If not, then that's a problem as it'd break the if-you-can-define-the-symbol-name-then-you-can-override-its-syntax rule introduced by https://github.com/JuliaLang/julia/pull/26380.

Edit: looks to be available:

julia> (')(A)

ERROR: syntax: incomplete: invalid character literal

julia> (')(A) = 2

ERROR: syntax: incomplete: invalid character literal
JeffBezanson commented 6 years ago

Unfortunately ' is one of the hardest characters to use as an identifier name, since it introduces a different kind of atom (characters), which has very high precedence (equal to the precedence of identifiers themselves). For example, is (')' an application of ' to itself, or an open paren followed by a ')' literal?

One option that's not practical in the short term is to declare character literals to be not worth ', and use a string macro like c"_" instead.

perrutquist commented 6 years ago

How about if ' parses as an identifier when preceded by dot-colon, so that Base.:' would work?

Of course (@__MODULE__).:'(x) = function_body might be a bit cumbersome to write, but (x)' = function_body should work the same. Edit: No, since (x)' should map to calling the ' in Base. Defining a ' function in the current module would be cumbersome, but there wouldn't be any reason to do it either.

perrutquist commented 6 years ago

Or how about letting '' parse as the identifier ' when it would otherwise have parsed as an empty character literal (which is currently a parsing-level error). Similarly, ''ᵃwould parse as the identifier 'ᵃ, etc.

Everything that is not currently a syntax error would still parse as before (for example 2'' is postfix ' applied twice to 2), but 2*'' would now parse as two times '.

StefanKarpinski commented 6 years ago

It seems confusing that we would have a'' === a but ''(a) === a'. It seems better to use Base.apostrophe as the name instead (or something like that).

ttparker commented 6 years ago

Might it be better to split this discussion off into a new Github issue, since it's about ' syntax that's not directly related to matrix transposition?

perrutquist commented 6 years ago

Is there an automated way of splitting issues, or should I simply open a new one and link to the discussion here?

ararslan commented 6 years ago

The latter

briochemc commented 6 years ago

The only situation where a superscript-T notation would really be appropriate is if you have a numerical matrix whose indices you want to permute, but you really don't want the adjoint linear operator. Such situations certainly exist, but may be too rare to justify introducing new syntax.

I guess I'm way too late for the discussion, but I'd like to point one use that I think is worth mentioning: Applying the complex-step differentiation to a real-valued function which has transpose inside of it. (I personally figured out that I needed .' in MATLAB and julia for this particular reason.)

I'll give an example with multiple occurences of transpose (maybe I could avoid doing it this way?)

using LinearAlgebra

# f : Rⁿ → R
#     x  ↦ f(x) = xᵀ * x / 2
f(x) = 0.5 * transpose(x) * x

# Fréchet derivative of f
# Df : Rⁿ → L(Rⁿ, R)
#      x  ↦ Df(x) : Rⁿ → R (linear, so expressed via multiplication)
#                   h  ↦ Df(x)(h) = Df(x) * h
Df(x) = transpose(x) 

# Complex-step method version of Df
function CSDf(x) 
    out = zeros(eltype(x), 1, length(x))
        for i = 1:length(x)
        x2 = copy(x) .+ 0im
        h = x[i] * 1e-50
        x2[i] += im * h
        out[i] = imag(f(x2)) / h
    end
    return out
end

# 2nd Fréchet derivative
# D2f : Rⁿ → L(Rⁿ ⊗ Rⁿ, R)
#       x  ↦ D2f(x) : Rⁿ ⊗ Rⁿ → R (linear, so expressed via multiplication)
#                     h₁ ⊗ h₂ ↦ D2f(x)(h₁ ⊗ h₂) = h₁ᵀ * D2f(x) * h₂
D2f(x) = Matrix{eltype(x)}(I, length(x), length(x))

# Complex-step method version of D2f
function CSD2f(x)
    out = zeros(eltype(x), length(x), length(x))
    for i = 1:length(x)
        x2 = copy(x) .+ 0im
        h = x[i] * 1e-50
        x2[i] += im * h
        out[i, :] .= transpose(imag(Df(x2)) / h)
    end
    return out
end 

# Test on random vector x of size n
n = 5
x = rand(n)
Df(x) ≈ CSDf(x)
D2f(x) ≈ CSD2f(x)

# test that the 1st derivative is correct Fréchet derivative
xϵ = √eps(norm(x))
for i = 1:10
    h = xϵ * randn(n) # random small y
    println(norm(f(x + h) - f(x) - Df(x) * h) / norm(h)) # Fréchet check
end

# test that the 2nd derivative is correct 2nd Fréchet derivative
for i = 1:10
    h₁ = randn(n) # random h₁
    h₂ = xϵ * randn(n) # random small h₂
    println(norm(Df(x + h₂) * h₁ - Df(x) * h₁ - transpose(h₁) * D2f(x) * h₂) / norm(h₂)) # Fréchet check
end
# Because f is quadratic, we can even check that f is equal to its Taylor expansion
h = rand(n)
f(x + h) ≈ f(x) + Df(x) * h + 0.5 * transpose(h) * D2f(x) * h

The point being that f and Df must be defined using transpose and must not use the adjoint.

c42f commented 6 years ago

I don't think the complex step method is super relevant in julia. Isn't it a neat hack/workaround to get automatic differentiation in cases where a language supports efficient builtin complex numbers, but an equivalently efficient Dual number type can't be defined? That's not the case in julia, which has really nice automatic differentiation libraries.

briochemc commented 6 years ago

I agree about using Dual numbers instead of the complex-step method and that's a very good point you are making (I personally have already replaced all my complex-step-method evaluations with dual-number ones in julia). However, I do think that this is still a valid use case, for demonstration purposes, teaching tricks (see, e.g., Nick Higham talking about the complex-step method at Julia Con 2018), and portability (in other words, I worry that MATLAB's version of the code above using complex numbers would be cleaner).

mattcbro commented 6 years ago

Coming from the world of Engineers and possibly Physicists who use complex arrays more than real arrays, not having a transpose operator is a bit of a pain. (Complex phasor representation for a harmonic time dependency is ubiquitous in our field.) I personally would favor the numpy syntax of x.H and x.T, though my only consideration is conciseness .

The density of the transpose operator relative to Hermitian transpose is about 1 to 1 in my code. So the unconjugated transpose is equally important to me. A lot of the use of transpose is to create outer products and to size arrays correctly for interfacing to other code or for matrix multiplication.

I intend for now to simply provide a macro or one character function for the operation, however what is the proper equivalent to the old functionality, transpose() or permutedims()?

andyferris commented 6 years ago

transpose is intended for linear algebra and is recursive, and permutedims is for non-recursive arrangement of data of any type.

It’s interesting you say you use transpose as much as adjoint. I used to be the same, but mostly because I tended to make mistakes where my data was real so I tended to transpose but actually the adjoint was the correct operation (generalized to the complex case - adjoint was the right operation for my algorithm). There are (many) valid exceptions, of course.

Jutho commented 6 years ago

In everything related to electrodynamics, you often use space-like vectors and want to use vector operations in R^n (typically n=3), i.e. transpose in particular, even though your vectors are complex-valued because you've taken a Fourier transform. It seems like @mattcbro is talking about this kind of applications.

That being said, when reading up on syntax discussions, I am often contemplating that for me, personally, I could not imagine that a slightly more verbose syntax is the thing that slows down my programming speed or efficiency. Thinking about the algorithm itself and the most natural/efficient way of implementing it takes much more time.

stevengj commented 6 years ago

In everything related to electrodynamics, you often use space-like vectors and want to use vector operations in R^n (typically n=3), i.e. transpose in particular, even though your vectors are complex-valued because you've taken a Fourier transform.

Not necessarily. Often you want time-average quantities from the Fourier amplitudes, in which case you use the complex dot product, e.g. ½ℜ[𝐄*×𝐇] is the time-average Poynting flux from the complex Fourier components and ¼ε₀|𝐄|² is a time-average vacuum energy density. On the other hand, since the Maxwell operator is (typically) a complex-symmetric operator ("reciprocal"), you often use an unconjugated "inner product" for (infinite-dimensional) algebra on the fields 𝐄(𝐱) etc. over all space.

Jutho commented 6 years ago

That's true, I had the word often in the first sentence, but removed it apparently :-).

mattcbro commented 6 years ago

Well if you want to go there, Electromagnetic quantities are even more concisely written in a Clifford Algebraic formulation, often called Geometric algebra. Those algebras have multiple automorphisms and antiautomorphisms that play a critical role in the formulation of the theory, especially when considering scattering problems.

These algebras typically have a concise matrix representation and those morphisms are often easily computed via complex transpose, Hermitian transpose and conjugation.

Nevertheless as I stated earlier my primary use of transpose is often to arrange my arrays to interface with other arrays, other code, and to get matrix multiply to work against the correct dimension of a flattened array.

c42f commented 6 years ago

I personally would favor the numpy syntax of x.H and x.T

Easy to implement now in 1.0, and should be efficient:

function Base.getproperty(x::AbstractMatrix, name::Symbol)
    if name === :T
        return transpose(x) 
    #elseif name === :H # can also do this, though not sure why we'd want to overload with `'`
    #    return adjoint(x)
    else
        return getfield(x, name)
    end
end 

This is surprisingly easy and kind of neat. The downside seems to be that orthogonal uses of getproperty don't compose with each other. So anybody implementing getproperty on their specific matrix type will need to implement the generic behavior by hand.

c42f commented 6 years ago

orthogonal uses of getproperty don't compose

Hmm. I wonder if that implies x.T "should" have been lowered to getproperty(x, Val(:T)). I shudder to think what that would do to the poor compiler though.

andyferris commented 6 years ago

I’m sure everyone has their opinion - but to me it’s almost a feature that it’s hard to build a generic interface out of dot syntax. Don’t get me wrong, it’s a really great feature and wonderful for defining named tuple-like structs, and so-on.

(It’s also possible to add a Val dispatch layer to your types pretty easily).

mattcbro commented 6 years ago

@c42f 's code works like a charm. Unfortunately for me I'm trying to write code that works on versions 0.64 and up, which forces me to use either transpose or my own defined function T(A) = transpose(A). Perhaps a macro would have been a little cleaner and slightly more efficient.

c42f commented 6 years ago

To be clear, I'm not suggesting defining this particular getproperty is a good idea for user code. It's just likely to break things in the longer term ;-) Though perhaps one day we'll have a good enough feel for the consequences that we could have x.T defined in Base.

But in a general sense, I do wonder why this kind of property use for defining "getters" in generic interfaces is actually bad. For example, generic field getter functions currently have a whopping big namespace problem which is simply solved by judicious use of getproperty. It's far nicer to write x.A than to write MyModule.A(x), some longer ugly function name like get_my_A(x), or to export the extremely generic name A from a user module. The only problem as I see it, is the expected ability to override the meaning of .B for subtypes independently of .A being defined generically on a super type. Hence the half serious comment about Val.

StefanKarpinski commented 6 years ago

Funny idea:

julia> x'̄
ERROR: syntax: invalid character "̄"

The character looks kind of like a T but it's actually a ' with a bar over it. Not sure if serious...