JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

new syntax for transpose #21037

Closed StefanKarpinski closed 3 years ago

StefanKarpinski commented 7 years ago

Now that .op is generally the vectorized form of op, it's very confusing that .' means transpose rather than the vectorized form of ' (adjoint, aka ctranspose). This issue is for discussing alternative syntaxes for transpose and/or adjoint.

mbauman commented 7 years ago

Andreas tried Aᵀ (and maybe Aᴴ) in #19344, but it wasn't very well received. We could similarly pun on ^ with special exponent types T (and maybe H) such that A^T is transpose, but that's rather shady, too. Not sure there are many other good options that still kinda/sorta look like math notation.

StefanKarpinski commented 7 years ago

I kind of think that t(A) might be the best, but it's unfortunate to "steal" another one-letter name.

nalimilan commented 7 years ago

Moving my comment from the other issue (not that it solves anything, but...):

+1 for using something else than .'.

I couldn't find languages with a special syntax for transposition, except for APL which uses the not-so-obvious ⍉, and Python which uses *X (which would be confusing for Julia). Several languages use transpose(X); R uses t(X). That's not pretty, but it's not worse than .'. At least you're less tempted to use ' by confusing it with .': it would be clear that these are very different operations.

See Rosetta code. (BTW, the Julia example actually illustrates conjugate transpose...)

mauro3 commented 7 years ago

Could one of the other ticks be used? ` or "

ararslan commented 7 years ago

-100 to changing adjoint, since it's one of the awesome things that makes writing Julia code as clear as writing math, plus conjugate transpose is usually what you want anyway so it makes sense to have an abbreviated syntax for it.

As long as we have the nice syntax for conjugate transpose, a postfix operator for regular transpose seems mostly unnecessary, so just having it be a regular function call seems fine to me. transpose already works; couldn't we just use that? I find the t(x) R-ism unfortunate, as it's not clear from the name what it's actually supposed to do.

Using a different tick would be kind of weird, e.g. A` can look a lot like A' depending on the font, and A" looks too much like A''.

stevengj commented 7 years ago

If we make the change in #20978, then a postfix transpose actually becomes more useful than it is now. e.g. if you have two vectors x and y and you want to apply f pairwise on them, you can do e.g. f.(x, y.') ... with #20978, this will be applicable to arrays of arbitrary types.

Honestly, I think our best option is still to leave it as-is. None of the suggestions seem like a clear improvement to me. .' has the advantage of familiarity from Matlab. The . actually is somewhat congruent with dot-call syntax in examples like f.(x, y.'), and suggests (somewhat correctly) that the transpose "fuses" (it doesn't produce a temporary copy thanks to RowVector and future generalizations thereof).

In fact, we could even take it further, and make f.(x, g.(y).') a fusing operation. i.e. we change .' transpose to be non-recursive ala #20978 and we extend its semantics to include fusion with other nested dot calls. (If you want the non-fusing version, you would call transpose.)

StefanKarpinski commented 7 years ago

I like that plan a lot, @stevengj.

StefanKarpinski commented 7 years ago

One wrinkle: presumably the @. macro does not turn y' into y.' (since that would be wrong). It could, however, turn y' into some kind of fused adjoint operation.

stevengj commented 7 years ago

The main problem is finding a clean way to make f.(x, g.(y).') have fusing semantics. One possibility would be to transform it to f.(x, g.(y.')) and hence to broadcast(x,y -> f(x, g(y)), x, y.')?

Note that, for this to work properly, we might need to restore the fallback transpose(x) = x method, in which case we might as well let transpose remain recursive.
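The rewrite proposed above can be checked by hand. The following is a minimal sketch, assuming a Julia 1.x session where the postfix .' is spelled transpose(...); the functions f and g and the sample data are made up for illustration.

```julia
using LinearAlgebra  # provides the lazy `transpose` wrapper for arrays

# Illustrative functions, not taken from the thread.
f(a, b) = a * b
g(b) = b + 1

x = [1, 2, 3]
y = [10, 20]

# Unfused: g.(y) is materialized first, then transposed, then broadcast against x.
unfused = f.(x, transpose(g.(y)))

# Hand-fused: transpose y first and apply g inside a single broadcast kernel.
fused = broadcast((a, b) -> f(a, g(b)), x, transpose(y))

println(unfused == fused)  # both are the 3×2 matrix [11 21; 22 42; 33 63]
```

The two forms agree here because g is applied elementwise either way; the only difference is whether the intermediate g.(y) array is allocated.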

StefanKarpinski commented 7 years ago

I think deciding whether transpose should be recursive or not is orthogonal to whether we make it participate in dot syntax fusion. The choice of making it non-recursive is not motivated by that.

stevengj commented 7 years ago

@StefanKarpinski, if we restore a fallback transpose(x) = x, then most of the motivation for changing it to be non-recursive goes away.

jebej commented 7 years ago

What's the problem if the fallback is restored but we still have the transpose be non-recursive?

stevengj commented 7 years ago

@jebej, recursive transpose is more correct when it is used as a mathematical operation on linear operators. If I remember correctly, the main reason for making it non-recursive was so that we don't have to define the transpose(x) = x fallback, rather than throwing a MethodError.

But it would not be terrible to have the fallback but still be non-recursive.
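To make the recursive versus non-recursive distinction concrete, here is a small sketch on a matrix whose elements are themselves matrices. It assumes a Julia 1.x session, where transpose recurses into the elements and permutedims only swaps the outer dimensions; the sample values are arbitrary.

```julia
using LinearAlgebra

B = [1 2; 3 4]
A = reshape([B, 2B], 1, 2)      # 1×2 matrix whose elements are 2×2 matrices

recursive  = transpose(A)       # swaps the outer dimensions and transposes each block
structural = permutedims(A)     # swaps the outer dimensions only

println(size(recursive), " ", size(structural))   # both are 2×1
println(recursive[2, 1])        # the block 2B, transposed
println(structural[2, 1])       # the block 2B, unchanged
```

For a block matrix read as a linear operator, the recursive result is the mathematically meaningful transpose; for a plain container of data, the structural one is what is usually wanted.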

bkamins commented 7 years ago

Let me add two comments (I have looked through the earlier discussion and did not notice them - sorry if I have omitted something):

andreasnoack commented 7 years ago

What is your use case for transposing a vector of strings?

bkamins commented 7 years ago

Consider the following scenario for instance:

x = ["$(j+i)" for j in 1:3, i in 1:5]
y = ["$i" for i in 5:9]

and I want to append y after the last row of x. And the simplest way is to vcat a transpose of y.

Comes up in practice when incrementally logging text data to a Matrix{String} (I could use Vector{Vector{String}}), but often matrix is more useful (or then again there is a question how to convert Vector{Vector{String}} to Matrix{String} by vertically concatenating consecutive elements).
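A minimal sketch of this scenario, assuming a Julia 1.x session: there, permutedims turns a vector into a 1×n row without touching the elements, so it works for strings.

```julia
x = ["$(j+i)" for j in 1:3, i in 1:5]   # 3×5 Matrix{String}
y = ["$i" for i in 5:9]                 # 5-element Vector{String}

# permutedims(y) is a 1×5 Matrix{String}; vcat appends it as a new last row.
z = vcat(x, permutedims(y))

println(size(z))     # (4, 5)
println(z[end, :])   # the appended row holds the elements of y
```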

mbauman commented 7 years ago

Another use-case: transposing is the simplest way to make two vectors orthogonal to each other in order to broadcast a function over the cartesian product (f.(v, w.')).
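As a rough illustration of that use case, assuming a Julia 1.x session where .' is written transpose(...): broadcasting a column against a row evaluates the function on every pair. The function f and the data are placeholders.

```julia
using LinearAlgebra

f(a, b) = a + b          # placeholder function
v = [1, 2, 3]
w = [10, 20]

# v is a column of length 3 and transpose(w) is a 1×2 row, so the result is 3×2.
table = f.(v, transpose(w))
println(table)           # [11 21; 12 22; 13 23]

# For non-numeric elements, permutedims gives the same row shape.
labels = string.(["a", "b"], permutedims(["x", "y", "z"]))
println(labels)          # ["ax" "ay" "az"; "bx" "by" "bz"]
```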

Sacha0 commented 6 years ago

Data point: Yesterday I encountered a party confused by the postfix "broadcast-adjoint" operator and why it behaves like transpose. Best!

ttparker commented 6 years ago

FWIW, I strongly feel that we should get rid of the .' syntax. As someone more familiar with Julia than with Matlab, I expected it to mean vectorized adjoint and I got really tripped up when it didn't. Julia isn't Matlab and shouldn't be bound by Matlab's conventions - if in Julia, a dot means vectorization of the adjacent function, then this should be consistent across the language and shouldn't randomly have the one horrible exception that .' is formally unrelated to '.

I think it's fine to just have transpose without any special "tick" notation, since the vast majority of the time, it's called on a matrix of real numbers, so ' would be equivalent if you really want to save typing. If we want to make a fusing version of transpose, then I really don't think that .' is the right syntax.
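The claim about real matrices is easy to verify. A minimal sketch, assuming a Julia 1.x session and arbitrary sample matrices:

```julia
using LinearAlgebra

R = [1.0 2.0; 3.0 4.0]              # real entries
C = [1+2im 3-1im; 0+1im 2+0im]      # complex entries

println(R' == transpose(R))         # true: conjugation is a no-op on reals
println(C' == transpose(C))         # false: ' also conjugates
println(C' == conj(transpose(C)))   # true
```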

JeffBezanson commented 6 years ago

That's a good point. Arguably only the adjoint needs super-compact syntax.

StefanKarpinski commented 6 years ago

Let's just call this transpose and deprecate .'. In the future, we can consider if we want .' as pointwise adjoint or if we just want to leave it perma-deprecated to avoid trapping Matlab users.

stevengj commented 6 years ago

Note that I just grepped the registered packages and found 600+ usages of .', so it's not terribly rare. And with dot calls / broadcast (which only in 0.6 began to fully handle non-numeric data), the desire to lazily transpose non-numeric arrays (where adjoint makes less sense) will probably become much more common, so the argument for a compact syntax is somewhat strengthened.

ttparker commented 6 years ago

Then we'd better deprecate .' as soon as possible, before more code gets trapped in a bad usage pattern.

stevengj commented 6 years ago

Why is it bad?

StefanKarpinski commented 6 years ago

The problem is that .' now doesn't mean what it seems to mean as dotted operator.

ttparker commented 6 years ago

As I said above, because it violates the general pattern that . means vectorization, and looks like it means vectorized adjoint (especially to someone who's not familiar with Matlab).

mbauman commented 6 years ago

I think @stevengj makes a good point — this is tied to the desire for a simple non-recursive transpose.

I know it was unpopular, but I'm starting to favor Andreas' #19344 for ᵀ. At this point, I'd favor deprecating the use of all superscripts as identifiers, and interpret any trailing superscripts as postfix operators. This also gives a path towards resolving some of the kludginess around literal_pow using superscript numbers. Yes, it'd be sad to lose χ² and such as variable names, but I think the benefits would outweigh the downsides.

fredrikekre commented 6 years ago

At this point, I'd favor deprecating the use of all superscripts as identifiers, and interpret any trailing superscripts as postfix operators.

RIP my code (screenshot from 2017-11-09 22-08-25)

JeffBezanson commented 6 years ago

At this point, I'd favor deprecating the use of all superscripts as identifiers

I really don't think that would be necessary, when we just want T and maybe a couple other things in the future.

stevengj commented 6 years ago

A foolish consistency…

Yes, it's slightly inconsistent to use .' for transpose, but all of the alternatives proposed so far seem to be worse. It's not the worst thing in the world to say ".' is transpose, an exception to the usual rule about dot operators." You learn this and move on.

ararslan commented 6 years ago

One thing to note that may help with any potential confusion over .' not being a dot broadcast is that it's a postfix operator, whereas prefix broadcasting is op. and infix is .op. So we can say that . doesn't mean broadcast when it's postfix. The other use of postfix . is field lookup, and getfield(x, ') doesn't make sense, so it's distinct from other meanings.

(That said, I favor transpose(x) over keeping .'.)

ttparker commented 6 years ago

@stevengj I would bet that many (perhaps most) of the 600+ uses of .' in the registered packages that you mentioned above could be replaced by ' at no cost to readability, and the code would continue to work.

andyferris commented 6 years ago

Possibly not popular, but there could still be postfix " and `?

uses of .' in the registered packages that you mentioned above could be replaced by ' at no cost to readability, and the code would continue to work.

Note that once #23424 lands, we will be able to use transpose on arrays of strings and so-on, but not adjoint. Best practice for linear algebra use of x.' will most likely become something like conj(x') (hopefully this is lazy, i.e. free). While I love using .' for its compactness, perhaps getting rid of it will force linear algebra users to use the correct thing and arrays-of-data users to use spelled-out transpose.

c42f commented 6 years ago

there could still be postfix " and `?

New syntax for transpose() seems rather premature. IMHO it would be better to just deprecate .' to be replaced as you suggest with conj(x') and transpose as required.

I have a feeling that .' is so useful in matlab mainly because of the matlab insistence that "everything is a matrix" along with the lack of coherent slicing rules such that you often need to insert random transposes in various places to get things to work.

StefanKarpinski commented 6 years ago

To summarize the arguments here:

  1. .' is now the lone standout as a dotted operator that doesn't mean "apply undotted operator elementwise"; new users not coming from Matlab find this to be a surprising trap.

  2. .' is now effectively ambiguous: did you mean transpose or did you mean conj(x')? In principle, every legacy usage of .' should be vetted to determine whether it's permuting the indices of a 2-dimensional array or whether it's doing an "unconjugated adjoint".

The first issue is problematic but not fatal by itself; the second issue is the really bad one – this is no longer a single coherent operation, but rather it will be split into two separate meanings.

StefanKarpinski commented 6 years ago

I just noticed that if we ever changed .' to mean "elementwise adjoint", then conj(x') would be roughly equivalent to x'.' and conj(x)' would roughly be x.'' which is sooo close to x.' 😬.
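The identities in this observation can be spelled out with today's explicit broadcast. This is a sketch assuming a Julia 1.x session, with adjoint.(x) standing in for the hypothetical "elementwise adjoint" meaning of .'; the matrix is arbitrary.

```julia
using LinearAlgebra

x = [1+2im 3-1im; 0+1im 2+0im]

# Elementwise adjoint of complex scalars is just conjugation.
println(adjoint.(x) == conj(x))          # true

# The hypothetical x'.' : adjoint first, then elementwise adjoint.
println(adjoint.(x') == conj(x'))        # true

# The hypothetical x.'' : elementwise adjoint first, then adjoint.
println(adjoint.(x)' == conj(x)')        # true

# Both orders undo the conjugation and leave the plain transpose.
println(conj(x') == transpose(x))        # true
```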

ChrisRackauckas commented 6 years ago

Possibly not popular, but there could still be postfix " and `?

Copy pasting code into Slack and seeing that destroy syntax highlighting would be...

ChrisRackauckas commented 6 years ago

Being able to transpose anything is nice because it makes it easy to "cross product" via the dispatch mechanism, and other short concise use cases like that. The issue with not having an easy fallback for this kind of stuff is that invariably the hack that we'll see is to just define transpose(x) = x fallbacks (or on Base types, so type-piracy in packages) to make this kind of thing work easily. That makes me think: why isn't Complex the odd one? Adjoint of most numbers is itself, so adjoint of complex is the one to specialize on: can't that be extended beyond numbers?

I see two very related things here:

1) x' doesn't work for non-number types, so we want a way to easily do this for other data
2) transpose(x) is not as simple as x.'. This is mostly for the cases of (1), since the use cases for transposing complex matrices are much more rare.

But instead of going down (2), why not try and do a reasonable fix for (1)?

Maybe a reasonable fix is just a macro that makes ' mean transpose instead of adjoint?

StefanKarpinski commented 6 years ago

But instead of going down (2), why not try and do a reasonable fix for (1)?

We've already been down that path and several adjacent to it. There's been a large amount of resulting discussion which perhaps someone else can distill, but in summary, it doesn't work out well. Fundamentally, the mathematical adjoint operation does not make sense on things that are not numbers. Using ' on non-numbers just because you like the terse syntax is bad – it's the worst kind of operator punning and it shouldn't be surprising that bad things ensue from this kind of abuse of meaning. The adjoint function should only be defined on things it makes sense to take the adjoint of and ' should only be used to mean that.

Remember that .' as currently used is fundamentally two different operations: array transposition and non-conjugate adjoint. The recursive transpose problem highlights the fact that these are different operations and that we therefore need different ways to express them. The mathy folks seem adamant that the non-conjugate adjoint operation is (a) important, and (b) different from simple swapping of dimensions. In particular, to be correct, non-conjugate adjoint should be recursive. On the other hand, swapping dimensions of a generic array should clearly not be recursive. So these operations need to be written differently, and existing usages of .' need to be disambiguated as having one meaning or the other. Deprecating .' is a way to force this.

Finally, while I feel strongly that permutedims(x, (2, 1)) is definitely too inconvenient for swapping the dimensions of a 2d array, I find the argument that transpose(x) is too inconvenient unconvincing. Is this operation so common that having a simple, clear function name for it is too much? Really? Is swapping the dimensions of an array that much more common or important than all the other things in the language that we use function names and function call syntax for? Householder notation does make adjoint quite special since we want to write things like v'v, v*v' and v'A*v. That's why adjoint gets really nice syntax. But swapping the dimensions of an array? It does not warrant an operator in my opinion.
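For reference, the spellings compared in this comment look as follows. This is a small sketch assuming a Julia 1.x session; the vector and matrices are arbitrary sample data.

```julia
using LinearAlgebra

v = [1.0, 2.0]
A = [2.0 0.0; 0.0 3.0]

# Householder-style expressions, where the postfix ' reads like the math.
println(v'v)      # inner product: 5.0
println(v*v')     # outer product: [1.0 2.0; 2.0 4.0]
println(v'A*v)    # quadratic form: 14.0

# Swapping the dimensions of a 2d array, spelled three ways.
M = [1 2 3; 4 5 6]
println(permutedims(M, (2, 1)) == permutedims(M) == transpose(M))   # true
```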

rfourquet commented 6 years ago

Not a strong argument, but I often use the ' operator for printing arrays more compactly (when used as simple containers), for example when I want to see the content of a few vectors at the same time on my screen (and invariably get frustrated when it fails because elements can't be transposed). So a short syntax for the REPL is definitely handy. (Also, this makes it easier for people used to row-major arrays to have a simple way to "switch the order", in particular when porting algorithms to julia using 2d arrays; but definitely not a strong argument either). Just to say that it's a nice terse syntax which is not useful only to linear algebraists.

Ismael-VC commented 6 years ago

I had commented some syntax ideas at https://github.com/JuliaLang/julia/pull/19344#issuecomment-261621763, basically it was:

julia> const ᵀ, ᴴ = transpose, ctranspose;

julia> for op in (ᵀ, ᴴ)
           @eval Base.:*(x::AbstractArray{T}, f::typeof($op)) where {T<:Number} = f(x)
       end

julia> A = rand(2, 2)
2×2 Array{Float64,2}:
 0.919332  0.651938
 0.387085  0.16784

julia>  Aᵀ = (A)ᵀ    # variable definition and function application are both available!
2×2 Array{Float64,2}:
 0.919332  0.387085
 0.651938  0.16784

julia> Aᴴ = (A)ᴴ
2×2 Array{Float64,2}:
 0.919332  0.387085
 0.651938  0.16784

But without the hack of course, just the idea that there can be "postfix function application" of sorts and that it demands parenthesis (x)f, dotted versions could be like this (x).f (xf would be an identifier, even with f being a superscript symbol).
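The same juxtaposition trick can be sketched without adding methods for Base functions. The following assumes a Julia 1.x session and relies on a parenthesized expression followed by an identifier parsing as multiplication; the Postfix type and the names postT and postH are hypothetical and not part of Base or of the proposal.

```julia
using LinearAlgebra

# Hypothetical wrapper marking a function as "postfix-applicable".
struct Postfix{F}
    f::F
end

# `(A)op` parses as `(A) * op`, so this method applies the wrapped function to A.
Base.:*(x::AbstractArray, op::Postfix) = op.f(x)

const postT = Postfix(transpose)   # hypothetical name
const postH = Postfix(adjoint)     # hypothetical name

A = [1 2; 3 4]
At = (A)postT
Ah = (A)postH

println(At)   # [1 3; 2 4]
println(Ah)   # same values here, since A is real
```

Because the wrapper is a type owned by the sketch, the new `*` method does not change the behavior of existing code, unlike overloading `*` for typeof(transpose) directly.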

This example hack used to work on 0.6 but now:

julia> Aᵀ = (A)ᵀ               
ERROR: syntax: invalid operator

julia> Aᵀ = (A)transpose       
2×2 Array{Float64,2}:          
 0.995848  0.549117            
 0.69401   0.908227            

julia> Aᴴ = (A)ᴴ               
ERROR: syntax: invalid operator

julia> Aᴴ = (A)ctranspose      # or adjoint or whatever
2×2 Array{Float64,2}:          
 0.995848  0.549117            
 0.69401   0.908227            

Which is sad, I originally wanted to do that for powers:

julia> square(n) = n^2; cube(n) = n^3;

julia> Base.:*(n, f::typeof(square)) = f(n)

julia> Base.:*(n, f::typeof(cube)) = f(n)

julia> const ² = square    # why?
syntax: invalid character "²"

julia> const ³ = cube    # why?
syntax: invalid character "³"

Which I naively thought would enable syntax like: n² = (n)² and n³ = (n)³ But any kinda numeric identifier is banned from being at first position, however (A)⁻¹ also worked, where ⁻¹ was const ⁻¹ = inv.

I have implemented a similar hack for InfixFunctions.jl.

As a user I could just do a PostfixFunctions.jl package, and be happy with whatever you find the best here. But currently these syntax restrictions:

Seem a little bit too much to me IMHO, I'd like to at least be able to define identifiers that can start with numeric superscripts, or more generally, only disallow actual numeric characters 0-9 with number semantics, at the start of an identifier, that would be awesome. 😄

Cheers!

JeffBezanson commented 6 years ago

See #10762 for some discussion of other number characters as identifiers.

The other issue is related to #22089, operator suffixes. +ᵀ is now a valid operator, which (probably accidentally) disallowed identifiers consisting only of combining characters in contexts where an operator might be expected. That seems like a bug to me. It's also a bit odd that ᵀ is a valid identifier but -ᵀ does not do -(ᵀ). However that's not the end of the world, and IMO fixing it would not be worth losing other possible uses of ᵀ.

StefanKarpinski commented 6 years ago

Note that using .' as a postfix transpose operator is not even on the table here (despite what the subject of the issue says), the consideration is actually whether we should keep .' as a postfix operator for non-conjugate adjoint, which would be recursive. This happens to often be the same as transposition, but is not generally the same operation. If linear algebra folks are willing to let .' mean generic array transpose, that's a different story, but my impression is that's not acceptable.

StefanKarpinski commented 6 years ago

@Ismael-VC, I can see allowing (x)ᵀ as a postfix function syntax for superscripts – since what else would it mean? I think where your proposal starts to rub people the wrong way is allowing any identifier to be applied as a function in the postfix syntax. I would limit it to superscripts.

stevengj commented 6 years ago

@StefanKarpinski, I thought that the consensus was precisely to let .' mean non-recursive, non-conjugate array transposition (if we have this operator at all), while ' is the recursive, conjugate adjoint operation.

I really, really hate the idea of using ᵀ for a postfix transpose operator. It is way too useful to have as a superscript in variable names, like aᵀa or LᵀDL = ltdlfact(A). (Besides the fact that using only ᵀ for an operator while other superscripts are valid in identifiers would be weird.)

StefanKarpinski commented 6 years ago

That was not my understanding at all – I thought that linalg people were in favor of keeping a.' as is, i.e. meaning conj(a)'. Keeping .' but changing its meaning to array transpose is quite different – I'm not sure how I feel about that. I agree that having only ᵀ as a postfix operator would be annoying and inconsistent. I rather like @Ismael-VC's (a)ᵀ proposal, however, which wouldn't prevent using aᵀ as a name.

mbauman commented 6 years ago

My memory of those discussions mirrors Steven's. The recursive, non-conjugated transpose is rare and generally pretty strange. Decent summary here: https://github.com/JuliaLang/julia/issues/20978#issuecomment-316141984.

I think we all agree that postfix ' is adjoint and should stay. I think we all agree that postfix .' is suboptimal syntax. I think most agree that non-recursive (structural) transpose is more useful than a recursive transpose.

StefanKarpinski commented 6 years ago

Ok, so the points everyone seems to agree on:

  1. Use a' for adjoint(a)
  2. Use conj(a)' or conj(a') for the (non-)conjugate adjoint.

So the only point of contention is how to write the array transpose:

Is this assessment correct?

ttparker commented 6 years ago

Yes, I think so (where the "array transpose" is non-recursive).

Also, as I understand it, everyone agrees that transpose(a) should definitely be valid syntax (and non-recursive), and the only points of disagreement are whether .' and/or (a)ᵀ should be alternate (completely equivalent) valid syntaxes.

Sacha0 commented 6 years ago

Approach (1) from https://github.com/JuliaLang/julia/issues/20978#issuecomment-315902532, which received a good bit of support (e.g. https://github.com/JuliaLang/julia/issues/20978#issuecomment-316080448), remains a possibility. I have a branch realizing that approach (introducing flip(A)) which I can post.

For what it's worth, I support deprecating .'. The confusion and ambiguity in this thread is a strong argument for doing so in itself. Best!