LinearAlgebra and TensorCore future design: Be more greedy #35763

Open EricForgy opened 4 years ago

EricForgy commented 4 years ago

TL;DR

I'm proposing a change to the internal structure of LinearAlgebra that is more suitable for a to-be-developed TensorAlgebra library.

What it does:

What it does not do:

For easily 95% of use cases, I don't think there will be any impact at all. Most of the machinery can be hidden behind the scenes, but it paves the way to a clean tensor implementation that makes the effort worthwhile.

Summary

This is a follow-up from a discussion in PR #35150.

In particular (from Jeff):

It would be great to have an issue (is there one already?) listing all the dodgy stuff in LinearAlgebra that we want to deprecate or revisit.

Any concerns I have don't rise to the level of "dodgy" by any means, since I think we already do things better than any alternative I have seen, but I do think we can still do some things better.

If I were to try to summarize Jiahao's already nice summary, I'd say the conclusion was that scalar types, vector types and matrix types are not sufficient for linear algebra. You need a fourth fundamental type, i.e. covectors, which at the time were called RowVector (though RowVector was later removed from LinearAlgebra for some reason).

The fact that "LinearAlgebra done right" requires dual vectors is no surprise since a linear map (matrix) is already a once-contravariant (vector) and once-covariant (covector) tensor, i.e. a machine that takes vectors and spits out vectors in a linear manner.

Although we lost RowVector (which I would call Covector if we were to introduce it again), given a vector u, we still have Transpose(u) and Adjoint(u). These are both, in fact, covectors so this is already better than any alternatives I have seen. However, not all covectors arise from taking the transpose or adjoint of a vector. For instance, the simplest nontrivial example of a pure covector I can think of is the differential:

df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz.

[Aside: Given the importance of machine learning and automatic differentiation, I think getting this stuff right has some real practical benefits, so I hope what I'm saying isn't seen as purely theoretical.]

From a differential topological point of view, the differential df is more fundamental than grad(f), since grad(f) is defined as

grad(f) := (df)'

which depends on the metric, whereas df does not. In fact, when you see things like

[f, ∂f/∂x, ∂f/∂y, ∂f/∂z]

in automatic differentiation, one could argue (correctly) that you are actually dealing with covectors. Calling these "gradients" is a misnomer that I'm not sure is healthy to propagate. You don't pull back vectors; you pull back covectors.
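
To make this concrete, here is a small numerical illustration (the function f, the point p and the metric g below are made up for the example): the list of partials is the component list of the covector df, and grad(f) only appears once a metric is chosen.

```julia
# The partials of f form the components of the covector df; grad(f) = g⁻¹ df
# requires a metric g ("raising the index").
f(x, y, z) = x*y + z^2
p = (1.0, 2.0, 3.0)

df = [p[2], p[1], 2 * p[3]]        # [∂f/∂x, ∂f/∂y, ∂f/∂z] at p = [2.0, 1.0, 6.0]

g = [1.0 0 0; 0 1.0 0; 0 0 4.0]    # a non-Euclidean metric
gradf = g \ df                     # grad(f) = g⁻¹ df = [2.0, 1.0, 1.5], ≠ df

# With the Euclidean metric the two have identical components, which is why
# the vector/covector distinction is so easy to miss.
```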

In the spirit of greediness, I want Julia to be even more greedy about getting this right, in a way that no one else I am aware of (except maybe Haskell, which I am not familiar with) gets right. The difference between vectors and covectors matters. You can probably get away without worrying about it for 90% of use cases, but I don't think we are shooting for 90% excellence. We want to be 100% excellent, and we can be.

LinearAlgebra

As I see it now, LinearAlgebra has 3 fundamental types:

  • Scalars
  • Vectors
  • Matrices

with corresponding derived types:

  • Transpose
  • Adjoint

Given a vector u, Transpose(u) and Adjoint(u) are covectors, but rightly have their own type distinct from Covector because of the way the data is laid out and how they interact with other operations (via dispatch).

To do LinearAlgebra right, I think we need (at least) 3 fundamental types:

  • Scalars
  • Vectors
  • Covectors

with corresponding derived types:

  • Matrices
  • Transpose
  • Adjoint

[Note: If we have vectors and covectors as fundamental types, a matrix is actually a derived type.]

If we were super greedy, we'd also include:

  • PseudoVectors
  • PseudoCovectors

For example, if u and v are vectors, the vector cross product u × v is actually a pseudovector.

I have put some thought into how to do this, but it is not easy to get right (otherwise everyone would have already done it). For example, one of my earlier ideas was to extend Array to negative dimensions so that:

const Covector{T} = Array{T,-1}

but that seems unsatisfactory and not great for students just learning linear algebra.

Then, there is a beautiful package: TensorKit.jl. The documentation goes into some awesome background material. It is probably my own failing, but some things there are a little hard for me to understand; what I do understand looks great and forms a nice basis for thinking through this stuff.

I think that getting LinearAlgebra right requires getting tensors right. While we sort all this out, Tim has created TensorCore.jl.

TensorCore

After the discussion in #35150, a merged PR that introduced a tensor product to LinearAlgebra was reverted and the functionality moved to TensorCore.jl. That seems like a good place to experiment and sort this out.

Both vectors and covectors are special kinds of tensors, so it seems sensible to try to work this out for tensors and then make Vector and Covector special instances of a new Tensor type rather than Array.

I sketched some initial (only lightly baked) ideas in a comment on the PR. The idea is that given vector spaces U, V, W and their respective dual spaces U*, V*, W*, we can define a vector u ϵ U as an instance of Tensor{T,U}, i.e.

u isa Tensor{T,U} where {T <: Number, U <: VectorSpace}

Similarly, we can define a covector α ϵ U^* as an instance of Tensor{T,U*}.

Note: U* doesn't actually parse here, but I experimented with something like

struct U{T} end

so that we can use types U and U{*} for dispatch. When written out, it would look like Tensor{T,U{*}}, but I'll continue to write U* for now.

In this way, a matrix A would be an instance of type Tensor{T,V,U*}.

The tensor product of two vectors of type Tensor{T,U} and Tensor{T,V} is a tensor of type Tensor{T,U,V}, i.e. the list of vector space type parameters is concatenated.
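
As a rough sketch of how that parameterization could look (the names Tensor, VectorSpace, Dual and this ⊗ method are placeholders for illustration, not an existing API):

```julia
# A tensor carries its scalar type and an ordered list of (dual) vector-space
# tags as type parameters, while the components live in a plain Array.
abstract type VectorSpace end
struct U <: VectorSpace end
struct V <: VectorSpace end
struct Dual{S<:VectorSpace} <: VectorSpace end   # stand-in for U*, V*, ...

struct Tensor{T<:Number,Spaces<:Tuple,N}
    data::Array{T,N}
end
Tensor(data::Array{T,N}, spaces::Type{<:Tuple}) where {T,N} =
    Tensor{T,spaces,N}(data)

# The tensor product concatenates the space lists and takes the outer
# product of the component arrays.
⊗(a::Tensor{T,SA}, b::Tensor{T,SB}) where {T,SA,SB} =
    Tensor([x*y for x in a.data, y in b.data],
           Tuple{fieldtypes(SA)..., fieldtypes(SB)...})

u = Tensor(rand(3), Tuple{U})          # a vector in U
β = Tensor(rand(4), Tuple{Dual{V}})    # a covector in V*
u ⊗ β                                  # isa Tensor{Float64,Tuple{U,Dual{V}},2}
```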

Given two vectors u ϵ U and v ϵ V together with covectors α ϵ U* and β ϵ V*, the tensor product results in four distinct geometric objects:

  • u ⊗ v, of type Tensor{T,U,V}
  • u ⊗ β, of type Tensor{T,U,V*}
  • α ⊗ v, of type Tensor{T,U*,V}
  • α ⊗ β, of type Tensor{T,U*,V*}

with additional obvious permutations. This extends to higher order, e.g. u ⊗ β ⊗ v is of type Tensor{T,U,V*,V}.

Multilinear Machines

Tensors can be thought of as multilinear machines, taking vectors and covectors as arguments and producing a number that is linear in each argument, e.g. if A is an instance of Tensor{T,U,V*,U*,V} with u, v, α and β as above, then

A(α,v,u,β)

is an instance of T.

Partial Evaluation

With the same A as above, we can consider partial evaluation:

A(-,v,u,β).

This is an instance of type Tensor{T,U}, i.e. a vector.

Similarly,

A(α,-,u,β)

is an instance of type Tensor{T,V*}, i.e. a covector.

Hence, a vector can be thought of as a machine that takes a covector and produces a number in a linear manner. Similarly, a covector can be thought of as a machine that takes a vector and produces a number in a linear manner.

Now, we can revisit matrices as linear maps by considering

A(-,-,u,β)

which is an instance of Tensor{T,U,V*}, which we've said is a matrix. If we feed A one more vector like

A(-,v,u,β),

we get a machine that will take one more covector and produce a number, but we already said this type of machine is a vector, so A(-,-,u,β) is a linear map V -> U.
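
A minimal sketch of the calling and partial-evaluation mechanics, ignoring the vector-space bookkeeping and working directly with component arrays (SimpleTensor is a made-up name, and `:` stands in for the `-` placeholder above, which is not valid Julia syntax):

```julia
# Filled slots are contracted against the supplied component vectors; open
# slots (`:`) are left alone, so partial evaluation returns a lower-rank tensor.
struct SimpleTensor{T,N}
    data::Array{T,N}
end

function (A::SimpleTensor{T,N})(args...) where {T,N}
    length(args) == N || error("expected $N arguments")
    out = A.data
    for d in N:-1:1                    # contract from the last slot backwards,
        args[d] isa Colon && continue  # so earlier dimension numbers stay valid
        shape = ntuple(i -> i == d ? length(args[d]) : 1, ndims(out))
        out = dropdims(sum(out .* reshape(args[d], shape); dims=d); dims=d)
    end
    return ndims(out) == 0 ? out[] : SimpleTensor(out)
end

# A rank-2 example: a bilinear machine acting on two component vectors.
A = SimpleTensor([1.0 2.0; 3.0 4.0])
v = [1.0, 0.0]; α = [0.0, 1.0]

A(α, v)   # full evaluation → a number (3.0 here)
A(:, v)   # partial evaluation → a rank-1 SimpleTensor, i.e. "a vector"
A(α, :)   # partial evaluation → a rank-1 SimpleTensor, i.e. "a covector"
```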

Tensor Operations

I think there is one area we did not get right with LinearAlgebra and I suspect it is the source of issues elsewhere. Given vectors u ϵ U and v ϵ V, LinearAlgebra currently defines

u*v'

to be a matrix. This is unfortunate. The correct statement is that

u ⊗ v'

is a matrix. So we have effectively made * = ⊗. This is unfortunate because * and ⊗ are two distinct tensor operators with different meanings, so identifying them is problematic.
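
To see the identification concretely with today's packages (assuming TensorCore.jl is available; for complex vectors the two differ by a conjugation), the component arrays produced by * and ⊗ coincide:

```julia
using TensorCore   # provides ⊗ (tensor)

u = [1.0, 2.0]
v = [3.0, 4.0, 5.0]

u * v'            # 2×3 Matrix with entries u[i]*v[j]
u ⊗ v             # TensorCore's tensor product: also 2×3, entries u[i]*v[j]

u * v' == u ⊗ v   # true for real vectors: * and ⊗ have been identified
```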

Since the proposed tensor type Tensor consists of a scalar type T and an ordered list of vector spaces and dual vector spaces, I'd probably introduce three methods:

  • firstspace(A), returning the first space in the list
  • lastspace(A), returning the last space in the list
  • aredual(S1,S2), returning true if the two spaces are dual to one another

Then, we can define A1*A2 for tensors A1 and A2 only if

aredual(lastspace(A1), firstspace(A2))

i.e. multiplication of tensors is only defined if the last space of the first tensor is dual to the first space of the second tensor. This can be thought of as path concatenation. Two paths can only be concatenated if the end of the first path coincides with the beginning of the second path.
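
A rough sketch of what those three methods could look like, with the ordered list of spaces held in a Tuple type as in the sketch above (hypothetical API; here applied to the space lists directly rather than to the tensors themselves):

```julia
abstract type VectorSpace end
struct U <: VectorSpace end
struct V <: VectorSpace end
struct Dual{S<:VectorSpace} <: VectorSpace end

firstspace(::Type{S}) where {S<:Tuple} = fieldtypes(S)[1]
lastspace(::Type{S})  where {S<:Tuple} = fieldtypes(S)[end]

aredual(::Type{S}, ::Type{Dual{S}}) where {S<:VectorSpace} = true
aredual(::Type{Dual{S}}, ::Type{S}) where {S<:VectorSpace} = true
aredual(::Type, ::Type) = false

# A Tensor{T,U,V*} ("matrix") may be multiplied with a Tensor{T,V} ("vector"),
# because the last space of the first is dual to the first space of the second:
aredual(lastspace(Tuple{U, Dual{V}}), firstspace(Tuple{V}))   # true
aredual(lastspace(Tuple{U, Dual{V}}), firstspace(Tuple{U}))   # false
```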

If that is the case, then we simply contract the last tensor index of the first tensor with the first tensor index of the second tensor.

For instance, given a matrix A of type Tensor{T,U,V*} and a vector u of type Tensor{T,V}, we have

*:  Tensor{T,U,V*} x Tensor{T,V} -> Tensor{T,U}.

This is clearly distinct from:

⊗: Tensor{T,U,V*} x Tensor{T,V} -> Tensor{T,U,V*,V}

As a more general example, consider

*: Tensor{T,U,V*,W} x Tensor{T,W*,V,U} -> Tensor{T,U,V*,V,U}

and

⊗: Tensor{T,U,V*,W} x Tensor{T,W*,V,U} -> Tensor{T,U,V*,W,W*,V,U}

Conclusion

It may seem like overkill to bring in such a discussion of tensor algebra just to get LinearAlgebra right, but I am greedy and I would like LinearAlgebra to be consistent with a more general TensorAlgebra (or TensorCore). I think it is worth trying to get tensors right and then revising LinearAlgebra accordingly for consistency.

I also think the scope of LinearAlgebra should be confined to vectors, covectors, matrices (as linear maps) and possibly other rank-2 tensors since

Operations that keep you in scope, e.g. dot, are fine, but things like kron that take you out of scope should probably be in another package like TensorCore (or some new TensorAlgebra).

I see this as a long-term project, maybe for Julia v2.0 or v3.0. It would be awesome if something like this could make it into v2.0, but it is a lot of work. I can try to make a POC package and see if it makes sense to others. In any case, after the discussion in #35150, I felt like writing a comprehensive summary might be of interest.

mcabbott commented 4 years ago

You might be interested in ITensors.jl, whose tensor objects are explicitly labelled by the spaces (as you suggest), although perhaps without distinguishing dual spaces. While you write that * should contract only neighbouring indices, I believe they contract all matching indices, regardless of order, more like this:

*: Tensor{T,U,W,V*} x Tensor{T,W*,U,V} -> Tensor{T,U,U}

with a notion of primes to label indices in the same space which ought not to be contracted.

About present behaviour, you write:

LinearAlgebra currently defines u * v' to be a matrix. This is unfortunate. The correct expression should be u ⊗ v' is a matrix. So we have effectively made * = ⊗.

I'm not sure it's quite as dire as you make out. There is an implicit second dimension size(u,2)==1, and an explicit first one size(v',1)==1, which are being contracted here. This notation isn't favoured by people who wish to keep their tensors straight, but e.g. every IEEE journal seems to use this (although not u * v'). It does pack a lot of common use cases into two ASCII symbols, * and '.
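
To spell out those singleton dimensions with a quick check:

```julia
u = [1, 2]; v = [3, 4]

size(u, 2)   # 1 – the implicit trailing dimension of the column vector
size(v', 1)  # 1 – the explicit first dimension of the row (adjoint)

u * v'       # contracting those singleton dimensions gives the 2×2 matrix
             # [3 4; 6 8], i.e. the outer product
```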

EricForgy commented 4 years ago

Thanks for your comment mcabbott 😊

While you write that * should contract only neighbouring indices, I believe they contract all matching, regardless of order

This would be a good thing to discuss. For any application I have ever seen, it would not make physical sense to have * contract all possible indices like that. I'm not even sure such an operation would be associative (I'm pretty sure it wouldn't be). However * is defined, it should be associative.

In any case, the point is that * and ⊗ are two different operations, and identifying them is problematic (not dire by any means; at most suboptimal, i.e. not greedy enough).

every IEEE journal seems to use this

Not every IEEE journal takes transposes seriously 😊 When Julia decided to do this right, we made a conscious decision to NOT think of vectors as single-column matrices. A single-column matrix still provides a representation of a vector, though. If a vector were a column matrix and a covector were a row matrix, then the notation u*v' would be fine. However, since we chose not to do that, it is not fine, and allowing it is a kind of step backward (in hindsight, because I gave a thumbs up for this in #4774, which I now see was a mistake).

The correct operation is u ⊗ v', and by writing it as u*v' we are identifying * = ⊗. I didn't see that as problematic back then, but I do now, considering what the correct tensor algebra should be.

EricForgy commented 4 years ago

Note on the associativity of a * that contracts all possible indices:

[(u ⊗ v') * u'] * (w ⊗ u) = ||u||^2 v' ⊗ w ⊗ u
(u ⊗ v') * [u' * (w ⊗ u)] = ||u||^2 u ⊗ v' ⊗ w

Therefore, such a * would not be associative.

mcabbott commented 4 years ago

Edit: ITensor does write this as *, but everything there is independent of the order of indices, so the two expressions you write are equivalent.

we made a conscious decision to NOT think of vectors as single column matrices

This is not quite correct. They have ndims(u) != 2, I know, so we are not forced to always think of them this way. But within LinearAlgebra, in operations also involving Adjoint objects, the map to the representation of these by matrices is quietly inserted. In other words, we have decided to think of them as columns, in certain contexts. We could decide otherwise in the future, and this would come with different trade-offs. (Weren't you greatly opposed to unicode a few weeks ago?)

Anyway, one more thought. For hacking together such Tensor objects, I had a PR to NamedDims at some point which used names (:♯μ, :♭ν, :σ) to mean upstairs / downstairs / indifferent, and then made * check for a valid contraction. That wasn't zero-cost, because although these names are in the type, taking them apart to check for ♯/♭ (sharp/flat chars, BTW) took time.

StefanKarpinski commented 4 years ago

@EricForgy, can I assume you've read #4774? The concept of up and down dimensions in arrays was considered there and intentionally not chosen because it was too complex for such a basic type. The notion of explaining to new users why they cannot multiply A*v, even though A is a two-dimensional array, because the second dimension of A has the "wrong variance" is not super appealing. I can absolutely see that a coherent framework based on that could be worked out (and I spent a considerable amount of time in the lead-up to 1.0 thinking about ways to make that user-friendly enough to ship in LinearAlgebra), but it seems too nuanced for the standard library.

EricForgy commented 4 years ago

Hi Stefan,

Yeah. I read and even participated in #4774 😊

I agree that up / down indices would be too complicated, but I never suggested doing that. The underlying representation can still be an Array (or SArray) with the usual indices. You don't need up / down indices to keep track of contravariant vs covariant.

The 4 rank-2 tensors

  • Tensor{T,U,V}
  • Tensor{T,U,V*}
  • Tensor{T,U*,V}
  • Tensor{T,U*,V*}

are all different types, but all have representations as a 2D array of numbers and are indexed like a normal Array{T,2}.

It is good to point that out though, so I can clarify 👍

bionicles commented 4 years ago

if tensors were machines then they would be callable. they are data structures

EricForgy commented 4 years ago

Hi bionicles 👋

if tensors were machines then they would be callable. they are data structures

I didn't see this comment, but I would just note that tensors (in the context of multilinear algebra) are absolutely machines and, as such, should be callable. Making that connection is important (and would probably be a good PR to LinearAlgebra) 👍

As an example, given vectors A and B, we have

A · B = g(A,B) = g(A)(B) = A*(B)

where g is the metric tensor and A* = g(A) is a covector obtained from partial evaluation. This is what people refer to as "lowering indices" (with the inverse metric raising them), and it is one of the major points of this proposal 😊

Note: As I noted in the original post, tensors "done right" should handle both covectors and vectors as first-class citizens, but that doesn't mean we need to change the indexing. Both vectors and covectors can be indexed with [ ] as usual as long as we keep track of what space they're in.
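
A small illustration of the formula above using plain arrays (the metric g here is made up):

```julia
using LinearAlgebra

# The metric g as a bilinear machine, and g(A) as the covector obtained from
# partial evaluation (lowering the index of A).
g = [1.0 0.0; 0.0 3.0]     # a symmetric positive-definite metric
A = [1.0, 2.0]
B = [4.0, 5.0]

g_AB   = A' * g * B        # g(A,B): full evaluation → a number (34.0)
A_flat = (g * A)'          # g(A):  partial evaluation → a covector (row)
A_flat * B == g_AB         # true: applying the covector to B gives the same number

dot(A, B)                  # the Euclidean special case, i.e. g == I
```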

pablosanjose commented 4 years ago

I agree that up / down indices would be too complicated, but I never suggested doing that. The underlying representation can still be an Array (or SArray) with the usual indices. You don't need up / down indices to keep track of contravariant vs covariant.

I was delighted to find this issue and would love to have a correct and powerful tensor representation in Julia. However, I agree with Stefan that this should probably be done in a two-layer approach, with only the very plain and easy-to-understand Array{T,N} layer exposed to the user by the standard library. Ideally, the tensor functionality should be a superstructure built on top (i.e. making your fundamental types wrappers of "plain" AbstractArray containers). Is this what you are hinting at in the quote above? In that case, what changes in Base would be absolutely required to make this superstructure possible in a package (apart from a possible u * v' deprecation)?

EricForgy commented 4 years ago

Hi @pablosanjose 👋

However I agree with Stefan that this should probably be done in a two-layer approach, with only the very plain and easy to understand Array{T,N} layer exposed to the user by the standard library.

I'm not sure if this could be done properly with just Array{T,N}. That is not expressive enough to capture mixed tensors with both contravariant and covariant components. For example, we would need to be able to distinguish

A = A^i_{jk} e_i ⊗ dx^j ⊗ dx^k

from

B = B_{ij}^k dx^i ⊗ dx^j ⊗ e_k

although the underlying datastructure and indexing for both is still Array{T,3}.

We need typeof(A) !== typeof(B), since they are different types of tensors, but with the same underlying data structure.
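
One rough way to get that (hypothetical names, for illustration): a thin wrapper that records the variance of each slot in the type while the components stay in a plain Array{T,3}:

```julia
struct MixedTensor{T,N,Variance}
    data::Array{T,N}
end
MixedTensor(data::Array{T,N}, variance::NTuple{N,Symbol}) where {T,N} =
    MixedTensor{T,N,variance}(data)

# Indexing just forwards to the underlying array.
Base.getindex(t::MixedTensor, I...) = t.data[I...]

A = MixedTensor(rand(2,2,2), (:up, :down, :down))  # A^i_{jk} e_i ⊗ dx^j ⊗ dx^k
B = MixedTensor(rand(2,2,2), (:down, :down, :up))  # B_{ij}^k dx^i ⊗ dx^j ⊗ e_k

typeof(A) === typeof(B)   # false: same storage and indexing, different types
A[1, 2, 1]                # indexed just like the underlying Array{T,3}
```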

It was brought to my attention just today that we have some maturing capabilities for applied category theory now in Julia with Catlab.jl. That is awesome 😍

As my friends at the nLab so eloquently put it:

The study of Vect is called linear algebra.

Now I am thinking that the right way to go about this might be to have LinearAlgebra implement the category of free vector spaces.

See, for example,

https://algebraicjulia.github.io/Catlab.jl/latest/apis/core/#Instances.

🤔

pablosanjose commented 4 years ago

That is not expressive enough

I probably did not express it clearly. Of course we would need a new Tensor type that contains all co(ntra)variant info (either as parameters or a field), but that need not be something all users have to use and understand, only people really wanting to do tensor stuff. It is only the data of the tensor that would be stored in an AbstractArray. My actual question is: can/should we simply build Tensor on top of LinearAlgebra (i.e. as a package), or is it better to rebuild LinearAlgebra around new fundamental tensor-oriented types? I'm tentatively inclined towards the former, simply because I fear that many new users entering Julia might have difficulties grappling with unfamiliar math concepts (i.e. beyond the naive view of matrices and vectors as "lists/grids of numbers"). In any case, this fear might well be unfounded, and I would love to be persuaded by further discussion here.

EricForgy commented 4 years ago

Oh right. Sorry. I did misunderstand. Yes. Absolutely.

My actual question is: can/should we simply build Tensor on top of LinearAlgebra (probably as a package), or is it better to rebuild LinearAlgebra around new fundamental tensor-oriented types?

My initial thought was to do the latter, but the former could also work and would probably be more palatable.

As I said in the first post:

To do LinearAlgebra right, I think we need (at least) 3 fundamental types:

  • Scalars
  • Vectors
  • Covectors

with corresponding derived types:

  • Matrices
  • Transpose
  • Adjoint

[Note: If we have vectors and covectors as fundamental types, a matrix is actually a derived type.]

If we were super greedy, we'd also include:

  • PseudoVectors
  • PseudoCovectors

The above could be implemented in LinearAlgebra without needing a more general TensorAlgebra, so I think you're right: that is probably the better way to go 👍

If that is done "right" in LinearAlgebra, with a proper definition of ⊗, then TensorAlgebra could build on that.

A cool side benefit of bringing category theory in is that it would give us an excuse to use string diagrams to illustrate a lot of the concepts, e.g. a vector is an arrow pointing down, a covector is an arrow pointing up, tensor product is two arrows next to each other, etc. 😊