JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

Taking vector transposes seriously #4774

Closed jiahao closed 7 years ago

jiahao commented 10 years ago

from @alanedelman:

We really should think carefully about how the transpose of a vector should dispatch the various A_op_B methods. It must be possible to avoid new types and ugly mathematics. For example, vector'vector yielding a vector (#2472, #2936), vector' yielding a matrix, and vector'' yielding a matrix (#2686) are all bad mathematics.
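For concreteness, a sketch of the behavior being called bad mathematics (hypothetical values; semantics as they were at the time):

v = [1, 2]
v'        # 1x2 Array{Int64,2} -- vector' yields a matrix
(v')'     # 2x1 Array{Int64,2} -- vector'' yields a matrix, not v
v' * v    # 1-element Array{Int64,1}, [5] -- vector'vector yields a vector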

What works for me mathematically (which avoids introducing a new type) is that for a 1-dimensional Vector v:

A general N-dimensional transpose reverses the order of indices. A vector, having one index, should be invariant under transposition.

In practice v' is rarely used in isolation, and is usually encountered in matrix-vector products and matrix-matrix products. A common example would be to construct bilinear forms v'A*w and quadratic forms v'A*v which are used in conjugate gradients, Rayleigh quotients, etc.
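These forms can already be computed as true scalars with dot, e.g. (a sketch; values assumed):

v = [1, 2]; A = [1 2; 3 4]; w = [5, 6]
dot(v, A*w)  # the bilinear form v'A*w as a scalar: 95
dot(v, A*v)  # the quadratic form v'A*v as a scalar: 27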

The only reason to introduce a new Transpose{Vector} type would be to represent the difference between contravariant and covariant vectors, and I don't find this compelling enough.

madeleineudell commented 10 years ago

I understand the difficulties with x*y' and y'*x. It may be better to treat inner and outer products as separate operations, using e.g. dot(). (Perhaps also using \cdot?)

But what are the arguments in favor of having a slice along the first dimension return an object whose dimension is different than a slice along the second dimension? For consistency, it seems that every time you slice, the dimension of the resulting object should be diminished by one.
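To illustrate the asymmetry in question (a sketch of the then-current behavior):

M = rand(3, 4)
size(M[:, 1])  # (3,)  -- a slice along the second dimension drops it
size(M[1, :])  # (1,4) -- a slice along the first dimension is kept as a row matrix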

On Wed, Jul 16, 2014 at 8:17 PM, Stefan Karpinski notifications@github.com wrote:

It's not a coherent proposal unless you make '* a special operator, which is pretty dubious, since then x'*y and (x')*y do not mean the same thing. Moreover, it would make multiplication non-associative.



Jutho commented 10 years ago

@madeleineudell, I agree with you, but that's a different issue; see #5949. Although that issue seems to be closed, I don't remember there being a clear agreement or conclusion.

timholy commented 10 years ago

Once we switch over to array views, it will become easier to explore those directions. In particular, saying slice(A, i, :) gets you the behavior you want. (It does so right now, but at the cost of introducing a slower type, the SubArray.)
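For example (a sketch assuming the slice semantics of the time):

A = rand(3, 4)
A[1, :]         # 1x4 Array{Float64,2}: indexing keeps the sliced dimension
slice(A, 1, :)  # 4-element SubArray: the scalar-indexed dimension is dropped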

jdbates commented 10 years ago

From a purely mathematical standpoint, all of the issues presented here come from a conflation of (and confusion between) what we mean by arrays and what we mean by vectors/tensors/matrices. Arrays, conceptually, are simply lists (or, in the case of n-dim arrays, lists of lists). As such, there is no natural specification for operations like array multiplication, transposition, etc. While functions like permutedims, element-wise operations, and axis-specific operations (mean, median, etc.) make sense and can be uniquely defined in a natural way, operations such as dot products cannot be.

As mentioned above, vectors and tensors are geometric objects, and while it's possible to represent them using arrays these representations do not contain the same richness of structure as the mathematical objects they represent. The transpose of a 1-dim array is a no-op; the transpose of a vector is its dual. The transpose of a 2-dim array can be uniquely and naturally defined as the permutation of its dimensions, but this is not generally true for tensors: while the natural case holds for rank (1,1) tensors (aka, matrices), a rank (2,0) tensor transposes into a rank (0,2) tensor. Again, by treating tensors as arrays, the geometric information that makes tensors tensors is lost.

This matters when defining operations such as dot products. A dot product has a specific geometric meaning (the projection of one vector onto the dual space defined by a second vector), and thus a consistent definition of dot products requires preservation of the geometric information contained in vectors. Using certain assumptions might make it possible to use arrays and still cover a majority of use cases, but these assumptions are messy (as seen by the various proposals in this thread) and actually make things more difficult for anyone who needs the richer structure of tensors.

So, consider this a strong vote in favor of thomasmcoffee's suggestion for including a richer AbstractTensor type. My personal preference would be that operations such as transposition and dot products were not even defined for arrays, but as I suspect most people would not share that view, I would at least want the ability to create true tensors should the need arise.

JeffBezanson commented 10 years ago

The practical implication of this perspective seems to be that arrays should be identified with a subset of tensors, and transposing a 1-d array should give a DualVector or perhaps an error. My view is that this is analogous to operations on real numbers that give complex numbers.

Jutho commented 10 years ago

My perspective would be that the general AbstractArray family, a (multidimensional) data container, is sufficiently general to be an indispensable part of any technical programming language. A tensor following strict mathematical rules, even though I care for it dearly, is a good object for a dedicated package. In fact, I am working on something along the lines specified by @jdbates in https://github.com/Jutho/TensorToolbox.jl . It is so far undocumented and largely untested. I wrote it for the things I need personally in quantum many-body physics, but I hope it is constructed in a way that is sufficiently general and extensible to be useful to the greater community of mathematicians and physicists who care about working with tensors.

To give some detail (copied from the JuliaQuantum forum): I decided to define a new type hierarchy for tensors that is independent of the AbstractArray type of Julia (although the basic Tensor is just a wrapper for Array). This type hierarchy is supposed to work in a slightly more formal way. Tensor indices are associated to vector spaces (henceforth referred to as index spaces), and if the type of vector space to which a tensor index is associated is different from its dual, this corresponds to a tensor which distinguishes between covariant and contravariant indices.

So the first part of the package is the abstract part for defining vector spaces, where I match the type hierarchy of Julia objects to the mathematical hierarchy of vector spaces. A general vector space V comes in four varieties, corresponding to the representation theory of the general linear group on V, i.e. V itself (fundamental representation), conj(V), dual(V) and dual(conj(V)). For real vector spaces, conj(V) = V and there is only V and dual(V), corresponding to contravariant and covariant vectors. Then there are the inner product spaces, and at the top level of the hierarchy are the Euclidean spaces, which are inner product spaces with a standard Euclidean inner product (i.e. orthogonal basis). In physics, it is also useful to think about vector spaces which are decomposed into different sectors, i.e. they are graded by e.g. irreducible representations of symmetry actions.

Tensors are objects living in (a subspace of) the tensor product of some elementary vector spaces. However, aside from the standard Tensor, which is an object living in the tensor product space of its index spaces, one could build tensors which live in e.g. the invariant sector of a tensor product of spaces graded by irreps, the symmetric or antisymmetric subspace of a tensor product of identical spaces, ... One could have fermionic vector spaces as index spaces, which implies that a permutation of the tensor indices will induce certain sign changes depending on the parity sectors, etc...

Then there are supposed to be certain operations defined on tensors, the most important of which is contracting tensors, but also e.g. orthogonal factorisations (singular value decomposition) etc. Finally, there should be linear maps that map one tensor onto another. They deserve a special type in that one typically doesn't want to fully encode them as a matrix, but rather in a way such that the matrix-vector product can be computed efficiently, for use in iterative methods (Lanczos etc.). My two existing packages so far (TensorOperations.jl and LinearMaps.jl) implement this functionality for standard Arrays; the tensor toolbox under construction would overload/redefine them for the new AbstractTensor hierarchy.

I hope that this package is sufficiently general so that it is also useful for the wider physics/mathematics community. E.g. if somebody comes along that creates a package for working with manifolds, he could then define a TangentSpace vector space as subspace of the abstract InnerProductSpace, and he can then immediately create tensors living in the tensor product of a few tangent and cotangent spaces. In fact, I am thinking of splitting the part for defining vector spaces into a separate package, that could grow into a package for defining mathematical structures/objects.

Finally, the interop with standard Julia comes from calling tensor on a standard Array, which wraps it into an object of type Tensor with the indices associated to spaces of type CartesianSpace. This is the standard real vector space R^n with Euclidean product, where there is no distinction between covariant and contravariant indices. I think this best captures what a standard Julia Array is.

jdbates commented 10 years ago

@JeffBezanson, I'm ambivalent regarding treating arrays as subsets of tensors. No information is lost that way, but at the same time there are multiple possible interpretations for arrays, and the tensor interpretation doesn't always (or even usually) make sense. Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).

I'm not opposed to @Jutho's suggestion of splitting off the tensor mechanics into a separate package. He's probably right that the number of users who need the full tensor mechanics is much smaller than the number of people who just want straightforward array operations. The question we are really trying to ask here is "in what domain should linear algebra fall?"

The machinery of linear algebra is a substantive enough subset of the machinery of tensor algebra that, in my mind at least, it makes no sense to implement the former without also implementing the latter. Operations like v'M are more concisely and consistently represented if we have a notion of co- and contravariant vectors, but that already puts us most of the way towards general tensor operations.

I agree with you that this is conceptually similar to operations on real numbers which return complex numbers.

kmsquire commented 10 years ago

Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).

While this doesn't address or take away from your overall message, https://github.com/timholy/Images.jl/pull/135 is working toward an implementation of this idea for images. I'm hoping this also makes it easy to deal with color structure tensors, which I'm looking to use for a project.

Jutho commented 10 years ago

On 23 Aug 2014, at 20:36, jdbates notifications@github.com wrote:

@JeffBezanson, I'm ambivalent regarding treating arrays as subsets of tensors. No information is lost that way, but at the same time there are multiple possible interpretations for images, and the tensor interpretation doesn't always (or even usually) make sense. Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).

I agree that tensors do not supersede arrays. This example above is indeed a different mathematical structure (i.e. a vector bundle or more generally a tensor bundle) whose representation also can be given as a multidimensional array by choosing a grid for the manifold coordinates and a basis for the vector space part. So indeed, you can have different mathematical objects/structures which are well defined in a coordinate-independent / basis-independent way but which can be represented (after choosing a coordinate system or a basis) as a multidimensional array. So multidimensional arrays are certainly not restricted to representing tensors. The other way around also fails, as not all tensors have a convenient representation using a multidimensional array. That is only the case when you use a particular basis known as the product basis, which is obtained by taking the direct product of all possible combinations of the individual basis vectors of the vector spaces involved in the tensor product space. In some cases, e.g. when using tensors in a symmetry-invariant subspace of the tensor product space, such a representation is no longer possible and you need to define a different basis for the complete space, with respect to which the tensor is just represented as a long one-dimensional list of numbers.


thomasmcoffee commented 10 years ago

there are multiple possible interpretations for arrays, and the tensor interpretation doesn't always (or even usually) make sense. Consider images: an image can be thought of as a vector-valued field on a (typically 2d) manifold. Restricting that field to a rectangular grid gives you a structure that, naturally, you would want to represent using a 3d array. However, really, this is just a mapping from the space of grid points into the {R,G,B} vector space, so the geometric meaning of the first two dimensions (the x and y labels of the grid) is different from the geometric meaning of the third dimension (which is, in fact, a vector).

It was just this sort of distinction I was attempting to capture in the notional AbstractTensorArray proposal of https://github.com/JuliaLang/julia/issues/4774#issuecomment-38333295 by allowing both array-like and tensor-like dimensions. Under this scheme, I would expect to represent your example as

AbstractTensorArray{Uint8, 3, [false, true, false], [true, false, false]}

so that the x, y, and RGB dimensions are "down", "up", and "neutral", respectively. Geometric operations (e.g., affine transformations) could then handle the grid coordinate dimensions in tensor-like fashion while mapping over the RGB values in array-like fashion. (If you later want to treat the RGB values geometrically, you'd have to explicitly change the mask for that purpose, but I would guess that (a) it's less common that two different flavors of geometric operations will be applied to different subspaces of the same data table, and (b) in this situation, an explicit conversion would probably improve the clarity of code.)
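A minimal sketch of how such masks might be carried around, assuming tuple-valued masks (Bool array literals are not valid type parameters) and hypothetical names throughout:

# hypothetical wrapper: per-dimension "up"/"down" masks alongside the data
immutable MaskedTensorArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    up::NTuple{N,Bool}    # true where the dimension is contravariant
    down::NTuple{N,Bool}  # true where the dimension is covariant
end
Base.size(A::MaskedTensorArray) = size(A.data)
Base.getindex(A::MaskedTensorArray, i...) = A.data[i...]

# the image example: x is "down", y is "up", and RGB is neutral
img = MaskedTensorArray(zeros(Uint8, 4, 4, 3),
                        (false, true, false),  # up mask
                        (true, false, false))  # down mask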

I hadn't considered the conjugate representations that @Jutho mentions, but it seems to me that this generalization could be handled by further extending the same masking approach, for complex spaces.

The question we are really trying to ask here is "in what domain should linear algebra fall?"

Once a design is settled for how array-like and tensor-like operations play together, the entities for linear algebra can just be defined by special cases (like the aliases I used above), so that the pure linear algebra user can be oblivious to the whole generalized tensor hierarchy until it's needed (but won't have to rewrite things if and when it is). So I would see no issue (except perhaps bloat) putting the whole thing in Base.

Jutho commented 10 years ago

so that the x, y, and RGB dimensions are "down", "up", and "neutral", respectively. Geometric operations (e.g., affine transformations) could then handle the grid coordinate dimensions in tensor-like fashion while mapping over the RGB values in array-like fashion. (If you later want to treat the RGB values geometrically, you'd have to explicitly change the mask for that purpose, but I would guess that (a) it's less common that two different flavors of geometric operations will be applied to different subspaces of the same data table, and (b) in this situation, an explicit conversion would probably improve the clarity of code.)

I think you're mixing something up here. In the discussion above, it was actually explained that the x and y coordinates did not carry the vector space interpretation, as they can correspond to coordinates on an arbitrary curved manifold, not necessarily a flat space. It was the RGB dimension that was given the vector interpretation, although this might also not be the best choice, as I seem to remember (I don't have a decent background in image processing) that color space is also rather curved. Also, even in the case where the domain (x and y) forms a vector space, why would x and y be up and down? Or was this just an example of your notation?

Anyway, I also started with TensorToolbox.jl by denoting covariant and contravariant indices with some kind of parameters or mask, but this soon became a complete nightmare, which is why I switched to a representation where every tensor is an element of some vector space; to perform operations, one has to check that spaces match, just like you need to check that sizes match when doing operations with arrays.

thomasmcoffee commented 10 years ago

x and y coordinates did not carry the vector space interpretation

Sorry, I overread "rectangular grid" --- I guess @jdbates meant precisely what he said. But aren't we just talking about replacing dot products with generalized inner products? (Forgive me if I misunderstand, I spend almost all my time in Euclidean space :-)

every tensor is an element of some vector space

Seems like a nice idea --- I'd be interested to see some examples of how it works for the user (I didn't get very far reading the code).

StefanKarpinski commented 9 years ago

I've got a new proposal for this issue.


(1) APL-style slicing.

size(A[i_1, ..., i_n]) == tuple(size(i_1)..., ..., size(i_n)...)

In particular, this means that "singleton slices" – i.e. slices where the index is scalar or zero-dimensional – are always dropped and M[1,:] and M[:,1] are both vectors, rather than one being a vector while the other is a row matrix, or any other such distinction.
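For example, under the proposed rule (a sketch of the intended semantics, not the behavior of the time):

A = rand(3, 4, 5)
size(A[1, :, :])      # (4, 5) -- a scalar index contributes no dimension
size(A[:, 1, :])      # (3, 5) -- the same rule in every position
size(A[1:1, :, 2:3])  # (1, 4, 2) -- a range keeps its dimension, even of size 1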


(2) Introduce Transpose and ConjTranspose wrapper types for vectors and matrices. In other words, something like this:

immutable Transpose{T,n,A<:AbstractArray} <: AbstractArray{T,n}
    array::A
end
Transpose{T,n}(a::AbstractArray{T,n}) = Transpose{T,n,typeof(a)}(a)

and all the appropriate methods to make these work as they should for vectors and matrices. We may want to limit it to only working for vectors and matrices, since it's unclear what a general transpose should mean for arbitrary dimensions (although just reversing dimensions is tempting). When you write a' you get ConjTranspose(a), and likewise a.' produces Transpose(a).


(3) Define various specialized methods for (conjugate) transposed vectors and matrices, such as:

*{T}(v::Transpose{T,1}, w::AbstractVector) = dot(v.array, w)
*{T}(v::AbstractVector, w::Transpose{T,1}) = [ v[i]*w[j] for i=1:length(v), j=1:length(w) ]

etc., including replacing all the horrible At_mul_B functions and special parsing with lazy (conjugate) transpose construction followed by dispatch on Transpose and ConjTranspose types.
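With definitions along those lines, the motivating expressions would dispatch as, e.g. (again a sketch of the intended semantics):

v = [1.0, 2.0, 3.0]; w = [4.0, 5.0, 6.0]
v'w    # ConjTranspose(v) * w dispatches to dot(v, w) and gives the scalar 32.0
v*w'   # v * ConjTranspose(w) gives the 3x3 outer-product matrix [ v[i]*w[j] ]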


(4) Restrict broadcasting operations to cases where the arguments are scalars or arrays with the same number of dimensions. Thus, the following, which currently works as shown, will fail:

julia> M = rand(3,4);

julia> M./M[1,:]
3x4 Array{Float64,2}:
 1.0       1.0       1.0      1.0
 0.516884  0.675712  2.11216  9.0797
 1.00641   0.726229  2.48336  4.38751

julia> M./M[:,1]
3x4 Array{Float64,2}:
 1.0  0.891557  0.561464  0.103968
 1.0  1.16552   2.29433   1.82633
 1.0  0.643353  1.38544   0.453257

Instead, you will have to do something like this:

julia> M./M[[1],:]
3x4 Array{Float64,2}:
 1.0       1.0       1.0      1.0
 0.516884  0.675712  2.11216  9.0797
 1.00641   0.726229  2.48336  4.38751

julia> M./M[:,[1]]
3x4 Array{Float64,2}:
 1.0  0.891557  0.561464  0.103968
 1.0  1.16552   2.29433   1.82633
 1.0  0.643353  1.38544   0.453257

I believe this proposal solves all of the major issues we currently have:

  1. symmetric slicing behavior – trailing dimensions are no longer special.
  2. v'' === v.
  3. v' == v.
  4. v'w is the dot product of v and w – in particular, it's a scalar, not a one-element vector.
  5. v*w' is the outer product of v and w.
  6. M*v is a vector.
  7. M*v' is an error.
  8. v'*M is a transposed vector.
  9. v*M is an error.
  10. At_mul_B operators and special parsing go away.
andreasnoack commented 9 years ago

:+1: to all of it. I did some work on 2 and 3 in #6837, but never finished it. @simonbyrne also looked into it.

nalimilan commented 9 years ago

+1 too. Sounds like it would offer quite consistent behavior all over the place.

StefanKarpinski commented 9 years ago

The only really disruptive part of this proposal would actually be that M[1,:] is an implicitly vertical vector rather than an explicitly horizontal row matrix. Otherwise, it's actually a pretty smooth, non-disruptive set of changes (one hopes). The main epiphany (for me) was that APL slicing behavior could be combined with lazy transposes. If we get buy-in, we can come up with a plan and split up the work. I really hope that lazy transposes and staged functions allow for some code reduction and simplification.

laurentsorber commented 9 years ago

Yes, please! Tensor transpose should probably allow any user-defined permutation, with reversing the dims as the default.

StefanKarpinski commented 9 years ago

Tensor transpose should probably allow any user-defined permutation, with reversing the dims as the default.

That seems like it would complicate the type a bit much; perhaps we can have a PermuteDims type that allows arbitrary lazy dimension permutation.

i2000s commented 9 years ago

@Stefan: This seems like a pretty good idea for working out the vector and 2-d algebra. Just a few challenges:

  1. Regarding multiple-dimension array cases: for an array A with dimensions (i_1, i_2, ..., i_n), if one wants the transpose applied to the [i_2, i_3] dimensions -- or, even harder, to the [i_2, i_4] dimensions -- can you do it with the new definition of transpose?
  2. Regarding singleton dimensions: it is possible that a singleton slice is left intentionally. Should Julia keep this singleton dimension after the calculation? For example, if one defines a vector v as an array of size (2,1) and wants to multiply its transpose with an array A of size (2,3,4), can v'*A yield a result of size (1,3,4)?


simonbyrne commented 9 years ago

Re 2 & 3: having had a rough stab at it, I came to the conclusion that the vector transpose should NOT be a subtype of AbstractVector, otherwise everything gets way too messy (see discussion on #6837). I think the most sane way forward is with Transpose{T,A} <: AbstractMatrix{T}, and a separate Covector type (+ Conjugate variants).

The other significant problem I came across is that often you want to dispatch on a specific matrix type, its transpose, or its conjugate transpose. Unfortunately, I couldn't come up with a way to express this via existing type machinery (see this mailing list discussion). Without this, I fear we'll be in for a lot of @eval-ing over 3x3 possible combinations of arguments.

StefanKarpinski commented 9 years ago

@simonbyrne, I defer to your experience with the implementation. Does the rest of it seem reasonable?

timholy commented 9 years ago

I've pointed out (in less-public forums, so it probably bears a brief mention here) that a potential alternative is to handle all shaping internally, by expanding the types of indexes that SubArrays can use. In particular, one could have a "transposed range" type that would confer a transposed shape to the SubArray, even when the parent array is a Vector. (See https://github.com/JuliaLang/julia/blob/d4cab1dd127a6e13deae5652872365653a5f4010/base/subarray.jl#L5-L9 if you're unfamiliar with how SubArrays are/may be implemented.)

I am not certain whether this alternative strategy makes life easier or harder. It reduces the number of externally-facing types, which might mean that one needs fewer methods. (As someone who is still filling in missing methods due to the Color transition in Images, this seems like a Good Thing.) On the other hand, in the absence of convenient triangular dispatch it could make it somewhat more awkward to write selective methods, which might exacerbate the problems raised by @simonbyrne.

Any insights would be most welcome.

Aside from such details, I like the shape of @StefanKarpinski's proposal. I am not wedded to APL-style indexing, but on balance I suspect it is a better choice than the Matlab-derived rules we have now.

toivoh commented 9 years ago

Two thoughts:

StefanKarpinski commented 9 years ago

I was thinking of proposing that indexing with semicolons, a la A[2;:], could be a different indexing mode where the result always has the same number of dimensions as A – i.e. no singletons are dropped, and indexing with anything having rank more than one is an error. I decided to leave that out of the core proposal for simplicity, but something like that does seem like a good thing to have.

Jutho commented 9 years ago

I can see the concerns expressed by @simonbyrne. However, in principle, a covector is also just a vector living in a different vector space, namely the dual space. So making the Transpose or Covector type not a subtype of AbstractArray also feels somewhat unpleasing. A possible resolution, which would be a major breaking change and is probably not going to be considered (but I wanted to mention it anyway), is to give the whole AbstractArray family an extra type parameter trans, which could have values :N, :T or :C. Methods that just treat a vector as a one-dimensional list of numbers would not need to distinguish between different values of this final parameter, so the corresponding method definitions could stay as they are now.

For N-dimensional arrays with N > 2, there are various options. Either transpose gives an error and it is impossible to actually create an object of type AbstractArray{Float64,3,trans} with trans != :N, or, alternatively, :T just means row-major and the transpose of a general array has the effect of reversing all dimensions. I think the latter is also the convention accepted by people who use Penrose graphical notation (see http://en.wikipedia.org/wiki/Penrose_graphical_notation, although transpose is not explained there; see also the cited book by Cvitanović).
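A rough sketch of the flavor of this idea, written as a standalone wrapper type since AbstractArray itself cannot gain a parameter from outside Base (the names and details here are assumptions, not part of the proposal):

# trans is :N (normal), :T (transpose) or :C (conjugate transpose)
immutable FlaggedArray{T,N,trans} <: AbstractArray{T,N}
    data::Array{T,N}
end
FlaggedArray(a::Array) = FlaggedArray{eltype(a),ndims(a),:N}(a)

Base.size(a::FlaggedArray) = size(a.data)
Base.getindex(a::FlaggedArray, i...) = a.data[i...]

# a method that only sees a one-dimensional list of numbers can ignore trans:
mysum{T,trans}(v::FlaggedArray{T,1,trans}) = sum(v.data)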

I don't really see the role for arbitrary index permutations being supported by transpose; there is permutedims for that, and maybe some lazy approach using revamped SubArrays. In addition, the main motivation for this issue is to simplify the A_mul_B zoo, and higher-order tensor contractions are not (and should not be) supported through normal multiplication anyway.

I am sure there are some new issues related with this approach that I have not yet thought of.

simonbyrne commented 9 years ago

I think I figured out a reasonable solution to the dispatch problem here.

@Jutho's proposal seems interesting, and I think is worth exploring. Unfortunately, the only real way to evaluate these things is to try to implement them.

timholy commented 9 years ago

@toivoh,

toivoh commented 9 years ago

The issue with 2:2 is the repetition if there is a long expression for the index instead of just 2. But of course you can always define your own function to create a range from an index.

JeffBezanson commented 9 years ago

Very good proposal :+1:.

Remind me why we want v' == v?

StefanKarpinski commented 9 years ago

We don't really need that but it's kind of nice since the dual of a (finite dimensional) vector space is isomorphic to it.

Jutho commented 9 years ago

Or, more strongly: since Julia's arrays don't distinguish between covariant and contravariant indices, it only makes sense to think of them as vectors in a Cartesian space (Euclidean metric = identity matrix = Kronecker delta), where the dual space is indeed naturally isomorphic.

toivoh commented 9 years ago

I'm not so sure we want v' == v, but I think that's pretty orthogonal to the rest. Do we want a column matrix and a vector to compare equal if they have equal elements?

StefanKarpinski commented 9 years ago

That's actually a different issue since they have different numbers of dimensions.

StefanKarpinski commented 9 years ago

In particular, this proposal effectively removes the identification between a vector and a column matrix – because if you slice a matrix horizontally or vertically, you get a vector either way. Previously you could kind of ignore trailing singleton dimensions – or pretend that there were more than there actually were. It will no longer be a good idea to do that because a vector can come from any slice of an array.

JeffBezanson commented 9 years ago

Would it make sense to convert something from 1-d to 2-d by adding a trailing singleton dimension?

StefanKarpinski commented 9 years ago

With this proposal, I think that might no longer be a good idea. But maybe it's acceptable, since vectors still behave like columns while covectors behave like rows.

tkelman commented 9 years ago

One thing I noted in #8416 is that sparsevec is messily faked as a single-column CSC matrix right now. Sparse should be able to fit into this fairly well once a proper 1-d sparse vector type gets implemented (which would fall out as the simplest useful case of a generic N-d COO type, just needs to be written).

alanedelman commented 9 years ago

Just taking this all in. So the following would not work?

A[1,:] * A * A[:,1] # row from a Matrix * Matrix * column from a matrix ???

You wrote

v'w is the dot product of v and w – in particular, it's a scalar, not a one-element vector.

Also v' * w is a scalar?

I like the idea of dot(x,y) taking any two items whose shapes are (1,...,1,m,1,...,1) and returning the dot product no matter what. But I don't want x*y to give dot(x,y) in this sense unless x is a covector and y is a vector.

Not sure if this is such a hot idea, but maybe it would be okay if A[:,1,1] were a vector and A[1,:,1] or A[:,1,:] were covectors. It feels better to trail along a dimension for the vector -- the slot on which you are allowed to contract the tensor, with standard linear algebra using slot 1 (row vectors) and slot 2 (column vectors).

thomasmcoffee commented 9 years ago

In my view, the two major challenges we had previously addressed in this issue were:

(A) how to distinguish tensor semantics (for contractions) and array semantics (for broadcasting) when operating on multidimensional data; (B) how to embed obvious and convenient linear algebra within a consistent framework that generalizes to higher dimensions.

It's not clear to me how this proposal deals with either of these issues. So far as I can tell, to achieve (A) still requires ad-hoc user manipulations (as with present-day functionality); and to address (B) using lazy wrappers would require something like the SubArray extensions suggested by @timholy, at which point it becomes a lazy version of the masking approach discussed some time ago. I can imagine providing additional support for (A) using some similar lazy mechanism (like a List wrapper type), but in all these cases it seems to me like laziness should be an optional strategy.

I don't know how many share @Jutho's view that "higher order tensor contractions are not (and should not be) supported through normal multiplication anyway", but I couldn't disagree more: I only do what I consider ordinary engineering math, and I need them all the time. While current languages like Mathematica and NumPy have their design limitations in this regard (as I've discussed above), they are at least supported! For instance, as soon as you want to use the gradient of a vector field in a simple numerical method, you need higher-order tensor contractions.

JeffBezanson commented 9 years ago

When you say, "...have their design limitations in this regard (as I've discussed above), they are at least supported", are you talking about missing functionality, or something fundamental about vectors and transposes that cannot be addressed at a higher level, or by adding functions?

Does anything about this proposal conflict with improving on your points (A) and (B)?

Jutho commented 9 years ago

I really don't see how tensor contractions are supported by MATLAB's standard multiplication operator *, or through any other built-in MATLAB function for that matter. NumPy has a built-in function (I forgot the name), but it is also rather limited as far as I remember.

I too need tensor contractions in their most general form all the time; that's exactly why I know that specifying the most general tensor contraction, let alone implementing it efficiently, is not entirely straightforward. That's why I argued that there need to be special functions for this, rather than trying to cram some half-working or rather specific functionality into the standard operators in Julia Base, which wouldn't cover half the use cases. But I am happy to change my opinion, e.g. if there is one 'standard' contraction that is so much more important/useful than any other. But this might be very domain-dependent and hence not suitable as a general rule for adoption in Julia Base.


alanedelman commented 9 years ago

here's a contraction over the last index of A and the first index of B, sort of like Mathematica's Dot:

function contract(A, B)
    s = size(A)
    t = size(B)
    # fold A's leading dimensions and B's trailing dimensions into one each,
    # contract via ordinary matrix multiplication, then restore the shape
    reshape(reshape(A, prod(s[1:end-1]), s[end]) * reshape(B, t[1], prod(t[2:end])),
            s[1:end-1]..., t[2:end]...)
end
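For example (a hypothetical use; the sizes are chosen so the contracted dimensions match):

A = rand(2, 3, 4); B = rand(4, 5)
size(contract(A, B))  # (2, 3, 5): the last index of A is contracted with the first of B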

I've always been able to do general contractions with reshapes, permutes, and perhaps complex conjugates when needed more or less like the above

not sure what the big issue with tensors really is -- why can't we just implement a few of these functions?

JeffBezanson commented 9 years ago

Yes, exactly. In this issue, all we want to settle on to move forward is

  1. What dimensions to drop in indexing? "APL style" seems uncontroversial.
  2. What does vector' give?

For tensor contractions, with appropriate types and staged functions I think we could actually get pretty high-performance implementations.

alanedelman commented 9 years ago

My feeling is that tensors will take care of themselves and we have to be sure that linear algebra users don't get frustrated.

My biggest concern is that

(take a row from a 2d array) * (2d array) * (take a column from a 2d array)

which is a common operation, still won't work unless we tag (take a row) as a covector, or perhaps better yet tag it with a general slot index.

thomasmcoffee commented 9 years ago

@JeffBezanson, when I say that these operations are supported, I mean that the built-in data types and functions are specifically designed with them in mind, for instance, like Mathematica's Dot function. Thus, for the user, there is a built-in, documented, and/or obvious way to do certain things. For any design, it is possible to achieve support for anything by adding functions, just as it is with the current implementation; so it's not an issue of technical conflict, it's an issue of design.

@Jutho, I don't use MATLAB much, so I can't comment. I agree NumPy's design is less coherent than Mathematica's (as I discussed above), but it also supports a richer range of behaviors. I agree that basic linear algebra should leave the general tensor machinery invisible to users who don't need it, but due to the terrific language features of Julia, it does not seem necessary to introduce divergent implementations for them, as both NumPy and Mathematica have been forced to do to some extent. It seemed to me this issue was, at least in part, about finding the right unified system for both, to reveal what specializations should be used for the common linear algebra cases: for instance, what to do about vector'.

StefanKarpinski commented 9 years ago

A[1,:] * A * A[:,1] # row from a Matrix * Matrix * column from a matrix ???

Correct – you would have to write A[1,:]' * A * A[:,1].

Also v' * w is a scalar?

Yes, v'w and v' * w are the same thing. One of the good things about this proposal is that it completely eliminates cheap syntactic hacks.

Not sure if this is such a hot idea ...

I don't think it is. One of the goals of this proposal is to make slicing and indexing rules symmetrical and this would make one of the indices special, which seems to me to defeat the entire purpose. If slicing is going to be asymmetrical, we might as well keep the current behavior.

JeffBezanson commented 9 years ago

@thomasmcoffee You'll just have to be more specific. Of course everybody wants things to be coherent, documented, obvious etc. The question is, does the proposal on the table serve those goals? Maybe the current proposal doesn't affect those goals at all, which is ok --- then as long as it leads to an improvement elsewhere, we still have a net improvement.

alanedelman commented 9 years ago

So let me get this straight.

If A is not square:

               Current   Proposed   MATLAB
A * A[1,:]     No        Yes        No
A * A[1,:]'    Yes       No         Yes
A[:,1] * A     No        No         No
A[:,1]' * A    Yes       Yes        Yes

and if A is square:

               Current   Proposed   MATLAB
A * A[:,1]     Yes       Yes        Yes
A * A[:,1]'    No        No         No
A[1,:] * A     Yes       No         Yes
A[1,:]' * A    No        Yes        No
StefanKarpinski commented 9 years ago

I swear I just posted an answer to this but somehow it disappeared into the ether. This is all correct. In the current arrangement, you need to consider whether you're taking a row slice or a column slice as well as whether you are multiplying on the left or right when deciding whether to transpose or not (columns are transposed on the left, rows are transposed on the right). In the proposal, you only consider which side you're multiplying on – you always transpose on the left, never on the right.

alanedelman commented 9 years ago

Would it be okay if

dot(x,y) and dot(x.',y) and dot(x,y.') and dot(x.',y.') all give the same scalar?

i.e. Σᵢ conj(xᵢ) * yᵢ

This way one can do dot(x,A*y) without thinking too much.