Closed lsorber closed 8 years ago
First off, I entirely agree that there are many worrisome quirks (like `sum(A, 2)`) in how Julia handles Arrays. And I think we need some simple rules for specifying when broadcasting should happen.
But I'm a little unsure whether we should adopt broadcasting of scalar-like arrays. This comes close to violating one of the design principles of Julia: the types of a method's outputs should be uniquely determined by the types of its inputs. While allowing something like `A + x'' * x` doesn't literally violate this requirement, it leads to the perhaps puzzling situation in which inputs of certain sizes correctly produce outputs, while others produce dimension mismatch errors. That already happens with Arrays, so it's not a problem per se. But it does mean that Julia could become less intuitive rather than more intuitive.
@johnmyleswhite Julia already broadcasts Numbers; rule 4 just extends this to scalar-likes. The only reason I included rule 4 was for consistency with MATLAB, where `A+x'*x` does not throw an error. I would prefer to remove rule 4, but I'm not against keeping it so that MATLAB users feel more comfortable. In any case, I don't want to jump through hoops to compute `A+x'*x`.
By hoops do you mean `dot(x, x)` rather than `x' * x`? While I agree it takes a while to get used to, changing it would require a lot of changes throughout the whole language. For example, assignments currently fail:
julia> x = ones(3)
3-element Float64 Array:
1.0
1.0
1.0
julia> y = ones(3)
3-element Float64 Array:
1.0
1.0
1.0
julia> y[1] = x' * x
ERROR: no method convert(Type{Float64},Array{Float64,1})
in setindex! at array.jl:394
julia> y[2] = dot(x, x)
3.0
I could see an argument for changing that as well, but I'm not sure where things stop before we're universally treating scalar-like tensors as scalars.
@johnmyleswhite It's not an easy decision, but the sooner a small and consistent set of rules for working with multidimensional arrays is thought out, the better. In the proposal above, scalar-likes are indeed treated as scalars, except that they have a dimensionality. A Number would be a zero-dimensional scalar-like, a 1-Vector would be a 1D scalar-like, and so on. We should be able to do things like `y[1] = x'*x`. The dimensionality of an Array should not change depending on which function you are calling (`dot` changes dimensionality, yet `sum` doesn't).
It's definitely a non-starter to consider 1-element vectors or 1x1 matrices to be "scalar-like". Multiplying a 10x10 matrix by a scalar is fine, but multiplying a 10x10 matrix by a 1x1 matrix is and should be a dimension mismatch error. Likewise, `rand(1)` is not a scalar at all, it's a single-element vector; they're very different. Making `dot(v,w)` return a 1x1 matrix, or anything besides a scalar for that matter, is just wrong. It's the scalar product, after all. In general, I'm unclear on how the shape of what `sum` returns has any relation to what shape `dot` should return.
The fact that `M[:,1]` produces a vector while `sum(M,2)` produces a matrix may be a slight annoyance, but I'm actually not entirely sure why slicing and reducing should necessarily produce the same shape. Since sum has to produce a matrix in the `sum(M,1)` case, it also has to in the `sum(M,2)` case. Thus the only way to make these match would be to make `M[:,1]` produce a column matrix instead of a vector. Then all tensor slices would have the same number of dimensions as the tensor being sliced. I could maybe get on board with that, but it seems potentially pretty annoying, and I really don't know that `M[:,1]` and `sum(M,2)` having different shapes is all that big of a problem.
There are only two significant problems above as I see it, both involving transposes of vectors:

1. `v'' != v`
2. `v'*v` is a vector rather than a scalar

The first issue could be addressed by ignoring trailing singleton dimensions when comparing tensors for numeric equality (`==` as compared to `isequal`, which could remain stricter). That, however, doesn't address the second issue and may or may not be a good idea in general.
Introducing transposed tensor types solves both problems:

1. `v'` would be a `Transposed{Vector}` and `v''` would be of the same type as `v`.
2. `Transposed{Vector}*Vector` could return a scalar and thus be the same as `dot(v,v)`.

This would also allow transpose to generalize to tensors by simply reversing the order of indices. The main reason we haven't done this is that I rather want the transpose to be lazy (i.e. share data with the original vector), and that introduces a number of issues.
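The two properties above can be sketched outside Julia as well. Here is a minimal, hypothetical Python/numpy mock-up of a lazy transposed-vector wrapper (names `Transposed` and its methods are illustrative only, not any real library's API):

```python
import numpy as np

# Hypothetical sketch of the proposed Transposed{Vector} idea: a lazy
# wrapper that shares data with the original vector, makes v'' give back
# v itself, and makes v' * v a true scalar (like dot(v, v)).
class Transposed:
    def __init__(self, vec):
        self.vec = vec          # shared data: no copy is made

    @property
    def T(self):
        return self.vec         # transposing twice yields the original vector

    def __mul__(self, other):
        # Transposed{Vector} * Vector contracts to a scalar, like dot(v, w)
        return float(np.dot(self.vec, other))

v = np.array([1.0, 2.0, 3.0])
vt = Transposed(v)
print(vt * v)        # 14.0, a scalar, same as dot(v, v)
print(vt.T is v)     # True: v'' is literally the same object as v
```

Because the wrapper holds a reference rather than a copy, it illustrates the laziness concern raised above: mutating `v` would also change what `vt` sees.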
@StefanKarpinski
It's definitely a non-starter to consider 1-element vectors or 1x1 matrices to be "scalar-like". Multiplying a 10x10 matrix by a scalar is fine but multiplying a 10x10 matrix by a 1x1 matrix is and should be a dimension mismatch error.
This is a question of whether scalar-likes should be broadcast, not whether Numbers or 1-Vectors should be scalar-likes. I would also prefer not to do broadcasting for scalar-likes, so that `A*randn(1,1)` would throw an error but `A.*randn(1,1)` would not. Removing rule 4 would solve this problem.
Making dot(v,w) return a 1x1 matrix – or anything besides a scalar, for that matter – is just wrong.
In MATLAB, `dot` can be used to compute the dot product along a dimension. To me, it is not so clear that the dimensionality of the result should depend on the dimensionality of the function's arguments. If A and B are 3D tensors, `dot(A,B,3)` would presumably be a Matrix in your view, yet if they are 4D the result would still be 4D. Why disregard trailing singleton dimensions for `dot`, but not for `sum`?
I'm not sure what calling 1x1 matrices "scalar-like" means if it doesn't mean that they can be used like scalars. Regardless of whether you call a 1x1 matrix scalar-like or not, broadcasting is unproblematic: you broadcast along singleton dimensions and absent trailing dimensions, and other dimension sizes must match exactly. When you broadcast two arrays, `a` and `b`, the result has the following shape:
map(i->max(size(a,i),size(b,i)), 1:max(ndims(a),ndims(b)))
@toivoh's new broadcasting code does this correctly and efficiently. In particular, multiplication acts as you propose:
julia> rand(5,5)*rand(1,1)
ERROR: *: argument shapes do not match
in gemm_wrapper at linalg/matmul.jl:289
in gemm_wrapper at linalg/matmul.jl:280
in * at linalg/matmul.jl:94
julia> rand(5,5).*rand(1,1)
5x5 Float64 Array:
0.633873 0.0593671 0.545569 0.438291 0.00419443
0.590569 0.450254 0.360447 0.415297 0.200254
0.461908 0.633632 0.406851 0.432888 0.027472
0.109804 0.037443 0.298195 0.428459 0.455538
0.0525897 0.554127 0.389851 0.069876 0.427595
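The shape rule quoted above (max of the operand sizes along each dimension, with missing trailing dimensions counted as 1) can be sketched in a few lines. This is a Python transcription of the Julia one-liner, checked against numpy, which agrees when the two operands have the same number of dimensions (numpy aligns shapes from the trailing end, whereas the Julia rule pads trailing dimensions):

```python
import numpy as np

# Sketch of the broadcast result shape:
# map(i -> max(size(a,i), size(b,i)), 1:max(ndims(a), ndims(b)))
def broadcast_shape(a, b):
    n = max(a.ndim, b.ndim)
    size = lambda x, i: x.shape[i] if i < x.ndim else 1  # absent dims count as 1
    return tuple(max(size(a, i), size(b, i)) for i in range(n))

a = np.random.rand(5, 5)
b = np.random.rand(1, 1)
print(broadcast_shape(a, b))   # (5, 5)
print((a * b).shape)           # numpy agrees: (5, 5)

c = np.random.rand(5, 1)
d = np.random.rand(1, 3)
print(broadcast_shape(c, d))   # (5, 3)
print((c * d).shape)           # (5, 3)
```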
Regardless of whatever other functionality it may have, I know that one thing should be true of the dot product: `dot(v,w)` where `v` and `w` are both vectors is a scalar. How to generalize this operation to higher dimensions is an open matter, but any proposal that starts by suggesting that `dot(v,w)` should be a matrix is not going to go anywhere. There are no trailing dimensions being discarded in the case of the dot product; it is simply defined to be the sum of the pairwise products of the elements of its vector arguments. I can even see defining `dot(a,b) = sum(a.*b)` for general tensors. But again, that's an open question because it's pretty unclear what the best generalization of the dot product is.
The other issue that's come up here is that `A[1] = [2]` doesn't work. It could be made to do so quite easily; of course, it should only work when the RHS is a tensor with unit length (i.e. containing only a single value).
@StefanKarpinski
I'm not sure what calling 1x1 matrices "scalar-like" means if it doesn't mean that they can be used like scalars.
I see a scalar-like as something which has a number of dimensions but behaves as a scalar.
Regardless of whatever other functionality it may have, I know that one thing should be true of the dot product: dot(v,w) where v and w are both vectors is a scalar
The fact that `dot` can be used to compute the scalar product of two vectors does not mean that other use cases should just take the back seat. I would argue that `dot` is used just as often to contract two tensors along a given dimension. This just happens to be a scalar if the arguments are two vectors, just as `x'*x` happens to be a scalar if x is a vector. In fact, for many people the contraction along a dimension is the reason to even have a `dot` function in the first place; otherwise we would just write `x'*x`, as anyone who has used MATLAB would be used to. While I do not agree that the result should definitely be a Number in the case of two vectors, I do agree that the result should be treated as one where it makes sense.
The results of the following operations are all mathematically equivalent for a vector x:

- `x'*ones(size(x))`
- `sum(x)`, and the still different `sum(x,1)`
- `dot(x,ones(size(x)))`

It is just wrong that the return types of these four operations differ and that some of them can't be used as scalars. What I would like is a uniform return-type dimensionality, and, whatever this dimensionality may be, that the result can transparently be used as a scalar. The way things are now, I honestly prefer MATLAB's simple yet crude solution of removing trailing singleton dimensions.
The problem is not limited to scalars. Analogously, I cannot pass `x = sum(A,2)` as an argument to the function `foo(x::Vector)` because x is a Matrix. If x were treated as a vector-like, this problem would be solved.
I can see that an array of shape `(n,)` is perhaps equal to one of shape `(n,1)` the same way that `2` and `2.0` are equal.
Now, IIUC mathematics does not recognize "type differences" as in "integer 2" vs. "float 2". The types are things we computer scientists put on things for our own purposes. So from a mathematical standpoint, one cannot complain about the types because they don't exist. However, one can complain about the results of functions such as `2==2.0`.
So it seems this is about allowing the scalars to be embedded in the set of matrices as `1x1` matrices, the same way the reals are embedded in the complexes. Is that a fair description? This is consistent with `rand().*rand(2,2)` giving the same answer as `rand(1,1).*rand(2,2)`, which indeed we now have, so we are on our way. We could also add conversions from 2D Array to 1D Array (etc.) that check `size(A,2)==1`, the same way that conversions from Float to Int check that the number is in fact an integer. And so on.
@JeffBezanson Yes, the embedding analogy is a good way of describing the problem. I'm not sure conversion is the solution though: if I have `foo(x::Vector)` and `foo(x::Matrix)`, what will happen when I run `foo(sum(randn(5,5),2))`?
Could be of interest: the Tensor Toolbox for MATLAB explicitly stores the tensor's dimensionality, like Julia. From their TOMS paper:
A scalar is a zeroth-order tensor. An n-vector is a first-order tensor of size n. An m × n matrix is a second-order tensor of size m × n. Of course, a single number could be a scalar, a 1-vector, a 1 × 1 matrix, etc. Similarly, an n-vector could be viewed as an n × 1 matrix, or an m × n matrix could be viewed as a m × n × 1 tensor. It depends on the context, and our tensor class explicitly tracks the context, as described in Section 2.2.
Section 2.2 says:
Out of necessity, the tensor class handles sizes differently than MATLAB arrays. Every MATLAB array has at least two dimensions; for example, a scalar is an object of size 1 × 1 and a column vector is an object of size n × 1. On the other hand, MATLAB drops trailing singleton dimensions for any object of order greater than two. Thus, a 4 × 3 × 1 object has a reported size of 4 × 3.
The paper doesn't say why explicitly storing the dimensionality is a necessity. In other words, why not just drop trailing singleton dimensions, as MATLAB does (or more precisely, have an implicit infinite number of trailing singleton dimensions)? Speaking from my experience as lead developer of Tensorlab, also a MATLAB toolbox for tensors, I don't mind MATLAB's convention so much -- it has never been a hindrance in the development of Tensorlab. There is only one operation which comes to mind right now where the actual number of trailing singleton dimensions matters (and is perhaps the reason the authors of the Tensor Toolbox had in mind): the outer product. If A and B are two n x n matrices, their outer product is an n x n x n x n tensor. If A and B are two n x n x 1 tensors, their outer product should be an n x n x 1 x n x n x 1 tensor. An optimal solution would, at least in my view, keep track of the dimensionality of tensors, yet treat things that look like scalars as scalars, and things that look like vectors as vectors, and so on. The former Julia already does, the latter implies disregarding trailing singleton dimensions for most operations.
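The outer-product case described above, where trailing singleton dimensions genuinely matter, is easy to demonstrate with numpy's `np.multiply.outer`, whose result shape is simply the concatenation of the operand shapes (a sketch of the point, not the Tensor Toolbox's own code):

```python
import numpy as np

# Outer product of two n x n matrices: an n x n x n x n tensor.
A = np.ones((3, 3))
B = np.ones((3, 3))
print(np.multiply.outer(A, B).shape)    # (3, 3, 3, 3)

# With a trailing singleton on each operand, the singleton survives in
# the middle of the result: n x n x 1 x n x n x 1.
A1 = np.ones((3, 3, 1))
B1 = np.ones((3, 3, 1))
print(np.multiply.outer(A1, B1).shape)  # (3, 3, 1, 3, 3, 1)
```

So any convention that silently drops trailing singletons loses information precisely here: the two inputs look identical elementwise, yet their outer products have different dimensionality.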
@StefanKarpinski I'm curious about the issues introduced by a transposed vector type. What would they be?
Could we make vector a type containing both value and direction fields? The direction field would default to column vector and have an alternative value of row vector. Would this solve the laziness problem by having the independent value field?
@lsorber, from the way you're talking about treating things as scalar-like or vector-like, it seems that you may be thinking of some kind of automatic conversion or identification of types in Julia's type system. This isn't something Julia does or will do. We had initially considered it, in particular identifying zero-tensors with scalars, but concluded that it's not a good idea for a variety of reasons, and that decision has turned out to be a sound one. Thus zero-tensors, while they can be conceptually identified with scalar numbers, are not the same thing in Julia. Rather, they are just made to behave like scalars wherever possible, but they still have a different type and different representation. Similarly, we are never going to have `foo(x::Vector)` automatically apply when `x` is anything other than an object of type `Array{T,1}` for some `T`. This is part of a general anti-magic philosophy in Julia: things don't happen automatically, they only ever happen because you asked for them to. While that may seem a bit limiting, Julia's dispatch system is sufficiently powerful that it's usually easy to add fallbacks that seem like magic. For example, it's easy to say something like `foo(x) = foo(convert(Vector,x))` and then allow the conversion mechanism to handle attempting to turn various objects into vectors for you.
[Taking a step back, the identification of vectors with column matrices is not a unique mathematical phenomenon – the integers are embedded in the rationals, which are embedded in the reals, which are embedded in the complex numbers. If you pay attention to their construction, however, that's really a sleight of hand. The integers can't really be a subset of the rationals because the very definition of the rationals depends on the existence of integers. What's really happening is that we identify the integers with a particular subset of rational numbers which happen to look and act just like the integers. The human brain is so good at dealing with ambiguous situations that this issue never really comes up – it just works. The way Julia handles this is similar: there is a subset of the Rationals that generally behave a lot like the Integers; however, we only identify them behaviorally, we do not try to make them actually the same type.]
@wlbksy, the main issue is whether to copy the data or not. However, there's also the issue that we may want to just do something more general: https://github.com/JuliaLang/julia/pull/3224. I'm not sure what the full design looks like, but I suspect that transposed vectors might be subsumed by some kind of generalized array view type.
I can accept that more operations should tolerate trailing singleton dimensions. Yes, conversion is not a "solution", just another example of the embedding. Although in some cases it can be a solution by adding definitions, as Stefan points out.
The mere existence of both `foo(x::Vector)` and `foo(x::Matrix)` is not enough to cause a problem. For example, the matrix version could call the vector version in the Nx1 case, or the other way around. Or one of them might return an array of shape N and the other Nx1, which is fine if you consider them mathematically equal. We do this kind of thing all the time, for example in the large number of definitions for matrix multiply. There are many cases, but they're all matrix multiply.
I don't understand why one would try to handle only trailing singleton dimensions. Perhaps there is something I'm missing, but take
a = zeros((4,4,4,4))
a[1,:,1,:] = ones((4,4))
for example. This is an assignment that currently doesn't work, and wouldn't work when handling only trailing singletons, yet it is unambiguous in intent and the array sizes differ only by singletons.
That is a separate issue (which shapes are compatible for assignment), covered by JuliaLang/julia#4048. This issue is about a formal embedding; i.e. the sense in which a `4x4` array and a `4x4x1` array are equivalent. I don't think anybody considers a `4x4` array and a `4x1x4` array equivalent in a general sense.
Now I see the reason why Julia treats `x'*x` as a 1x1 Array and not as a scalar.
Let `A` be a matrix and `x` be a vector (column vector). If we treat `x'*x` as a scalar, `A + x'*x` is OK but `A*x'*x` is not, since `(A*x')*x` and `A*(x'*x)` are different. So it is a good idea that `x'*x` is a 1x1 matrix while `dot(x,x)` is a scalar.
Let A have size n x n x n. I feel that A[1,:,:] should have size n x n, while A[1:1,:,:] should have size 1 x n x n. This is much more consistent and uniform (since 1:1 is a range while 1 is not).
Currently, A[1,:,:] has size 1 x n x n in Julia. This is not intuitive and is inconsistent (since currently in Julia, A[1,1,1] is a scalar, not a 1 x 1 x 1 Array).
I think A[1,1,1] being a scalar is the right behaviour, while A[1,:,:] being an Array of size 1 x n x n is not.
Currently, A[1,:,:] has a size 1 x n x n in Julia. This is not intuitive...
Intuition is subjective: if I slice the top layer off a cube (three-dimensional array) and don't rotate it, it has the appearance (shape) of a layer of a cube which still has depth, not the front face of the cube (a matrix).
@wenxgwen I also think that is inconsistent. In numpy these would be different shapes:
In [11]: A[1, :, :].shape
Out[11]: (2, 2)
In [12]: A[1:2, :, :].shape
Out[12]: (1, 2, 2)
@pao I think it's hard to argue that this is intuitive: replacing a scalar index with a range only changes the shape when it appears in the last position!
julia> A[:,1,1:1]
2x1x1 Array{Float64,3}:
julia> A[:,1,1]
2-element Array{Float64,1}:
julia> A[1,1,:]
1x1x2 Array{Float64,3}:
julia> A[1:1,1,:]
1x1x2 Array{Float64,3}:
I'm not arguing that Julia's doing the right thing, or doing it consistently, just that I don't believe that "intuition" is useful guidance here.
I think that this is one of the most critical issues to be decided before Julia gets too far along (certainly well before 1.0). I think some of the present rules are overly magic. I have slowly come to the belief that a simpler array indexing system is preferable.
I would propose the following 4 rules for array indexing, reduction operations, and broadcasting:
I believe the above system would make it easier to reason about code and be less error-prone than the present system. Since it makes it easier to drop dimensions, uses of squeeze would go down drastically. However, the need to add dimensions would go up, so a simpler system for adding singleton dimensions would be helpful. There are array languages that have syntax for this, e.g. pseudo-indexes in Yorick, which uses '-' for this (but '!', '*', or '$' may be better choices). Thus:
a = reshape(1:12, 4, 3)
size(a[:,:]) # (4,3)
size(a[:,-,:]) # (4,1,3)
size(a[:,-,:,-]) # (4,1,3,1)
This syntax would make broadcasting operations easy again and would make the intent clearer than the present system. Let me know if I should move this comment into its own issue since it addresses possible new syntax and goes beyond the present issue.
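For comparison, numpy already has syntax with exactly this effect: `np.newaxis` (an alias for `None`) acts as the pseudo-index, inserting a singleton dimension at the position where it appears. A small sketch mirroring the example above:

```python
import numpy as np

# np.newaxis plays the role of the proposed '-' pseudo-index:
# it inserts a singleton dimension at that position in the index list.
a = np.arange(12).reshape(4, 3)
print(a[:, :].shape)                           # (4, 3)
print(a[:, np.newaxis, :].shape)               # (4, 1, 3)
print(a[:, np.newaxis, :, np.newaxis].shape)   # (4, 1, 3, 1)
```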
Any proposal of dropping singleton slices really needs to address @toivoh's associativity problem, which no one has yet done.
@StefanKarpinski Are you addressing @BobPortmann? I'm not sure I understand the connection between dropping dimensions with integer indexes and reduction operations (which I fully agree with) and the dimensionality that results from matrix multiplication.
@timholy What was your objection to reduction functions dropping a dimension? I know you've expressed skepticism about that, but I'm not sure why. I have a lot of code of the form `squeeze(f(x,n), n)` that I would love to eliminate. If it's about making broadcasting easier, numpy semantics allow for things like
x=randn(3,5)
col_means = mean(x,0)
z = x-col_means
where `col_means` is a one-dimensional array. That seems like the best of all worlds.
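The snippet above in runnable numpy form (a sketch of the semantics being described, with `x.mean(axis=0)` standing in for `mean(x,0)`):

```python
import numpy as np

# numpy semantics: reducing along axis 0 drops that dimension, and the
# resulting 1-D array still broadcasts against the original 2-D array.
x = np.random.randn(3, 5)
col_means = x.mean(axis=0)   # shape (5,): the reduced dimension is gone
z = x - col_means            # broadcasts: (3, 5) - (5,) -> (3, 5)
print(col_means.shape)       # (5,)
print(z.shape)               # (3, 5)
```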
Like JuliaLang/julia#5949, this isn't a proposal to drop all singleton slices, merely a proposal to use scalar/range to encode drop/not-drop. I.e., `A[3,:]` drops and `A[3:3,:]` doesn't. So it's a bit orthogonal to @toivoh's point (which is excellent, as always).
@BobPortmann, perhaps there's a more technical meaning to "reduction" than I'm aware of, but I'd assert that "reduce" can apply at least as well to the size of an array (I've "reduced the amount of data in the array") as to its dimensionality. Personally, I'd prefer to keep the dimensions lined up when one does reductions.
@malmaud, what does `mean(x,0)` do? (What in particular is the 0 for?)
Currently both `X .- mean(X,1)` and `X .- mean(X,2)` work with no fanfare. For 2d `X`, the first mean gives you a row vector, the second gives you a column vector. At least in my own code, this kind of thing is far more common than `squeeze`. Inserting `-` at the right spot in a list of colons might work fine if the dimension you're reducing along is hardcoded (`3`), but when it's stored in a variable this seems like it would be a major pain in the neck. Compare:
m = mean(X, dim)
msq = squeeze(m, dim)
to
m = mean_drop_singletons(X, dim)
indexes = Any[colon() for i = 1:ndims(m)]
insert!(indexes, dim, '-')
mnosq = m[indexes...]
@timholy The opposite of
msq = squeeze(m, dim)
would be
msd = add_unit_dim(m, dim) # or maybe add_dim/adddim
which would do the right thing. I don't see how it is any harder. And if there was
msq = mean(X, dim, unit_dim=true)
you wouldn't need it.
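The proposed `add_unit_dim` is hypothetical, but numpy's `np.expand_dims` does exactly this, which supports the point that a variable `dim` is no harder to handle than with `squeeze` (a sketch in numpy terms):

```python
import numpy as np

# Sketch of the proposed add_unit_dim: reinsert a singleton dimension at
# position `dim`, undoing a dimension-dropping reduction along that axis.
def add_unit_dim(m, dim):
    return np.expand_dims(m, dim)

X = np.random.rand(4, 5)
m = X.mean(axis=1)                      # shape (4,): dimension 1 was dropped
print(add_unit_dim(m, 1).shape)         # (4, 1): ready to broadcast against X
print((X - add_unit_dim(m, 1)).shape)   # (4, 5)
```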
What worries me about dropping dimensions all the time is that dimensions can have meanings. An RGB image might be NxNx3, and after selecting one row or column it should still have 2 spatial dimensions and 1 color dimension.
As I argued in JuliaLang/julia#5949, dropping dimensions in reductions would make it particularly difficult to write things like `x .- mean(x, dim)`. I think @timholy agrees with this, and his argument above is convincing. The issue of whether to drop the dimension in `a[3, :]` is not as obvious.
When it comes to indexed views, I am not completely sure why the current way is less desirable than @BobPortmann's proposal. I think it is consistent and easy to understand, and just works in many practical cases without using extra functions like `squeeze` and `add_new_dimension`, etc.
The '0' is the dimension of the reduction (since Python is 0-based, the first dimension). My point was just that in Python, `x - mean(x,0)` (which would be `x.-mean(x,1)` in Julia) works just as it works in Julia, but the Python version has the added benefit that `mean(x,0)` is a 1-dimensional vector, which I think is generally preferable to having `mean(x,n)` always return a sometimes-annoying singleton dimension.
edit:
My point here is exaggerated, because the Python example only works since it's a column-wise operation. The equivalent row operation `x-mean(x,1)` would not work in Python, while the Julia `x.-mean(x,2)` does.
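The asymmetry described in the edit is easy to demonstrate in numpy: the column case broadcasts by accident, the row case fails, and numpy's `keepdims` flag is the built-in fix (a sketch):

```python
import numpy as np

x = np.random.randn(3, 5)

# Column case: (3,5) - (5,) broadcasts, because numpy aligns trailing dims.
ok = x - x.mean(axis=0)

# Row case: (3,5) - (3,) is a shape mismatch.
try:
    bad = x - x.mean(axis=1)
except ValueError as e:
    print("row case fails:", e)

# keepdims preserves the reduced axis as a singleton: (3,5) - (3,1) works.
fixed = x - x.mean(axis=1, keepdims=True)
print(fixed.shape)   # (3, 5)
```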
If we drop the dimension in the first place only to find that we have to add it back in order for the ensuing interactions with other arrays to work properly, then there is a question why we would want to drop the dimension initially.
@StefanKarpinski It seems to me that the associativity problem you reference is different. I think that if you want `v'M*w` to work with `v,w` vectors and `M` a matrix then there needs to be a new transposed vector type, as @toivoh says in that post.
@lindahua What if `x .- mean(x, dim)` only had to become `x .- mean(x, dim, true)`? That doesn't seem particularly difficult.
Yes, but it's an extra argument to every single reduction function.
@lindahua "then there is a question why we would want to drop the dimension initially"
I dislike the special-casing of trailing dimensions (both Julia and IDL do this). Let x be a 3D array. `x[3,2,:]` is a 3D array but `x[:,3,2]` is a 1D array. I think they should be consistent. I prefer to think of them both as 1-dimensional slices (i.e., both are 1D). I guess I wouldn't know exactly how it feels to use such a system until it was tried, since none of the ones I use does this (Fortran 90 probably comes closest, but it's not dynamic). I definitely think `x[:,:,[2]]` should be a 3D array, and Julia presently gets this right (IDL does not, which can cause problems).
I do think that if `v` is a 1D array and `x` is a 3D array then `v .+ x` should give an error. Arrays should be the same rank to consider broadcasting. However, this is somewhat independent of the dropping-dimension issue.
In my view, consistency means there is a simple principle/rule that everything conforms to. From this perspective, @BobPortmann, your proposal is consistent; treating both `x[3,2,:]` and `x[:,3,2]` as 1D slices is also consistent (based on a different rule).
The current Julian way is also a consistent framework: `ndims(a[I1, I2, .., In])` equals the position of the last non-scalar index. This invariant is very nice, without special cases, and even applies to the case of `a[i1, i2, ..., in]` where all indices `i1, i2, ..., in` are integers.
Generally, there is more than one consistent way to define things. We have to carefully choose one that provides us the most practical benefit. I think the current Julia approach is a reasonable and sane approach.
@lindahua It may be consistent, but trailing dimensions still get treated differently. In any case, I won't argue semantics; I think you know what I'm trying to say. I hope people give some thought to my proposal and don't just casually dismiss it because it is a big change. Many of the ideas in this issue are big changes, so I cannot understand why it is labeled milestone 1.0.
I mean, MATLAB is also "consistent" in that limited sense: the simple principle is that all singleton dimensions are dropped when squeeze is explicitly or implicitly called. But I think we all agree that's bad. I find it disturbing that
function getslice(myarray::Array{Int,2}, dim, idx)
indexers = {1:size(myarray, 1), 1:size(myarray, 2)}
indexers[dim] = idx
getindex(myarray, indexers...)
end
is not type-stable, since the return type depends on whether `dim==1` or `dim==2`. I can't really think of any reason why the right time to drop dimensions is only when indexing with trailing scalars. It seems that whether you agree or disagree with @JeffBezanson's point above, integer indexing should either never drop or always drop a dimension, respectively.
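For contrast, numpy takes the "always drop" branch: an integer index drops its dimension wherever it appears, so the result's ndim is predictable from the index types alone, which is essentially the type-stability property at issue (a sketch):

```python
import numpy as np

# numpy: integer indices always drop their dimension, regardless of
# position; slices never do. The result's ndim depends only on the
# index types, not on their values or positions.
a = np.zeros((4, 4))
print(a[1, :].ndim)    # 1: integer drops, even in the leading position
print(a[:, 1].ndim)    # 1: integer drops in the trailing position too
print(a[1:2, :].ndim)  # 2: a range never drops
print(a[:, 1:2].ndim)  # 2
```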
My point is not actually against @BobPortmann's proposal. Just that many different ways can be seen as consistent, and therefore we may need stronger reasons as to why we prefer one to another.
@BobPortmann:
add_unit_dim(m, dim)
Certainly you're right. The question is which happens more often: the current objectionable call to `squeeze`, or the potential future objectionable call to `add_unit_dim`. In my own code I can promise you the latter would happen more often, and therefore be more annoying. (I somehow also find it more intuitive that `squeeze(A, (2,4))` affects the current dimensions 2 and 4, whereas with `add_unit_dim` I find myself wondering how the 2 and 4 intercalate into the current dimensions. But this is probably just me, and I would soon find it perfectly clear.)
I agree with Stefan that the extra argument to reduction functions is also annoying. On balance I don't find the arguments about slicing out reduced dimensions very convincing.
I also agree with @lindahua that the current scheme is consistent, and that there are multiple consistent solutions. It does, however, seem to me that the rules are simplest if you just drop scalar-indexed dimensions everywhere.
@JeffBezanson, I agree that dimensions have meanings, but you can keep them by using range indexing rather than scalar indexing. However, one annoying consequence of this proposal is that any algorithm that accepts indexes but wants to keep the meaning of all the dimensions will need a call to something like `rangeify_scalars`:
function crop(img, indexes)
I = rangeify_scalars(indexes)
img[I...]
end
where
rangeify_scalars(indexes) = map(x->isa(x,Real) ? (x:x) : x, indexes)
Remembering to insert this call will surely be a source of bugs, at least at first.
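The hypothetical `rangeify_scalars` helper above translates directly to numpy, where widening each scalar index into a length-1 slice likewise prevents any dimension from being dropped (a sketch, mirroring the Julia one-liner):

```python
import numpy as np

# Analog of the hypothetical rangeify_scalars: widen scalar indices into
# length-1 slices (x -> x:x+1) so that no dimension is dropped.
def rangeify_scalars(indexes):
    return tuple(slice(i, i + 1) if isinstance(i, int) else i
                 for i in indexes)

img = np.zeros((10, 10, 3))           # e.g. an RGB image
idx = (2, slice(0, 5), slice(None))   # row 2, columns 0..4, all channels
cropped = img[rangeify_scalars(idx)]
print(cropped.shape)   # (1, 5, 3): the scalar-indexed dimension is kept
print(img[idx].shape)  # (5, 3): without the helper, the row dim is dropped
```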
@StefanKarpinski It seems to me that the associativity problem you reference is different. I think that if you want `v'M*w` to work with `v,w` vectors and `M` a matrix then there needs to be a new transposed vector type, as @toivoh says in that post.
If you have a transposed vector type then that type is precisely what taking a row slice of a matrix should give you. Which brings us back to exactly where we are right now.
I think the best proposal aside from what we currently do is @toivoh's "absent" dimensions idea. This could be represented with a dimension size of -1, although that's an implementation detail. Dimensions sliced with scalars become "absent" rather than being removed entirely. Then a row vector is -1 x n and a column vector is m x -1. All dimensions are treated the same; the trailing ones are not special. Something about this bothers me, however, although I'm not entirely sure what it is. The scheme can be generalized by allowing negative dimensions of arbitrary size. Then we get something very much like "up" and "down" dimensions. But then it starts to get very complicated. Should a matrix have an up and a down dimension? Should it have two up dimensions if it's a bilinear form?
Re dropping dimensions in reduction operations, here's another reason: if we dropped dimensions, `mean(x, [1,2])` would not be type-stable.
@timholy That doesn't have to be the case. It could do as `squeeze` does:
julia> squeeze(mean(eye(2),(1,2)),(1,2))
0-dimensional Array{Float64,0}:
0.5
Sure, if you pass the dims with a tuple, but there are good reasons to allow people to use vectors.
There have been/are a multitude of issues and discussions on dimensionality compatibility between Arrays: JuliaLang/julia#113, JuliaLang/julia#141, JuliaLang/julia#142, JuliaLang/julia#231, JuliaLang/julia#721, JuliaLang/julia#1953, JuliaLang/julia#2472, JuliaLang/julia#2686, JuliaLang/LinearAlgebra.jl#8, JuliaLang/julia#3253, julia-dev threads one, two, and probably many others.
The core problem is that Julia's rules for interacting with Arrays of different dimensions are just too difficult. In the following, let A be a Matrix and x be a Vector. For example, indexing can drop dimensions (e.g., `A[:,i]` is a Vector), yet reduction operators such as `sum`, `min`, `std` do not (e.g., `sum(A,2)` is a Matrix). Numbers are broadcast, yet length-1 Vectors and Matrices are not (e.g., `A+5` works, but `A+x'*x` does not). A solution to the latter example is to use `A+dot(x,x)`, but this is neither intuitive nor convenient. By now it is clear that the current situation is far from ideal. What Julia needs is a small and consistent set of rules for interacting with multidimensional arrays. Below I propose such a list, but I certainly do not wish to imply that this is the best solution.

First, I am of the opinion that Julia's decision to have each object have a dimensionality (a number of dimensions) is the correct one. In MATLAB, trailing singleton dimensions are ignored, yet there are many cases in which it is important to consider how many dimensions an Array has. So let's assume that each object should definitely keep track of its dimensionality.
Second, let's consider broadcasting. In JuliaLang/julia#1953, it was suggested that making the `+` and `-` operators broadcast for things other than Numbers is "a little too magical". I agree with this statement and think that `.+` and `.-` are a good solution (so that `+` and `-` can throw dimension mismatch errors). As a side note, there are many other operators that would also benefit from broadcasting, such as `mod`, `rem`, `max` and `atan2` (in general, most binary functions). The problem here is that both `randn()` and `randn(1)` are scalars, yet only the former is a Number. This is why `A+randn()` works, but `A+randn(1)` and even `A+x'*x` do not (but should!).

Third, trailing singleton dimensions are ignored when adding or subtracting arrays, promoting the return type of the result to the maximal dimensionality of the two arguments. This is behaviour a user would expect, and I would like to expand on this idea per JuliaLang/julia#2686. If `ones(3) + ones(3)''` works, then `ones(3) == ones(3)''` should also be true.

Here is a small set of rules which hopefully represents the behaviour we desire:
1. `ndims` does not depend on the number of trailing singleton dimensions (unlike in MATLAB).
2. Functions (such as the reduction `sum`) do not change the number of dimensions. Another example is that `dot(x,x)` and `x'*x` should both have size 1 x 1. [*]
3. Broadcasting is requested explicitly with dotted operators (`.+` and `.-`). [**]
4. Numbers and scalar-likes are broadcast, e.g., `ones(3)'*ones(3,1,1)` is a 1 x 1 x 1 scalar-like and `ones(3) == ones(3)''` is `true`.

[*] The only exception is if a function explicitly removes dimensions, such as `squeeze`.
[**] The only exception is that broadcasting is not enabled for equality testing with scalar-likes, e.g., `ones(3,3) == 1` should be `false`.

For consistency, I would not even mind going so far as to remove rule 4, requiring people to write `A.+5` instead of `A+5`. The additional required effort is small, and the former makes it clear that the developer is explicitly requesting broadcasting.