proposal: spec: multidimensional slices

dadkins commented 11 years ago

As per Rob and Andrew's request:
https://groups.google.com/forum/#!topic/golang-nuts/Q7lwBDPmQh4

sbinet commented 8 years ago

with the new unshape builtin, I must say it's starting to become a bit crowded.

I am tempted to bring back my original suggestion (somewhere on gonum-dev):

slice := make([*]int, 2, 3, 4)      // a 2x3x4 n-dim slice
sub1 := slice[:,1,:]                // a 2x4 n-dim slice
sub2 := reshape(slice, 6, 4)[:2,:3] // a 2x3 n-dim slice

for i, v := range slice {
    for j, u := range v {
        for k, w := range u {
            fmt.Printf("slice[%d,%d,%d] = %v\n", i, j, k, w)
        }
    }
}

[*]T is a strided-ndim slice whose elements have type T.
len(slice) returns a []int with one entry per dimension of the ndim-slice, each entry being the length along that dimension (i.e. []int{2,3,4} for the ndim-slice above)
an ndim-slice cannot be appended to, and has a fixed capacity, equal to its length

at the reflect level, an ndim-slice could be represented like:

type NdSliceHeader struct {
    Data   unsafe.Pointer
    Len    []int
    Stride []int
}
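
For readers who want to see how such a header would drive indexing, here is a minimal sketch in today's Go. It is a library stand-in, not the proposed builtin: the names NdInt, MakeNd, At and Set are hypothetical, Shape plays the role of Len above, and a row-major layout is assumed.

```go
package main

import "fmt"

// NdInt is a hypothetical library stand-in for the proposed [*]int.
// Shape and Stride correspond to Len and Stride in NdSliceHeader.
type NdInt struct {
	Data   []int
	Shape  []int
	Stride []int
}

// MakeNd allocates a contiguous, row-major nd-slice with the given shape.
func MakeNd(shape ...int) NdInt {
	stride := make([]int, len(shape))
	n := 1
	for k := len(shape) - 1; k >= 0; k-- {
		stride[k] = n
		n *= shape[k]
	}
	return NdInt{Data: make([]int, n), Shape: append([]int(nil), shape...), Stride: stride}
}

// offset maps an index tuple to a position in Data.
// The rank is only checked here, at run time.
func (s NdInt) offset(idx ...int) int {
	if len(idx) != len(s.Shape) {
		panic("rank mismatch")
	}
	off := 0
	for k, i := range idx {
		if i < 0 || i >= s.Shape[k] {
			panic("index out of range")
		}
		off += i * s.Stride[k] // one multiply per dimension
	}
	return off
}

// At returns the element at the given index tuple.
func (s NdInt) At(idx ...int) int { return s.Data[s.offset(idx...)] }

// Set stores v at the given index tuple.
func (s NdInt) Set(v int, idx ...int) { s.Data[s.offset(idx...)] = v }

func main() {
	s := MakeNd(2, 3, 4)
	s.Set(42, 1, 2, 3)
	fmt.Println(s.Stride, s.At(1, 2, 3)) // prints "[12 4 1] 42"
}
```

Because Shape and Stride are ordinary slices, nothing in the type records the rank, which is why a rank mismatch can only surface as a run-time panic in this sketch.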

nd-slice literals

v := [*]int{1,2,3,4} // a 1x4 nd-slice
u := reshape([*]int{1,2,3,4}, 2, 2) // a 2x2 nd-slice
w := reshape([*]int{1,2,3,4}, 2, 1, 2) // a 2x1x2 nd-slice

// reshaping a slice is also allowed and creates an nd-slice
v := reshape([]int{1,2,3,4}, 2, 2) // a 2x2 nd-slice
// perhaps also this conversion could be allowed, like string/[]byte:
v := [*]int([]int{1,2,3,4}) // a 1x4 nd-slice

copy

src := make([*]int, 2, 3)
dst := make([*]int, 4, 3)
// copy returns the number of elements copied in each dimension
n := copy(dst[:2,:], src[:1,1:]) // n == []int{1, 2}
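
The per-dimension copy count can be emulated with a small library type in today's Go. The sketch below is only an illustration under assumptions: View2, Slice and Copy2 are hypothetical names, the view is restricted to two dimensions, and rows are assumed contiguous.

```go
package main

import "fmt"

// View2 is a hypothetical 2-D view over a flat []int.
type View2 struct {
	Data       []int
	Off        int // position of element [0,0] in Data
	Rows, Cols int
	Stride     int // distance between the starts of consecutive rows
}

// Slice returns the sub-view [r0:r1, c0:c1] without copying elements.
func (v View2) Slice(r0, r1, c0, c1 int) View2 {
	if r0 < 0 || r1 < r0 || r1 > v.Rows || c0 < 0 || c1 < c0 || c1 > v.Cols {
		panic("slice bounds out of range")
	}
	v.Off += r0*v.Stride + c0
	v.Rows, v.Cols = r1-r0, c1-c0
	return v
}

func minInt(a, b int) int {
	if a < b {
		return a
	}
	return b
}

// Copy2 copies the overlapping region of src into dst and returns
// the number of elements copied along each dimension.
func Copy2(dst, src View2) []int {
	rows := minInt(dst.Rows, src.Rows)
	cols := minInt(dst.Cols, src.Cols)
	for i := 0; i < rows; i++ {
		d := dst.Off + i*dst.Stride
		s := src.Off + i*src.Stride
		copy(dst.Data[d:d+cols], src.Data[s:s+cols])
	}
	return []int{rows, cols}
}

func main() {
	src := View2{Data: []int{1, 2, 3, 4, 5, 6}, Rows: 2, Cols: 3, Stride: 3}
	dst := View2{Data: make([]int, 12), Rows: 4, Cols: 3, Stride: 3}
	// Mirrors copy(dst[:2,:], src[:1,1:]) from the snippet above.
	n := Copy2(dst.Slice(0, 2, 0, 3), src.Slice(0, 1, 1, 3))
	fmt.Println(n, dst.Data[:3]) // prints "[1 2] [2 3 0]"
}
```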

wrt the non-strided proposal, you get the ability to slice in all dimensions. you lose the compile-time check of rank (i.e. number of dimensions) of an ndim-slice, so you could imagine a situation where you'd pass a 2d ndim-slice to a function expecting a 3d one. and you probably lose a bit in random access to elements because you need to multiply by the stride in each dimension (and fetch those strides).

btracey commented 8 years ago

This is similar in spirit to @yiyus's proposal. I don't think it actually gets you around unshape though, unless you want to forbid people from ever getting the underlying slice without using unsafe. One important usage is passing data to C, i.e. Lapack and other programs. Also note that with strided slices one needs to allocate and make a copy before calling Lapack (and others) since they require contiguous data.

sbinet commented 8 years ago

wrt using unsafe: that's already the case for a slice and its array. you can't get back to the underlying array of a slice. I see "my" nd-slice and its non-reshaped slice as the same kind of pair as a slice and its underlying array.

if/when a mechanism is devised to get (safely) the underlying array from a slice, I suppose it could be transposed to the nd-slice/slice pair. but, to pass data to C, you need to use unsafe in some way, so...

sbinet commented 8 years ago

ah! one thing I forgot to mention in my nd-slice post: slicing an nd-slice while modifying its stride. as an nd-slice's capacity can't be modified, we can use the 3-index slice as a way to specify the stride when extracting a sub-nd-slice:

v := reshape([*]int{1,2,3,4,5,6}, 2, 3) // a 2x3 nd-slice
// 1 2 3
// 4 5 6
u := v[:, 0:3:2] // a 2x2 nd-slice, taking one column every two
// 1 3
// 4 6
w := v[::, ::2] // == u
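
To make the stride arithmetic behind v[:, 0:3:2] concrete, here is a rough sketch in today's Go. Strided2 and StepCols are hypothetical names for a 2-D view with one stride per dimension; the third index is read as a step, as in the snippet above.

```go
package main

import "fmt"

// Strided2 is a hypothetical 2-D view with an independent stride per dimension.
type Strided2 struct {
	Data                 []int
	Off                  int // position of element [0,0] in Data
	Rows, Cols           int
	RowStride, ColStride int
}

// At returns element [i,j] of the view.
func (v Strided2) At(i, j int) int {
	if i < 0 || i >= v.Rows || j < 0 || j >= v.Cols {
		panic("index out of range")
	}
	return v.Data[v.Off+i*v.RowStride+j*v.ColStride]
}

// StepCols returns the view v[:, lo:hi:step]: every step-th column
// in [lo, hi), sharing memory with v.
func (v Strided2) StepCols(lo, hi, step int) Strided2 {
	if lo < 0 || hi < lo || hi > v.Cols || step < 1 {
		panic("invalid column range")
	}
	v.Off += lo * v.ColStride
	v.Cols = (hi - lo + step - 1) / step // ceil((hi-lo)/step)
	v.ColStride *= step
	return v
}

func main() {
	// 1 2 3
	// 4 5 6
	v := Strided2{Data: []int{1, 2, 3, 4, 5, 6}, Rows: 2, Cols: 3, RowStride: 3, ColStride: 1}
	u := v.StepCols(0, 3, 2)
	fmt.Println(u.Rows, u.Cols)         // prints "2 2"
	fmt.Println(u.At(0, 0), u.At(0, 1)) // prints "1 3"
	fmt.Println(u.At(1, 0), u.At(1, 1)) // prints "4 6"
}
```

Only the offset, column count and column stride change; no element is moved, which is what makes the sub-nd-slice cheap to create.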

sbinet commented 8 years ago

looking at how tensorflow wraps around its C api is interesting and a valuable data point, /me thinks: https://github.com/tensorflow/tensorflow/blob/0e5d49d362a5ef72179e385d1f71ec29ab0392f6/tensorflow/go/tensor.go

griesemer commented 7 years ago

To make progress with this proposal with the goal to come to a decision eventually, here's an abbreviated summary of the discussion so far.

Primary goals

First of all, there are some overarching goals that this proposal attempts to achieve. To name a few (borrowing from https://github.com/golang/go/issues/6282#issuecomment-66084786, and https://github.com/golang/go/issues/6282#issuecomment-66084788):

It should be straight-forward to write typical numerical algorithms in Go in a natural way.
There should be a “standard” mechanism to represent multi-dimensional slices/matrices in Go (one standard way to represent matrices, for instance).
Such algorithms should be implementable in a reasonably efficient manner, with a performance coming close to a typical C implementation.

Virtually everybody also seems to be in agreement with respect to indexing notation and memory layout:

Slice/vector/matrix elements should be accessed via the familiar indexing notation, suitably extended to multiple dimensions: v[i], m[i, j], t[i, j, k], etc.
A multi-dimensional slice or matrix must be laid out contiguously in memory, without pointers to sub-slices. Successive slice elements may not be adjacent in memory if they have a "stride" that is > 1 (where 1 is the size of a slice element).

Proposed design

@btracey, with input from the gonum team, has spent considerable effort coming up with a concrete design for multi-dimensional slices with the intent to address the goals of this proposal: https://github.com/golang/proposal/blob/master/design/6282-table-data.md . Thank you @btracey , and the gonum team for your significant effort!

The above design addresses many of the desired goals of this proposal and, as a significant plus, the proposed multi-dimensional slices are in many ways a natural extension of the one-dim. slices we already have in Go.

Problem areas

The single-biggest issue with the proposal is that one-dim. slices don’t have a stride exactly because multi-dim. slices gracefully degrade into Go’s existing slices. This problem has been pointed out as early as https://github.com/golang/go/issues/6282#issuecomment-66084790, before a concrete design document was posted. The design document addresses this issue with various work-arounds.

As a concrete example, given a two-dim. slice m representing a matrix [,]float64, with the proposed design it is easy and efficient (no copy of matrix elements involved) to select a matrix row i as a sub-slice m[i, :] but it is impossible to select a matrix column j that way (m[:, j] is invalid). In other words, indexing is asymmetric, and the asymmetry will lead to special treatment in algorithms (columns must be explicitly copied element-wise).
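
The same asymmetry already shows up when a matrix is emulated with a flat slice in today's Go. The sketch below assumes a row-major layout; Matrix, Row and Col are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Matrix is a hypothetical row-major matrix stored in one flat slice.
type Matrix struct {
	Data       []float64
	Rows, Cols int
}

// Row returns row i as a sub-slice that shares memory with m (no copy).
func (m Matrix) Row(i int) []float64 {
	if i < 0 || i >= m.Rows {
		panic("row index out of range")
	}
	return m.Data[i*m.Cols : (i+1)*m.Cols]
}

// Col returns column j. A plain Go slice has no stride, so the
// elements must be copied one by one into fresh storage.
func (m Matrix) Col(j int) []float64 {
	if j < 0 || j >= m.Cols {
		panic("column index out of range")
	}
	col := make([]float64, m.Rows)
	for i := range col {
		col[i] = m.Data[i*m.Cols+j]
	}
	return col
}

func main() {
	m := Matrix{Data: []float64{1, 2, 3, 4, 5, 6}, Rows: 2, Cols: 3}

	r := m.Row(1)
	r[0] = 40 // writes through to m
	c := m.Col(2)
	c[0] = 30 // modifies only the copy

	fmt.Println(m.Data) // prints "[1 2 3 40 5 6]"
}
```

Writes through the row are visible in the matrix, while writes to the column are not, so any algorithm that updates columns has to copy results back explicitly.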

To support various "reshaping" operations for multi-dim. slices, the design proposes operations such as reshape, and unshape. @sbinet points out that the design doc has become a bit crowded in terms of additional predeclared functions ( https://github.com/golang/go/issues/6282#issuecomment-242921725 ).

Alternatives

Several alternative proposals have been floated as well:

https://github.com/golang/go/issues/6282#issuecomment-66084793 proposes that the std library simply define a standard Matrix format (and perhaps Vectors, etc.).
https://github.com/golang/go/issues/6282#issuecomment-242226027 proposes a “shaped slice” type for Go which is similar to multi-dimensional slices but supports strides in each dimension, and consequently doesn’t gracefully degrade into a regular (unstrided) slice in the one-dim. case.
https://talks.golang.org/2016/prototype-your-design.pdf discusses as an example for that talk the implementation of a rewriter that can automatically rewrite indexing expressions of the form a[i, j, k] and a[i, j, k] = x into method calls a.At(i, j, k), and a.Set(i, j, k, x) respectively. While this approach does not extend the language per se, it permits writing numerical algorithms using "nice" notation which is then automatically translated into regular Go. It also has the advantage of providing full control over the underlying implementation of multi-dim. slices. A complete prototype implementation can be found at https://github.com/griesemer/dotGo2016.
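
To illustrate the rewriter alternative, the sketch below shows what a matrix product might look like after such a rewrite. It does not reproduce the prototype's actual output: the Dense type and its constructor are hypothetical, only the At/Set method names come from the description above, and the "source form" in the comment is the notation the rewriter is described as accepting.

```go
package main

import "fmt"

// Dense is a hypothetical matrix type whose At and Set methods
// could serve as the targets of a rewriter.
type Dense struct {
	data       []float64
	rows, cols int
}

// NewDense allocates a zeroed rows x cols matrix.
func NewDense(rows, cols int) *Dense {
	return &Dense{data: make([]float64, rows*cols), rows: rows, cols: cols}
}

// At returns element [i,j]; the slice access checks the flat bound.
func (a *Dense) At(i, j int) float64 { return a.data[i*a.cols+j] }

// Set stores x at element [i,j].
func (a *Dense) Set(i, j int, x float64) { a.data[i*a.cols+j] = x }

// Mul returns the product a*b, written the way rewritten code might read.
func Mul(a, b *Dense) *Dense {
	if a.cols != b.rows {
		panic("dimension mismatch")
	}
	c := NewDense(a.rows, b.cols)
	for i := 0; i < a.rows; i++ {
		for j := 0; j < b.cols; j++ {
			for k := 0; k < a.cols; k++ {
				// source form: c[i, j] = c[i, j] + a[i, k]*b[k, j]
				c.Set(i, j, c.At(i, j)+a.At(i, k)*b.At(k, j))
			}
		}
	}
	return c
}

func main() {
	a := &Dense{data: []float64{1, 2, 3, 4}, rows: 2, cols: 2}
	b := &Dense{data: []float64{5, 6, 7, 8}, rows: 2, cols: 2}
	fmt.Println(Mul(a, b).data) // prints "[19 22 43 50]"
}
```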

Summary

The proposed design of multi-dim. slices is a natural extension of Go's existing one-dim. slices. From a language point of view, the design appears backward-compatible with Go 1; and it does address many goals of the proposal. That said, the asymmetry of indexing operations requires non-obvious work-arounds when implementing numerical algorithms, which runs counter to one of the primary goals ( https://github.com/golang/go/issues/6282#issuecomment-66084786 ) of this proposal.

btracey commented 7 years ago

I agree with the summary above. Two quick comments (especially for those thinking of making a new proposal)

I think any proposal that seeks to extend Go slices will have the same downsides as my proposal. The asymmetry is fundamental, but I also think it is difficult to reduce the scope of the changes without harming the benefits brought by the change. As a simple example, removing unshape and reshape makes it harder to interface multi-dim slices with data streams and C code.
Another alternative suggestion is to modify Go to allow index operator methods. This would be similar to the talk by @griesemer, except an actual change to the Go spec, and not just a rewriter program.

Thanks to @griesemer for the significant effort invested in this issue.

j6k4m8 commented 7 years ago

Thank you @griesemer — a really good summary of the challenges and benefits.

To elaborate on my 👍 to @btracey's comment above, I especially want to get behind index-operator methods: I suspect that moving index-operator syntax from rewriter-land to native Go would buy considerable power, and would allow libraries to handle more of the heavy-lifting when our implementation preferences diverge (e.g. axis-reordering operations can exist in a library, and needn't exist in native implementation).

griesemer commented 7 years ago

This proposal addresses a large portion of the originally stated goal: better Go support for numerical applications. However, as also has become quite clear, it falls short in others (asymmetry of proposed solution, potential proliferation of builtins). Judging from the feedback received on this issue, there is no clear consensus that the shortcomings can be safely ignored.

Adding multi-dim. slices to the existing Go language would be a significant engineering effort. It seems unwise to make this effort with full knowledge of the proposal's inherent problems. Furthermore, exactly because the proposed solution ties so closely into the existing language, it would be nearly impossible to change or adjust the design down the road.

Thus, after repeated and careful consideration, we are going to decline this proposal.

That said, the discussions so far have been extremely helpful in delineating the problem domain, identifying key issues, and for identifying possible alternative approaches.

We suggest that this discussion continue with the intent to come up with a new and improved design (possibly along the lines of one of the alternatives above) with the intent of having a blueprint for consideration for Go 2.

Again, thanks to everybody, and particularly @btracey, for your contributions and time spent on this proposal.

-gri, for @golang/proposal-review

dm319 commented 7 years ago

Time to revive this discussion? As a Go and R fan, how does Fortran approach the problems seen here?

btracey commented 7 years ago

This issue is closed (and the proposal declined) for good reason. If you're looking for similar functionality, please see the gonum packages (gonum.org)

dm319 commented 7 years ago

We suggest that this discussion continue with the intent to come up with a new and improved design (possibly along the lines of one of the alternatives above) with the intent of having a blueprint for consideration for Go 2.

-gri

Maybe I should have mentioned that my comment was prompted by the recent announcement about working towards Go 2 (https://blog.golang.org/toward-go2).

SamWhited commented 7 years ago

Maybe I should have mentioned that my comment was prompted by the recent announcement about working towards Go 2

Consider filing an Experience Report with any specific issues you've run into with real code. We can't talk about proposals and solutions until we know what the actual underlying problems are.

shelby3 commented 6 years ago

@griesemer wrote:

[…] but it is impossible to select a matrix column j that way (m[:, j] is invalid). In other words, indexing is asymmetric […]

I only had 5 minutes to learn about this issue by perusing this thread, thus my very rushed thought is that perhaps the problem can be in theory solved with typeclasses.

Essentially the 3. Struct type is the most correct solution to the issue presuming monomorphisation (inlining) and a smart enough optimizing compiler¹. And btw, it’s the first solution that came to mind within 1 minute before I saw it proposed, so why did it take 4 years? So if we have typeclasses and operator overloading then the SetAt noise is replaced with the [] as we desire. Also the slice of a column as a matrix becomes another kind of struct which has a different typeclass implementation. Tada!

Afaics, with typeclasses the entire thing can be handled with libraries. And so we stop burdening the native with what can be in a library. ~Actually Go’s interfaces as is may be sufficient to implement the column slice as a specialized struct?~[Edit: on further thought no Go’s interfaces lack the concept of an associated type in order for instances of the typeclass to recursively declare the special struct needed to take slice on a column of a matrix.]

Remember that typeclass bounds at the call site select the correct interface automatically based on the input data type. Go’s interfaces sort of do that also, but there’s some differences and limitations.

Apologies if in my haste I missed some points that cause my post to be noise. Please courteously correct me if so.

P.S. Is this implicating that Go has been without subslicing matrix capability for 5 years because of a lack of sufficient higher-level abstractions support in the language? Yet, I presume combining higher-level abstractions and maintaining low-level control performance is a difficult design challenge and maybe even insurmountable in general.

¹ One of the reasons along with higher-level abstractions perhaps OCaml is favored by hedgefunds?

Jonconradt commented 6 years ago

It appears that this was not submitted as an Experience Report. I am wondering why not?

sbinet commented 6 years ago

Probably because this happened before that procedure.
