Closed rogpeppe closed 3 years ago
I think this functionality would be nice. Personally I would rather not assume that the compiler can subtract arbitrary expressions (as in b := a.[0:4]) but instead say explicitly what type I want: b := (*[4]int)(a[0:4]) The argument against this is that we hoped introducing x[:] would let us get rid of the implicit conversion from *[4]int to slice. Maybe it still does, but we allow the explicit one. There are certainly compelling cases (mostly in low-level things like jpeg or sha1 block processing) where converting a slice to *[N]int for some N would eliminate many bounds checks for cheap.
Owner changed to r...@golang.org.
Yes, that's a disadvantage of (*[4]int)(a[0:4]). A disadvantage of a[0:4].(*[4]int) is that it takes over syntax currently reserved for interface values. At one point conversion syntax and type guard syntax was interchangeable. It clarified things quite a bit to require that in x.(T), x must be an interface value and that in T(x), the conversion must be statically guaranteed to succeed. Unfortunately, this particular conversion doesn't fit into either of those categories. We've got enough going on that's it going to be a while before we do anything with this.
I just encountered a nice example of when this functionality might be useful. To quote from a reddit user on why Go "needs" pointer arithmetic: "One well-used example is making classes as small as possible for tree nodes or linked list nodes so you can cram as many of them into L1 cache lines as possible. This is done by each node having a single pointer to a left sub-node, and the right sub-node being accessed by the pointer to the left sub-node + 1. This saves the 8-bytes for the right-node pointer. To do this you have to pre-allocate all the nodes in a vector or array so they're laid out in memory sequentially, but it's worth it when you need it for performance. (This also has the added benefit of the prefetchers being able to help things along performance-wise - at least in the linked list case).'' You can *almost* do this in Go with type node struct { value int children *[2]node } except that there's no way of getting a *[2]node from the underlying slice.
If neither syntax is ideal, perhaps a new unslice builtin? ary, ok := unslice([n]T, slc) Though should ary have type [n]T or *[n]T? If n is large and the unslice fails, a large zeroed array might not be ideal. Anything wrong with this that I'm not seeing? Well, besides it being another new builtin.
If we were going to do it - which is far from even up for debate - I think the syntax x.(*[10]int) works well (x has type []int). You can't .(T) a slice type right now, so it would not be overloading anything, and like an interface type assertion it can fail or be checked at run time. You can even think of it as []int containing a *[10]int the same way an interface value holds a concrete type, and you're extracting the underlying array pointer. That said, I don't think this is important enough to worry about now. There is enough else going on.
Since this is something that's been bugging me lately too, I thought I'd add a few random thoughts I had that don't seem to have been mentioned: Allowing conversions from slices to array-pointers means pointers can now refer to partially overlapping objects. I don't believe that's currently possible in the language. Slices can already overlap though, so it's not a big change overall. Like #8 says, []T is sort-of an interface type for *[N]T, so type assertions are arguably suitable syntax. Except that cap(x.(*[N]T)) might give a different value than cap(x), which isn't true for other interfaces/implementor-type relations. Seems like an open question whether this inconsistency is worth accepting into the language, and really since there's already a way to convert a *[N]T into a []T, just the ability to turn a []T into a *[N]T is the relevant missing feature. It would be nice if an expression like x[e1:e2] could actually have a static type of *[e2-e1]T (assuming e1 and e2 are constant expressions), then you could write something like *dst[16:24] = *src[136:144] and the compiler can verify that the array bounds match up. Unfortunately, the expression can't actually be x[e1:e2] since existing code might rely on cap(x[e1:e2]) == cap(x)-e1, and that would be a backwards incompatible change. The x.[e1:e2] syntax suggested originally would solve this issue. If you want a range like x.[n:n+4], instead of requiring the language to recognize this pattern somehow, x[n:].[:4] is equivalent and has static indices at the expense of clunkier notation. A short-hand notation like x.[n:+4] might be nice to indicate that 4 is a length not an end position, but not strictly necessary and complicates the language. (Also +4 here is technically ambiguous here whether it's length 4 or end position "+4", so again some new notation would be necessary.)
I just want to note that we have a use case for this at go-gl. More information: https://github.com/go-gl/gl/pull/111 The underlying OpenGL API only accepts the equivalent of, e.g, a *[4]float32, so it is nice to have this in the type system on our side. OTOH, a consumer of this API might be holding a []float32 they want to pass to us. So it would be great to find a solution to this, as the current solutions are a bit messy or require the use of unsafe.
An alternative which occurred to me was in a function which only indexes a slice with constants values (eg the JPEG routines or unrolled FFTs which are what I'm working on) and the slice doesn't change the compiler could bounds check the slice just once at the start of the function with min(constants) and max(constants). This would achieve the removal of the bounds checking without a language change. It wouldn't allow the compiler to do range checking at compile time though.
Another use case is to be able to use a small slice of known length as a map key. The Go API uses slices heavily even though the same length is expected. Being able to get the underlying array would allow usage as map keys. Two examples I've run into recently are hashes (SHA-256) and IP addresses (as 16 byte slices). It seems rather wasteful to have to copy them or transform them to strings to have to use them as map keys.
FWIW, something similar to unslice() above can be implemented with reflect and unsafe. Despite being implemented in terms of unsafe, I believe unslice itself is safe. I don't know whether it violates any assumptions made by the GC, however.
Actually I believe the compiler could easily be smart enough to make a single range check for statement like this:
return a[0] + a[1] + a[2] + a[3]
Is there a good way to do this currently?
For example if one function gives me a slice as output and I need to use that output in another function that wants a fixed array as input. What is the best way to coerce the slice into an array that is the current size of the slice and containing it's current members?
Currently there is no safe way to convert from a slice type to an array type (that is the point of this issue).
You can do it using unsafe by writing code like
(*[10]byte)(unsafe.Pointer(&b[0]))
@chalonga or you can copy the data in the slice into a new array.
Is anyone still interested in this? I have a prototype implementation. I've simplified the syntax to
ArrayAssertion = "." "[" Expression "]"
If needed, s[min:max].[max-min]
, where min and max are constants, will produce pointer with offset. (I guess this may be recognised by the compiler and optimised, but I don't have enough knowledge of the Go internals to implement such optimisations yet.)
I'm ready to either mail my prototype (with the "DO NOT SUBMIT" tag?) or go through the proposal process once the proposal repo is up and the tree opens.
I'm also not sure how to mail it. The parts I have now can be divided into three separate CLs: the spec change, teaching go/ast and go/parser about array assertions, and the implementation in cmd/compile/internal/gc and runtime.
Please use the new proposal process. That will enable a discussion and feedback based on a proposed design, rather than discussing an implementation. Thanks.
https://golang.org/cl/21008 demonstrates a different mechanism for early bounds checks: add _ = s[n]
at the start of the function. If that indirection passes the compiler does not do further bounds checks on constant indexes.
Converting a slice to an array in order to use it as a map key still seems like a valid use case.
Another valid use case, I believe, is when some function expects a pointer to an array (for example, if I want to use an existing slice of key bytes to pass to secretbox.Open).
Currently it's not possible to do this without making a copy.
Converting a slice to an array in order to use it as a map key still seems like a valid use case.
I'm not convinced that converting a slice to an array in order to use it as a map key is that useful, because we'd need to make a copy anyway, so we can already do that with:
var a [N]T
copy(a[:], slice)
for presumably much the same runtime cost as the proposed a.([N]T)
, albeit with more syntactic expense.
I take the point of this proposal to be converting a slice to an array without making a copy, in effect reversing the way [:]
can be used to convert an array to a slice. Either way you would wind up with two values: an array A, and a slice whose underlying array was A.
And so this would theoretically be useful for looking up an existing value in a map without requiring a copy.
I take the point of this proposal to be converting a slice to an array without making a copy
Not quite. The proposal is to be able to convert a slice to an array pointer without making a copy. I don't see how you could convert a slice to an array without copying, because then the array would need to refer to the contents of the slice and I don't think array values can work like that.
For example, how would it be possible to specify that in the expression:
a := slice.([N]T)
a now somehow is the same as the underlying backing array of the slice? I don't think we can.
However, for looking up an existing value in a map without a copy, we could optimize the expression:
m[*slice.(*[N]T)]
so that it doesn't take a copy (much as I believe m[string(byteSlice)]
avoids making a copy currently).
I suppose that we could then make slice.([N]T)
syntactic sugar for *slice.(*[N]T)
, although I'm not sure how much it would be used.
So, it's still discussed? I intuitively thought there is a method for this, getting the underlying array of a slice.
I mean, if by using slices one can modify the array, why can't you do it without? If passing a slice includes passing the array pointer, why one can't obtain that very pointer?
There's append and its friends, of course. Perhaps security problems may occur from using a single big array for slices with different purposes?
I use CGO to reuse C libraries; passing arrays back and forth as array pointers. I guess I have to directly use array rather than slices...
I see two reasonable syntaxes here to convert from s[i:j]
to *[4]int
:
s[i:j].(*[4]int)
(also permits comma-ok form)(*[4]int)(s[i:j][:4])
The difference, of course, is that version 1 permits the type assertion to fail. In version 2 we only permit constant indexes, and require that the difference in indexes match the array length, so that the conversion can not fail.
The problem with case 1 is that if v
is an interface type, then v.(*[4]int)
starts to look slightly ambiguous, in that it might mean a type assertion of v
to *[4]int
or it might mean a type assertion of v
to a slice of four integers that is then converted to an array pointer.
We also have to consider named array types. Is this OK?
type A [4]int
var s = (*A)[]int{1, 2, 3, 4} // Is an explicit slice expression required here? Probably not.
@griesemer and I prefer option 2 above. We require constant indexes in a slice expression, or a slice composite literal with a known number of elements.
I'm thrilled to see this getting attention again.
One issue with option 2 is that there's a long tail of ways to write code in which the slice is "obviously" big enough. For example:
x := s[i:j]
if len(x) >= 4 {
a := (*[4]int)(x)
// use a
}
(Also worth noting, to spare future readers from having to wonder why, just s[i:i+4]
is not enough, because i+4
can overflow...at least unless Go 2 also brings arbitrary precision ints.)
Another way:
a := (*[4]int)(s[i:j][1:5:5])
Another:
x := s[i:j]
_ = x[3]
a := (*[4]int)(x)
// use a
And of course, clever compilers can prove much more. If we go with option 2, it'd be nice to have a somewhat principled understanding of where to draw the line.
The problem with case 1 is that if v is an interface type, then v.([4]int) starts to look slightly ambiguous, in that it might mean a type assertion of v to [4]int or it might mean a type assertion of v to a slice of four integers that is then converted to an array pointer.
I have to say, this seems pretty unambiguous to me. Interfaces store a concrete type. Type assertions retrieve values of that concrete type. Thus the first reading is the correct one.
The big advantage, to my mind, of option 1, is the comma-ok form. Without comma-ok and with option 2 requiring a slice with constant indices, I worry we'll see a lot of code like:
x := s[i:j]
if len(x) >= 4 {
a := (*[4]int)(x[:4])
// use a
} else {
// handle insufficient size
}
This is a lot more ceremony than:
if a, ok := s[i:j].(*[4]int); ok {
// use a
} else {
// handle insufficient size
}
Another way of putting this is: Option 2 blends a lot more with the language, by finding a clever mechanism to write array pointer extraction as a conversion by ensuring that that conversion cannot fail. But from the perspective of the programmer, the conversion can effectively fail, so it'd be nicer to be able to just express that directly.
Note that allowing explicit length checks to "count" for option 2 fixes this problem, but at the cost of (probably unacceptable) spec complexity.
Here's an option 3: Introduce the notion of a conversion that can fail. It works like type assertions: If used in any but a comma-ok context, it panics on failure.
a, ok := (*[4]int)(s[i:j])
Then you keep the clean conversion syntax, avoid the [:4]
ceremony, avoid any perceived ambiguity with type assertions, and keep comma-ok support. Extend comma-ok support to all conversions; it just so happens that only this particular conversion fails. This might also turn out to be helpful in other places, e.g. if Go 2 has arbitrary precision integers: You might want int64(i)
to fail if int i
is out of range.
I'm not really sure which option is best, just want to stir the pot a bit.
We also have to consider named array types. Is this OK?
Why would it not be?
@josharian Regarding option 1: Given a variable x
of interface type where x
holds the concrete value &[]int{1, 2, 3, 4}
(a pointer to a slice []int
of length 4), currently the type assertion x.(*[]int)
will succeed. As you say ("the first reading is the correct one"), x.(*[4]int)
is a type assertion and thus succeeds if the type of the concrete value stored in x
is *[4]int
. It shouldn't succeed if there's a pointer to a slice of length 4 in there (otherwise, how would one be able to type switch between a *[4]int
and a *[]int
concrete value in x
?). So using the existing type assertion notation seems not workable at all. What am I missing?
For option 1, we could define a.(T)
to be a standard type assertion if a
is statically an interface type, and one of these new casts only if a
is statically a concrete type.
You'd have to do i.([]int).(*[4]int)
if you had an interface with a []int
in it, and wanted a *[4]int
out. A direct i.(*[4]int)
would not work.
Thanks, Keith. That was exactly what I had in mind.
The suggestions above from @randall77 and @josharian was what I had in mind in comment 3 above.
But on reflection, I quite like option 2 because it keeps .()
specific to interfaces, and keeps the usual semantics for slicing; it's a little more verbose but I don't think this will be a particularly common operation.
However, requiring an expression of the form (*[n]T)(slice[n])
seems unusual from the p.o.v. of language design. I can't think of any other place where Go requires a particular form of expression as an operand. In Go expressions, you can always rewrite an statement of the form expression(x)
to t := x; expression(t)
but this suggestion would break that invariant, because I think it would be hard to write a rule that specifically allows:
y := x[0:4]
a := (*[4]int)(y)
but not:
i := 4
y := x[0:i]
a := (*[4]int)(y)
Another possibility is just to let the type conversion panic if the slice bounds are exceeded, and allow it to work with any compatible array type. That is, the type conversion operation implies the slice too. I know that no type conversions can currently panic, but this particular case seems to me like it might be intuitively OK, and to my mind it's probably better than requiring a slice expression as an argument.
I think I'd prefer that.
@josharian @ianlancetaylor I don't understand why (*[4]int)(s[i:i+4])
is not okay, but (*[4]int)(s[i:j][:4])
is. It's true that i+4
can overflow, and so the index is out of bounds, but it's also true that (*[4]int)(s[:4])
can panic because the slice has less than length 4. Why is the panic "mid-conversion" okay in the later case but not the former? Both conversions work if and only if the indexes are in-bounds.
@btracey The issue is that currently conversion expressions never panic. I see your point about [i:i+4]
but let's consider instead [i:j]
. Permitting (*[4]int)(s[i:j])
would imply that conversion expressions can panic. So the simplest approach seems to be: we require constant indexes.
...or we could just say that conversion expressions are permitted to panic in this case. The general guideline for the comma-ok form is that it gives programs a way to test a condition that can not be tested in some other way. But in this case it's easy for a program to test whether the conversion will succeed, so there is no real need for a comma-ok form, so arguably it's not so bad to just panic if the length is wrong.
Along those lines, @josharian, I have to say that despite what I just wrote I am not really worried about pushing the bounds test into the surrounding code as in
if len(x) >= 4 { /* convert */ } else { /* oh no */ }
I think that in practice anybody who wants to use this conversion is going to know ahead of time that their slice is the right size.
...or we could just say that conversion expressions are permitted to panic in this case.
It is worth noting that some tools would need updating if we do this. For example, the bool vet check assumes that conversions have no side-effects. There are similar assumptions in the compiler. (On the other hand, implementing this in the compiler seems very straightforward, which is a positive sign.)
That said, this is currently my favorite of the options.
Whatever happened to the originally proposed form?
arrayPointer, ok := slice.[N]
.(T)
, leaving it for interfaces only.T(v)
conversion may panic.Yes, it's a new syntax, but it has a new semantic behind it as well. T(v)
panicking sounds like bad news.
I think it's hard to justify introducing new syntax for this quite unusual case.
Here is a new Idea for syntax of converting to an array slice[1..2]
is the underlying array value with 2 elements. Wanting a pointer one can use &slice[1..2]
. If the slicing does not work it throws a panic like slice[1:2]
does.
...or we could just say that conversion expressions are permitted to panic in this case. [...] But in this case it's easy for a program to test whether the conversion will succeed, so there is no real need for a comma-ok form...
Another option is to require a comma-ok when converting a slice to an array pointer. Tools may still need updating to understand the notion of a comma-ok conversion, but those failures should be more obvious, unlike quietly removing the invariant that conversions cannot panic.
And yet another option that I don't think we've explicitly discussed yet is to add API somewhere to do this, perhaps package reflect
. After all, you can do this with unsafe
now. Here's a very rough cut to start discussion:
package reflect
// ArrayPtr returns a Value containing a pointer to an array with length n and the same element type as v.
// The returned pointer points to the first element of v.
// ArrayPtr panics if v is not of type Slice, if n <= 0, or if len(v) < n.
// TODO: allow n == 0 and point to runtime.zerobase?
func (v Value) ArrayPtr(n int) Value
Sample usage:
s := make([]byte, 4)
p := reflect.ValueOf(s).ArrayPointer(4).Interface().(*[4]byte)
I don't think I'd support adding this just to the reflect package - this capability is likely to be most useful in very low level code, and reflect seems inappropriately heavyweight to me.
I guess it's just about possible that the compiler could inline the above statement into a no-op, but it seems a big ask, and it would be nice to be able to use the construct in places where an import of reflect is not allowed, or with less sophisticated compilers.
Yet another option here: Use the conversion syntax, and if the slice has insufficient capacity, convert to a nil pointer. Now the conversion is idempotent and free of side effects again, and cannot fail; it just returns the zero value in some circumstances.
s := make([]byte, 4)
p2 := (*[2]byte)(s) // &p2[0] == &s[0]
p8 := (*[8]byte)(s) // p8 == nil
Change https://golang.org/cl/216424 mentions this issue: cmd/compile: allow conversion from slice to array ptr
My latest conversion proposal seemed kind of promising to me, so I implemented it to see whether that would uncover any surprises. It was pretty smooth sailing, all in all. https://golang.org/cl/216424 for anyone who wants to play with it.
To try to summarize, there are a few different things we could imagine writing for converting a slice to a fixed-sized array.
Given var x []int and we want to have a *[4]int
named y
addressing the first 4 elements of x
, we could do any of the following. (And maybe [4]int
is a name like Vec
because you have some kind of assembly that works on 4-int pieces. It's worth seeing what that looks like too.)
Today, without any of this, you use: y := (*[4]int)(unsafe.Pointer(&x[0]))
and you don't get any check that x
has at least 4 elements.
y := (*[4]int)(x)
, where if x
is too small, the conversion panics. This would be the first conversion that panics, which might not be something we want to introduce.
(With Vec
, this would be y := (*Vec)(x)
.)
y := (*[4]int)(x)
, where if x
is too small, the conversion evaluates to nil
. This would be a little strange, since there's no way that nil
can be argued to be a representation of x
. (We already have one strange conversion, namely float to int, but maybe best not to do another.).
(With Vec
this would be y := (*Vec)(x)
.)
y := (*[4]int)(x[0:4])
. Adding the slice operations explicitly makes the possible panic easier to see, and puts it in the slice operation, not the conversion, But this has the problem that the compiler has to figure out that the first 4 is equal to the second 4 minus the 0. If you have instead (*[4]int)(x[i:i+4])
or (*[4]int)(x[i:j])
this gets trickier, both for wraparound concerns and also just not being able to analyze i
vs j
. We don't want type-checking a program to require significant analysis, especially analysis that might differ from compiler to compiler.
(With Vec
this would be y := (*Vec)(x[0:len(Vec{})])
.)
y := (*[4]int)(x[0:4])
, where the slice bounds are required to be constants that subtract to the array size. This would mean writing things like (*[4]int)(x[i:][0:4])
, but that's not much different from (*[4]int)(x[i:])
in the earlier versions.
(With Vec
this would be y := (*Vec)(x[i:][0:len(Vec{})])
.)
y := x.(*[4]int)
and y, ok := x.(*[4]int)
. This is attractive because it makes clear that the operation might fail, and it lets users check. But what if we have an interface w
holding a []int
? Does w.(*[4]int)
work, or do you have to say w.([]int).(*[4]int)
? That's probably a bit too overloaded.
(With Vec
this would be y := x.(*Vec)
and w.([]int).(*Vec)
. The fact of the slice to array conversion is particularly hard to see in this one.)
It's also possible that while this definitely does come up, it doesn't come up enough to be worth doing (meaning option 1 above). We could try grepping the public Go code for any conversions of this form.
A builtin was suggested earlier in this thread†. I don't really understand why it was dismissed. It seems clearer than any of the proposed syntax, especially since this would be a fairly rare thing to do and it'd be easier to look up a name.
† https://github.com/golang/go/issues/395#issuecomment-66049552