golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.06k stars 17.55k forks source link

spec: use (*[4]int)(x) to convert slice x into array pointer #395

Closed rogpeppe closed 3 years ago

rogpeppe commented 14 years ago
Currently, it's possible to convert from an array or array pointer to a slice, but
there's no way of reversing 
this.

A possible syntax could be similar to the current notation for type assertions:

ArrayAssertion  = "." "[" Expression ":" Expression
"]" .

where the operands either side of the ":" are constant expressions.

One motivation for doing this is that using an array pointer allows the compiler to
range check constant 
indices at compile time.

a function like this:

func foo(a []int) int
{
    return a[0] + a[1] + a[2] + a[3];
}

could be turned into:

func foo(a []int) int
{
    b := a.[0:4];
    return b[0] + b[1] + b[2] + b[3];
}

allowing the compiler to check all the bounds once only and give compile-time errors
about out of range 
indices.
robpike commented 14 years ago

Comment 1:

Labels changed: added languagechange.

Status changed to Thinking.

rsc commented 14 years ago

Comment 2:

I think this functionality would be nice.
Personally I would rather not assume
that the compiler can subtract arbitrary
expressions (as in b := a.[0:4]) but instead
say explicitly what type I want:
b := (*[4]int)(a[0:4])
The argument against this is that we hoped
introducing x[:] would let us get rid of the
implicit conversion from *[4]int to slice.
Maybe it still does, but we allow the explicit one.
There are certainly compelling cases (mostly
in low-level things like jpeg or sha1 block
processing) where converting a slice to *[N]int
for some N would eliminate many bounds checks
for cheap.

Owner changed to r...@golang.org.

rogpeppe commented 14 years ago

Comment 3:

> b := (*[4]int)(a[0:4])
i thought about this. the problem is that it looks like other type conversions, and
currently no 
type conversion can fail at runtime.
actually, i can't see any reason why we couldn't just use the normal type coercion
syntax:
b := a[0:4]).(*[4]int)
rsc commented 14 years ago

Comment 4:

Yes, that's a disadvantage of (*[4]int)(a[0:4]).
A disadvantage of a[0:4].(*[4]int) is that it takes over
syntax currently reserved for interface values.  At one
point conversion syntax and type guard syntax was 
interchangeable.  It clarified things quite a bit
to require that in x.(T), x must be an interface value
and that in T(x), the conversion must be statically
guaranteed to succeed.
Unfortunately, this particular conversion doesn't fit 
into either of those categories.
We've got enough going on that's it going to be a while
before we do anything with this.
rogpeppe commented 13 years ago

Comment 5:

I just encountered a nice example of when this functionality might be useful. To quote
from a reddit user on why Go "needs" pointer arithmetic: 
"One well-used example is making classes as small as possible for tree nodes or linked
list nodes so you can cram as many of them into L1 cache lines as possible. This is done
by each node having a single pointer to a left sub-node, and the right sub-node being
accessed by the pointer to the left sub-node + 1. This saves the 8-bytes for the
right-node pointer. To do this you have to pre-allocate all the nodes in a vector or
array so they're laid out in memory sequentially, but it's worth it when you need it for
performance. (This also has the added benefit of the prefetchers being able to help
things along performance-wise - at least in the linked list case).''
You can *almost* do this in Go with 
   type node struct {
      value int
      children *[2]node
   }
except that there's no way of getting a *[2]node from the underlying slice.
gopherbot commented 13 years ago

Comment 6 by nmessenger:

If neither syntax is ideal, perhaps a new unslice builtin?
    ary, ok := unslice([n]T, slc)
Though should ary have type [n]T or *[n]T? If n is large and the unslice fails, a large
zeroed array might not be ideal. Anything wrong with this that I'm not seeing? Well,
besides it being another new builtin.
gopherbot commented 13 years ago

Comment 7 by robpike:

This is a big deal because ary would not have static type.
rsc commented 13 years ago

Comment 8:

If we were going to do it - which is far from even up for debate - I think
the syntax x.(*[10]int) works well (x has type []int).  You can't .(T) a slice
type right now, so it would not be overloading anything, and like an
interface type assertion it can fail or be checked at run time.
You can even think of it as []int containing a *[10]int the same way
an interface value holds a concrete type, and you're extracting the
underlying array pointer.
That said, I don't think this is important enough to worry about now.
There is enough else going on.
rsc commented 12 years ago

Comment 9:

Labels changed: added priority-later.

mdempsky commented 11 years ago

Comment 11:

Since this is something that's been bugging me lately too, I thought I'd add a few
random thoughts I had that don't seem to have been mentioned:
Allowing conversions from slices to array-pointers means pointers can now refer to
partially overlapping objects.  I don't believe that's currently possible in the
language.  Slices can already overlap though, so it's not a big change overall.
Like #8 says, []T is sort-of an interface type for *[N]T, so type assertions are
arguably suitable syntax.  Except that cap(x.(*[N]T)) might give a different value than
cap(x), which isn't true for other interfaces/implementor-type relations.  Seems like an
open question whether this inconsistency is worth accepting into the language, and
really since there's already a way to convert a *[N]T into a []T, just the ability to
turn a []T into a *[N]T is the relevant missing feature.
It would be nice if an expression like x[e1:e2] could actually have a static type of
*[e2-e1]T (assuming e1 and e2 are constant expressions), then you could write something
like *dst[16:24] = *src[136:144] and the compiler can verify that the array bounds match
up.  Unfortunately, the expression can't actually be x[e1:e2] since existing code might
rely on cap(x[e1:e2]) == cap(x)-e1, and that would be a backwards incompatible change. 
The x.[e1:e2] syntax suggested originally would solve this issue.
If you want a range like x.[n:n+4], instead of requiring the language to recognize this
pattern somehow, x[n:].[:4] is equivalent and has static indices at the expense of
clunkier notation.  A short-hand notation like x.[n:+4] might be nice to indicate that 4
is a length not an end position, but not strictly necessary and complicates the
language.  (Also +4 here is technically ambiguous here whether it's length 4 or end
position "+4", so again some new notation would be necessary.)
gopherbot commented 11 years ago

Comment 12 by peter.waller:

I just want to note that we have a use case for this at go-gl. More information:
https://github.com/go-gl/gl/pull/111
The underlying OpenGL API only accepts the equivalent of, e.g, a *[4]float32, so it is
nice to have this in the type system on our side. OTOH, a consumer of this API might be
holding a []float32 they want to pass to us. So it would be great to find a solution to
this, as the current solutions are a bit messy or require the use of unsafe.
rsc commented 10 years ago

Comment 13:

Labels changed: added go1.3maybe.

rsc commented 10 years ago

Comment 14:

Labels changed: added release-none, removed go1.3maybe.

rsc commented 10 years ago

Comment 15:

Labels changed: added repo-main.

ncw commented 10 years ago

Comment 16:

An alternative which occurred to me was in a function which only indexes a slice with
constants values (eg the JPEG routines or unrolled FFTs which are what I'm working on)
and the slice doesn't change the compiler could bounds check the slice just once at the
start of the function with min(constants) and max(constants).
This would achieve the removal of the bounds checking without a language change.  It
wouldn't allow the compiler to do range checking at compile time though.
gopherbot commented 10 years ago

Comment 17 by matthieu.riou:

Another use case is to be able to use a small slice of known length as a map key. The Go
API uses slices heavily even though the same length is expected. Being able to get the
underlying array would allow usage as map keys.
Two examples I've run into recently are hashes (SHA-256) and IP addresses (as 16 byte
slices). It seems rather wasteful to have to copy them or transform them to strings to
have to use them as map keys.
nerdatmath commented 9 years ago

FWIW, something similar to unslice() above can be implemented with reflect and unsafe. Despite being implemented in terms of unsafe, I believe unslice itself is safe. I don't know whether it violates any assumptions made by the GC, however.

http://play.golang.org/p/DixtgwxXUH

daviddengcn commented 9 years ago

Actually I believe the compiler could easily be smart enough to make a single range check for statement like this:

return a[0] + a[1] + a[2] + a[3]
chalonga commented 9 years ago

Is there a good way to do this currently?

For example if one function gives me a slice as output and I need to use that output in another function that wants a fixed array as input. What is the best way to coerce the slice into an array that is the current size of the slice and containing it's current members?

ianlancetaylor commented 9 years ago

Currently there is no safe way to convert from a slice type to an array type (that is the point of this issue).

You can do it using unsafe by writing code like

(*[10]byte)(unsafe.Pointer(&b[0]))
daviddengcn commented 9 years ago

@chalonga or you can copy the data in the slice into a new array.

ainar-g commented 9 years ago

Is anyone still interested in this? I have a prototype implementation. I've simplified the syntax to

ArrayAssertion = "." "[" Expression "]"

If needed, s[min:max].[max-min], where min and max are constants, will produce pointer with offset. (I guess this may be recognised by the compiler and optimised, but I don't have enough knowledge of the Go internals to implement such optimisations yet.)

I'm ready to either mail my prototype (with the "DO NOT SUBMIT" tag?) or go through the proposal process once the proposal repo is up and the tree opens.

I'm also not sure how to mail it. The parts I have now can be divided into three separate CLs: the spec change, teaching go/ast and go/parser about array assertions, and the implementation in cmd/compile/internal/gc and runtime.

griesemer commented 9 years ago

Please use the new proposal process. That will enable a discussion and feedback based on a proposed design, rather than discussing an implementation. Thanks.

ianlancetaylor commented 6 years ago

https://golang.org/cl/21008 demonstrates a different mechanism for early bounds checks: add _ = s[n] at the start of the function. If that indirection passes the compiler does not do further bounds checks on constant indexes.

ianlancetaylor commented 6 years ago

Converting a slice to an array in order to use it as a map key still seems like a valid use case.

rogpeppe commented 6 years ago

Another valid use case, I believe, is when some function expects a pointer to an array (for example, if I want to use an existing slice of key bytes to pass to secretbox.Open).

Currently it's not possible to do this without making a copy.

Converting a slice to an array in order to use it as a map key still seems like a valid use case.

I'm not convinced that converting a slice to an array in order to use it as a map key is that useful, because we'd need to make a copy anyway, so we can already do that with:

var a [N]T
copy(a[:], slice)

for presumably much the same runtime cost as the proposed a.([N]T), albeit with more syntactic expense.

ianlancetaylor commented 6 years ago

I take the point of this proposal to be converting a slice to an array without making a copy, in effect reversing the way [:] can be used to convert an array to a slice. Either way you would wind up with two values: an array A, and a slice whose underlying array was A.

And so this would theoretically be useful for looking up an existing value in a map without requiring a copy.

rogpeppe commented 6 years ago

I take the point of this proposal to be converting a slice to an array without making a copy

Not quite. The proposal is to be able to convert a slice to an array pointer without making a copy. I don't see how you could convert a slice to an array without copying, because then the array would need to refer to the contents of the slice and I don't think array values can work like that.

For example, how would it be possible to specify that in the expression:

a := slice.([N]T)

a now somehow is the same as the underlying backing array of the slice? I don't think we can.

However, for looking up an existing value in a map without a copy, we could optimize the expression:

m[*slice.(*[N]T)]

so that it doesn't take a copy (much as I believe m[string(byteSlice)] avoids making a copy currently).

I suppose that we could then make slice.([N]T) syntactic sugar for *slice.(*[N]T), although I'm not sure how much it would be used.

Elfansoer commented 6 years ago

So, it's still discussed? I intuitively thought there is a method for this, getting the underlying array of a slice.

I mean, if by using slices one can modify the array, why can't you do it without? If passing a slice includes passing the array pointer, why one can't obtain that very pointer?

There's append and its friends, of course. Perhaps security problems may occur from using a single big array for slices with different purposes?

I use CGO to reuse C libraries; passing arrays back and forth as array pointers. I guess I have to directly use array rather than slices...

ianlancetaylor commented 6 years ago

I see two reasonable syntaxes here to convert from s[i:j] to *[4]int:

  1. s[i:j].(*[4]int) (also permits comma-ok form)
  2. (*[4]int)(s[i:j][:4])

The difference, of course, is that version 1 permits the type assertion to fail. In version 2 we only permit constant indexes, and require that the difference in indexes match the array length, so that the conversion can not fail.

The problem with case 1 is that if v is an interface type, then v.(*[4]int) starts to look slightly ambiguous, in that it might mean a type assertion of v to *[4]int or it might mean a type assertion of v to a slice of four integers that is then converted to an array pointer.

We also have to consider named array types. Is this OK?

type A [4]int
var s = (*A)[]int{1, 2, 3, 4} // Is an explicit slice expression required here?  Probably not.
ianlancetaylor commented 6 years ago

@griesemer and I prefer option 2 above. We require constant indexes in a slice expression, or a slice composite literal with a known number of elements.

josharian commented 6 years ago

I'm thrilled to see this getting attention again.

One issue with option 2 is that there's a long tail of ways to write code in which the slice is "obviously" big enough. For example:

x := s[i:j]
if len(x) >= 4 {
  a := (*[4]int)(x)
  // use a
}

(Also worth noting, to spare future readers from having to wonder why, just s[i:i+4] is not enough, because i+4 can overflow...at least unless Go 2 also brings arbitrary precision ints.)

Another way:

a := (*[4]int)(s[i:j][1:5:5])

Another:

x := s[i:j]
_ = x[3]
a := (*[4]int)(x)
// use a

And of course, clever compilers can prove much more. If we go with option 2, it'd be nice to have a somewhat principled understanding of where to draw the line.

The problem with case 1 is that if v is an interface type, then v.([4]int) starts to look slightly ambiguous, in that it might mean a type assertion of v to [4]int or it might mean a type assertion of v to a slice of four integers that is then converted to an array pointer.

I have to say, this seems pretty unambiguous to me. Interfaces store a concrete type. Type assertions retrieve values of that concrete type. Thus the first reading is the correct one.

The big advantage, to my mind, of option 1, is the comma-ok form. Without comma-ok and with option 2 requiring a slice with constant indices, I worry we'll see a lot of code like:

x := s[i:j]
if len(x) >= 4 {
  a := (*[4]int)(x[:4])
  // use a
} else {
  // handle insufficient size
}

This is a lot more ceremony than:

if a, ok := s[i:j].(*[4]int); ok {
  // use a
} else {
  // handle insufficient size
}

Another way of putting this is: Option 2 blends a lot more with the language, by finding a clever mechanism to write array pointer extraction as a conversion by ensuring that that conversion cannot fail. But from the perspective of the programmer, the conversion can effectively fail, so it'd be nicer to be able to just express that directly.

Note that allowing explicit length checks to "count" for option 2 fixes this problem, but at the cost of (probably unacceptable) spec complexity.

Here's an option 3: Introduce the notion of a conversion that can fail. It works like type assertions: If used in any but a comma-ok context, it panics on failure.

a, ok := (*[4]int)(s[i:j])

Then you keep the clean conversion syntax, avoid the [:4] ceremony, avoid any perceived ambiguity with type assertions, and keep comma-ok support. Extend comma-ok support to all conversions; it just so happens that only this particular conversion fails. This might also turn out to be helpful in other places, e.g. if Go 2 has arbitrary precision integers: You might want int64(i) to fail if int i is out of range.

I'm not really sure which option is best, just want to stir the pot a bit.

We also have to consider named array types. Is this OK?

Why would it not be?

griesemer commented 6 years ago

@josharian Regarding option 1: Given a variable x of interface type where x holds the concrete value &[]int{1, 2, 3, 4} (a pointer to a slice []int of length 4), currently the type assertion x.(*[]int) will succeed. As you say ("the first reading is the correct one"), x.(*[4]int) is a type assertion and thus succeeds if the type of the concrete value stored in x is *[4]int. It shouldn't succeed if there's a pointer to a slice of length 4 in there (otherwise, how would one be able to type switch between a *[4]int and a *[]int concrete value in x ?). So using the existing type assertion notation seems not workable at all. What am I missing?

randall77 commented 6 years ago

For option 1, we could define a.(T) to be a standard type assertion if a is statically an interface type, and one of these new casts only if a is statically a concrete type.

You'd have to do i.([]int).(*[4]int) if you had an interface with a []int in it, and wanted a *[4]int out. A direct i.(*[4]int) would not work.

josharian commented 6 years ago

Thanks, Keith. That was exactly what I had in mind.

rogpeppe commented 6 years ago

The suggestions above from @randall77 and @josharian was what I had in mind in comment 3 above. But on reflection, I quite like option 2 because it keeps .() specific to interfaces, and keeps the usual semantics for slicing; it's a little more verbose but I don't think this will be a particularly common operation.

However, requiring an expression of the form (*[n]T)(slice[n]) seems unusual from the p.o.v. of language design. I can't think of any other place where Go requires a particular form of expression as an operand. In Go expressions, you can always rewrite an statement of the form expression(x) to t := x; expression(t) but this suggestion would break that invariant, because I think it would be hard to write a rule that specifically allows:

y := x[0:4]
a := (*[4]int)(y)

but not:

i := 4
y := x[0:i]
a := (*[4]int)(y)

Another possibility is just to let the type conversion panic if the slice bounds are exceeded, and allow it to work with any compatible array type. That is, the type conversion operation implies the slice too. I know that no type conversions can currently panic, but this particular case seems to me like it might be intuitively OK, and to my mind it's probably better than requiring a slice expression as an argument.

I think I'd prefer that.

btracey commented 6 years ago

@josharian @ianlancetaylor I don't understand why (*[4]int)(s[i:i+4]) is not okay, but (*[4]int)(s[i:j][:4]) is. It's true that i+4 can overflow, and so the index is out of bounds, but it's also true that (*[4]int)(s[:4]) can panic because the slice has less than length 4. Why is the panic "mid-conversion" okay in the later case but not the former? Both conversions work if and only if the indexes are in-bounds.

ianlancetaylor commented 6 years ago

@btracey The issue is that currently conversion expressions never panic. I see your point about [i:i+4] but let's consider instead [i:j]. Permitting (*[4]int)(s[i:j]) would imply that conversion expressions can panic. So the simplest approach seems to be: we require constant indexes.

...or we could just say that conversion expressions are permitted to panic in this case. The general guideline for the comma-ok form is that it gives programs a way to test a condition that can not be tested in some other way. But in this case it's easy for a program to test whether the conversion will succeed, so there is no real need for a comma-ok form, so arguably it's not so bad to just panic if the length is wrong.

Along those lines, @josharian, I have to say that despite what I just wrote I am not really worried about pushing the bounds test into the surrounding code as in

    if len(x) >= 4 { /* convert */ } else { /* oh no */ }

I think that in practice anybody who wants to use this conversion is going to know ahead of time that their slice is the right size.

josharian commented 6 years ago

...or we could just say that conversion expressions are permitted to panic in this case.

It is worth noting that some tools would need updating if we do this. For example, the bool vet check assumes that conversions have no side-effects. There are similar assumptions in the compiler. (On the other hand, implementing this in the compiler seems very straightforward, which is a positive sign.)

That said, this is currently my favorite of the options.

ainar-g commented 6 years ago

Whatever happened to the originally proposed form?

arrayPointer, ok := slice.[N]

Yes, it's a new syntax, but it has a new semantic behind it as well. T(v) panicking sounds like bad news.

ianlancetaylor commented 6 years ago

I think it's hard to justify introducing new syntax for this quite unusual case.

martisch commented 5 years ago

Here is a new Idea for syntax of converting to an array slice[1..2] is the underlying array value with 2 elements. Wanting a pointer one can use &slice[1..2]. If the slicing does not work it throws a panic like slice[1:2] does.

josharian commented 5 years ago

...or we could just say that conversion expressions are permitted to panic in this case. [...] But in this case it's easy for a program to test whether the conversion will succeed, so there is no real need for a comma-ok form...

Another option is to require a comma-ok when converting a slice to an array pointer. Tools may still need updating to understand the notion of a comma-ok conversion, but those failures should be more obvious, unlike quietly removing the invariant that conversions cannot panic.

josharian commented 5 years ago

And yet another option that I don't think we've explicitly discussed yet is to add API somewhere to do this, perhaps package reflect. After all, you can do this with unsafe now. Here's a very rough cut to start discussion:

package reflect

// ArrayPtr returns a Value containing a pointer to an array with length n and the same element type as v.
// The returned pointer points to the first element of v.
// ArrayPtr panics if v is not of type Slice, if n <= 0, or if len(v) < n.
// TODO: allow n == 0 and point to runtime.zerobase?
func (v Value) ArrayPtr(n int) Value

Sample usage:

   s := make([]byte, 4)
   p := reflect.ValueOf(s).ArrayPointer(4).Interface().(*[4]byte)
rogpeppe commented 5 years ago

I don't think I'd support adding this just to the reflect package - this capability is likely to be most useful in very low level code, and reflect seems inappropriately heavyweight to me.

I guess it's just about possible that the compiler could inline the above statement into a no-op, but it seems a big ask, and it would be nice to be able to use the construct in places where an import of reflect is not allowed, or with less sophisticated compilers.

josharian commented 4 years ago

Yet another option here: Use the conversion syntax, and if the slice has insufficient capacity, convert to a nil pointer. Now the conversion is idempotent and free of side effects again, and cannot fail; it just returns the zero value in some circumstances.

s := make([]byte, 4)
p2 := (*[2]byte)(s) // &p2[0] == &s[0]
p8 := (*[8]byte)(s) // p8 == nil
gopherbot commented 4 years ago

Change https://golang.org/cl/216424 mentions this issue: cmd/compile: allow conversion from slice to array ptr

josharian commented 4 years ago

My latest conversion proposal seemed kind of promising to me, so I implemented it to see whether that would uncover any surprises. It was pretty smooth sailing, all in all. https://golang.org/cl/216424 for anyone who wants to play with it.

rsc commented 4 years ago

To try to summarize, there are a few different things we could imagine writing for converting a slice to a fixed-sized array. Given var x []int and we want to have a *[4]int named y addressing the first 4 elements of x, we could do any of the following. (And maybe [4]int is a name like Vec because you have some kind of assembly that works on 4-int pieces. It's worth seeing what that looks like too.)

  1. Today, without any of this, you use: y := (*[4]int)(unsafe.Pointer(&x[0])) and you don't get any check that x has at least 4 elements.

  2. y := (*[4]int)(x), where if x is too small, the conversion panics. This would be the first conversion that panics, which might not be something we want to introduce.

    (With Vec, this would be y := (*Vec)(x).)

  3. y := (*[4]int)(x), where if x is too small, the conversion evaluates to nil. This would be a little strange, since there's no way that nil can be argued to be a representation of x. (We already have one strange conversion, namely float to int, but maybe best not to do another.).

    (With Vec this would be y := (*Vec)(x).)

    1. y := (*[4]int)(x[0:4]). Adding the slice operations explicitly makes the possible panic easier to see, and puts it in the slice operation, not the conversion, But this has the problem that the compiler has to figure out that the first 4 is equal to the second 4 minus the 0. If you have instead (*[4]int)(x[i:i+4]) or (*[4]int)(x[i:j]) this gets trickier, both for wraparound concerns and also just not being able to analyze i vs j. We don't want type-checking a program to require significant analysis, especially analysis that might differ from compiler to compiler.

      (With Vec this would be y := (*Vec)(x[0:len(Vec{})]).)

  4. y := (*[4]int)(x[0:4]), where the slice bounds are required to be constants that subtract to the array size. This would mean writing things like (*[4]int)(x[i:][0:4]), but that's not much different from (*[4]int)(x[i:]) in the earlier versions.

    (With Vec this would be y := (*Vec)(x[i:][0:len(Vec{})]).)

  5. y := x.(*[4]int) and y, ok := x.(*[4]int). This is attractive because it makes clear that the operation might fail, and it lets users check. But what if we have an interface w holding a []int? Does w.(*[4]int) work, or do you have to say w.([]int).(*[4]int)? That's probably a bit too overloaded.

    (With Vec this would be y := x.(*Vec) and w.([]int).(*Vec). The fact of the slice to array conversion is particularly hard to see in this one.)

It's also possible that while this definitely does come up, it doesn't come up enough to be worth doing (meaning option 1 above). We could try grepping the public Go code for any conversions of this form.

jimmyfrasche commented 4 years ago

A builtin was suggested earlier in this thread†. I don't really understand why it was dismissed. It seems clearer than any of the proposed syntax, especially since this would be a fairly rare thing to do and it'd be easier to look up a name.

https://github.com/golang/go/issues/395#issuecomment-66049552