golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.06k stars 17.55k forks source link

spec: use (*[4]int)(x) to convert slice x into array pointer #395

Closed rogpeppe closed 3 years ago

rogpeppe commented 14 years ago
Currently, it's possible to convert from an array or array pointer to a slice, but
there's no way of reversing 
this.

A possible syntax could be similar to the current notation for type assertions:

ArrayAssertion  = "." "[" Expression ":" Expression
"]" .

where the operands either side of the ":" are constant expressions.

One motivation for doing this is that using an array pointer allows the compiler to
range check constant 
indices at compile time.

a function like this:

func foo(a []int) int
{
    return a[0] + a[1] + a[2] + a[3];
}

could be turned into:

func foo(a []int) int
{
    b := a.[0:4];
    return b[0] + b[1] + b[2] + b[3];
}

allowing the compiler to check all the bounds once only and give compile-time errors
about out of range 
indices.
mdempsky commented 4 years ago

@rsc

Does w.(*[4]int) work, or do you have to say w.([]int).(*[4]int)? That's probably a bit too overloaded.

I would say, no, interface{}(make([]int, 4)).(*[4]int) should continue to panic (i.e., not work). IMO, asserting an interface to a concrete type is a different operation than asserting a slice to an array.

With Vec this would be y := x.(*Vec) and w.([]int).(*Vec). The fact of the slice to array conversion is particularly hard to see in this one.

Yeah, x.(*Vec) is hard to tell what's going on without knowing x's type. Though w.([]int).(*Vec) seems unambiguous: if this type checks, then Vec must be an array type.

@jimmyfrasche:

A builtin was suggested earlier in this thread†. I don't really understand why it was dismissed. It seems clearer than any of the proposed syntax, especially since this would be a fairly rare thing to do and it'd be easier to look up a name.

I didn't like the suggestion of a builtin at first; but the more I think about it, the better these arguments sound to me.

I am curious if unslice(N, x) or unslice(x, N) (where N is an integer constant) would be more convenient than unslice([N]T, x). I suppose it depends on how common users are using [N]T directly vs a defined type like Vec. (Users could still write (*Vec)(unslice(x, len(Vec{}))), but it's definitely clumsy.)

It would also be possible for unslice to accept either a length or type argument, and to do the appropriate thing in either case.

(Also, just as a reminder: currently Go's production rule for Arguments only allow type arguments in the first position. But I think there's nothing precluding unslice(x, [4]int) if we really wanted to support that. E.g., gofmt already handles it fine.)

jimmyfrasche commented 4 years ago

If Vec is type Vec [4]int I would assume that unslice(Vec, s) would work.

mdempsky commented 4 years ago

@jimmyfrasche Yes, if unslice accepts a type argument, then I expect unslice(Vec, s) should work.

My suggestion though was to allow unslice to accept a length argument. And in case it only accepts a length argument (i.e., does not accept a type argument), then I was pointing out there was still a way to convert to *Vec.

That is, this axis of unslice's design space has three options:

  1. func unslice(ArrayType, s []Type) *ArrayType as originally suggested. So you would write unslice([4]int, s) or unslice(Vec, s).

  2. func unslice(N IntegerType, s []Type) *[N]Type as an alternative. So you would write unslice(4, s) or (*Vec)(unslice(len(Vec{}), s)).

  3. Allow unslice to be used as either of the above two options. So you would write unslice(4, s) or unslice(Vec, s) (or either of the other two longer forms, if preferable for some reason).

(Order of arguments is another axis I mentioned in my comment.)

rogpeppe commented 4 years ago

I quite like the idea of a builtin, but I'm not keen on having a builtin that takes both a type and a value in the same parameter list. AFAIK there are no other primitives that do that, and I don't think it's a good precedent to set.

I see a few possibilities. I'll use the generic type notation even though it wouldn't be possible to use an explicit type parameter prior to generics landing.

// unslice converts the first len(arrayVal) members of slice to a pointer
// to an array. A must be of type *[N]T for some N, but is otherwise
// unused. It panics if the slice isn't large enough.
// For example:
//      x := make([]int, 5)
//      y := unslice((*[4]int)(nil), x)
func unslice[type A, T](arrayVal A, slice []T) A

// unslice converts the first len(arrayVal) members of slice to a pointer
// to an array. A must be of type *[N]T for some N. It reports whether
// the conversion succeeded.
func unslice[type A, T](arrayVal A, slice []T) (A, bool)

// unslice converts the first len(*arrayValPtr) members of slice
// to a pointer to an array by setting *arrayValPtr, which must be
// of type **[N]T for some N. It reports whether the conversion
// succeeded.
// For example:
//    x := make([]int, 5)
//    var y [4]int
//    if unslice(&y, x) {
//       ...
//    }
func unslice[type T](arrayValPtr A, slice []T) bool

If/when generics land, we could also have this (could also return bool rather than panicking):

// unslice returns the first len(A{}) members of slice
// as a pointer to A. A must be of type [N]A for some N.
// It panics if the conversion fails.
// For example:
//     x := make([]int, 5)
//     y := unslice[int, *[4]int](x)
func unslice[type T, A](slice []T) *A

Bikeshed: I wonder about "arrayof" as a name (by analogy with unsafe.Sizeof).

bcmills commented 4 years ago

I'm not keen on having a builtin that takes both a type and a value in the same parameter list. AFAIK there are no other primitives that do that, and I don't think it's a good precedent to set.

make already does: its first argument is a type, and its subsequent arguments (if any) are sizes.

bcmills commented 4 years ago

y := (*[4]int)(x), where if x is too small, the conversion panics. This would be the first conversion that panics, which might not be something we want to introduce.

I would expect that if we add some kind of overflow-checked integer type (#30613 or related proposals), then converting an out-of-range value to a checked integer type would similarly cause a panic.

I'm honestly surprised (and somewhat disappointed) that converting a NaN floating-point value to an integer does not panic today (https://play.golang.org/p/hjK2FNOeJel).

bcmills commented 4 years ago

Note that this cannot be described as a generic function under the current design draft, as there is no way in the current draft to express the constraint “array of T of any length”.

bcmills commented 4 years ago

Even weakening the type signature to any *T that points to a type that is the same byte-size as the passed in slice, this does not appear to be implementable under the current generics draft due to #40301.

bcmills commented 4 years ago
  1. … what if we have an interface w holding a []int? Does w.(*[4]int) work, or do you have to say w.([]int).(*[4]int)?

It would have to be the latter. The former would be an incompatible change for interfaces that may already hold an explicit *[4]int.

ianlancetaylor commented 4 years ago

I think that we should go with the simple conversion syntax, with the understanding that it will panic if the slice is too small.

It's true that no conversion currently panics. This will be a change. But the panic will be clear, and the resolution will be clear.

In practice programs are never going to get this wrong. And when compared to a type assertion, it's worth noting that all current comma-ok forms have a particular characteristic: there is no simple way to decide whether the expression will succeed or not, other than doing the operation and checking the ok value. For the case of converting from slice to array pointer, there is already a simple and obvious way to test whether the conversion will succeed if len(slice) < len(arrayType{}). So this does not seem like a natural choice for a comma-ok form, and the only reason we are even considering it is that we already have type assertions that are sort of like conversions.

Conversions are already a complex part of the language, in that they can do heap allocation, can translate between representations, can lose parts of the input value (e.g., when converting between float and integer and when converting from string to []rune), and in general have no commonality other than changing from one type to another type. Adding a panic to all the other operations a conversion can do does not seem to me to be a bridge too far.

So let's just use the conversion syntax. It's simple and straightforward.

FiloSottile commented 4 years ago

I'm not sure it's a strong enough argument against conversions with panics, but it's going to make it harder to audit security-sensitive code for attacker-reachable panics. Anecdotally, a good number of vulnerabilities in the wild are panics in code where that counts as a Denial of Service.

Currently, when looking at some code to assess if it can be made to panic it's usually enough to look at indexing and slicing operations, and some specific arithmetic operations (divisions, shifts). The other run-time panics (nils, interface concrete types) usually don't depend on input data. If this is introduced, any conversion will require inspection, and any T(x) in isolation can't be assumed to be safe: T might be an array pointer type and x might be a slice with attacker controlled length.

Currently, "does this panic based on input?" can be answered mostly by only looking at the syntax and at the specific code. With conversions that panic, answering that will require type information from elsewhere in the program.

mdempsky commented 4 years ago

@FiloSottile Enumerating classes of expressions that can panic seems like a good task for a tool to automate.

rogpeppe commented 4 years ago

I agree with both @ianlancetaylor and @mdempsky here. A straight conversion seems good, and pointer indirections, indexing and slicing operations are already prevalent enough that an automated tool is surely a good answer here.

PS thanks @bcmills for pointing out my rookie mistake there. I'd somehow totally forgotten about make!

josharian commented 4 years ago

I concur on choosing conversion.

I am still inclined to yield nil instead of panic. Yes, it is a bit strange, but it has nice properties.

In order to use the array value, you need to dereference, so the panic on short bounds is still there. However, avoiding the panic has better ergonomics: It is easier to check p != nil than check bounds or defer+recover.

And allowing conversions to panic could break any existing analysis tools that assume that conversions don’t alter control flow.

ianlancetaylor commented 4 years ago

I'm fine with returning nil for a slice -> array pointer conversion if the slice is too small.

randall77 commented 4 years ago

I'm not a huge fan of returning nil. The problem with that is we can't report the too-small length we encountered. When the nil is dereferenced we'll get a nil pointer panic, which can't contain the too-small length. I'd rather we give a panic that says something like "invalid conversion: slice has length 5 but needs length 8".

Does anyone have an actual example of code that assumes conversions do not panic? What about x/tools/go/ssa? Would such code need to be updated anyway, to handle this new case?

jimmyfrasche commented 4 years ago

@randall77 All of the information to create a message like that is available.

a := (*[8]T)(s)
if a == nil {
  panic(fmt.Errorf("invalid conversion: slice has length %d but needs length 8", len(s)))
}
randall77 commented 4 years ago

Sure, you could code it explicitly. What I'm worried about is if someone does:

a := (*[8]T)(s)
a[3] = 7

What error do they get if len(s)==5? I'd rather the cast panic, in which case the runtime can give them the 5, as opposed to just getting a nil pointer panic on the line a[3] = 7, with no way for the runtime to know that len(s) is even relevant.

randall77 commented 4 years ago

It's also a problem if the cast and the dereference are far apart, in time or in source code. (Again, only if the cast fails and the coder didn't bother to check the result explicitly.)

networkimprov commented 4 years ago

Apologies for stating the obvious, but we already get panics for index & slice out-of-range conditions.

Isn't a conversion from incorrect slice a variation on that theme?

randall77 commented 4 years ago

I think the worry is that we know (and code analysis software assumes) that x[i] or x[i:j] might panic, but (T)(x) will not panic.

bcmills commented 4 years ago

@josharian, note that returning nil still probably wouldn't address @FiloSottile's concern, since that could provoke a new panic at arbitrarily many later points in the execution of the program. (That is, from the perspective of code auditing, returning nil for a conversion from a non-nil value is likely just as disruptive as panicking directly.)

josharian commented 4 years ago

@bcmills I’d love to get Filippo’s take on that. That’s not obvious to me. But it’s true that it might break existing nil pointer analysis tools, in the same way that panicking might break existing control flow analysis tools.

beoran commented 4 years ago

One approach which I haven't seen mentioned is to have an unslice() built in which simply does the "right thing", which is allocating a new array and copying over the available elements from the slice if the slice is too small. Remaining entries are left at the zero value. If I have a []int{1, 2} that I want to convert to a [4]int, then, [1,2,0,0] is, IMHO, the correct answer to what the conversion result should be. On the other hand the slice is big enough the copy doesn't happen and we get a pointer to the underlying array of the slice in stead.



slice := []int{}
arr := unslice([4]int, slice)
// arr now is [4]int{0,0,0,0}, but the array doesn't point into slice and is newly allocated.

slice := []int{1, 2, 3, 4, 5}
arr := unslice([4]int, slice)
// arr now is [4]int{1,2,3,4} and points into slice
rsc commented 4 years ago

The security argument here seems pretty weak. It's already the case that x.f can panic, as can x.m(), and lots of other expressions. No one is actually mentally checking every single one of those. That's a job for tools.

Keith's point about printing the short length is a great reason to panic instead of silently coercing to nil. I've really enjoyed the new index and slice panics that show the out-of-range index and length. It would be very helpful, in the rare case when a conversion does fail, to see that without having to add the kind of print that @jimmyfrasche showed, preemptively at every conversion site.

From the reactions to Ian's comment above and the discussion after, it sounds like we are converging on using *([4]int)(x) for the checked conversion, with a panic if x is too short.**

Does anyone object to that? (Retitled as well.)

The one thing that I'm still not sure about is how often this comes up at all. I still need to check the corpus I have, but if anyone can chime in with anecdotal stories about when you would have needed this, that would be helpful too.

Thanks.

jimmyfrasche commented 4 years ago

@rsc I think this is a problem worth solving. I don't think it's a common enough problem to deserve weakening an invariant of existing syntax (which affects tools as well as mental checks). Overloading the type assertion syntax seems a better fit as it can already panic, but a builtin or even something in unsafe seems best. I ultimately wouldn't mind too much if conversion syntax is used but it certainly doesn't feel right to me.

@beoran this is for getting a pointer to the same underlying array as the slice uses: if you want a copy of the contents of that array you can use the copy builtin.

martisch commented 4 years ago

Anecdotal story for when I would like to have used this:

I was working on an optimization for a hot library code path. It was using a string as key to a map. That string got constructed by a strings.Join from a []string. For all practical purposes they were always 4 elements long. Due to historical reasons and generality the functions involved used a []string as input and changing this as it was a widely used API and config would have introduced alot of effort and churn in a large codebase.

Using [4]string as a map key (with a fallback for the case where it could have been used otherwise by joining the down to only using the first element of [4]string with a special terminal symbol) was a bit faster as it avoid an allocation to construct the string key. However there is no safe way to just convert []string to [4]string (or*[4]string) without copying [4]string (or allocating) which made the new code seem to be a bit slower then it needed to be.

Code that i liked to have ideally written for performance would have been (simplified):

if len(strs) == 4 {
   key := (*[4]string)(strs)
   ... 
   v := m[*key]
}

Googlers: Search for the phrase "In practice, entries always have four components" to find the unoptimized code location in the mono repo.

randall77 commented 4 years ago
key := (*[4]string)(strs)
   ... 
   v := m[*key]

That seems not a very compelling motivator for this issue, as you could do:

    var key [4]string
    copy(key[:], strs)
    v := m[key]

instead. One more line, but not really any more expensive (in an ideal compiler world).

martisch commented 4 years ago

Sure its already possible to use a copy. Not meaning to imply it was compelling. The compiler currently zeros key, then copies in strs with a typedslicecopy then moves the contents to a tmp and passes a pointer to the tmp copy to the runtime map routine. All in all 3 times storing to memory and 2 times loading from it that can be avoided. Ideally (which is easier to optimize by the compiler with the explicit conversion) the generated binary would just pass a pointer to the backing array of the strs slice.

https://godbolt.org/z/57ejs6

rsc commented 4 years ago

I still need to search for conversions in the corpus.

rsc commented 4 years ago

I started on this, and I commented on #19367 with some of what I found, but I don't have full results yet. Downloading and type-checking lots of code to look for specific use case possibilities.

rsc commented 3 years ago

I didn't find many such conversions, but it's likely that (1) people have worked around the lack and (2) there will never be too many of these, but when it's needed it will be important. The comments above have lots of potential uses.

The latest version of the proposal is https://github.com/golang/go/issues/395#issuecomment-669320638:

(*[4]int)(x) for the checked conversion, with a panic if x is too short.

And the panic will print both the target length and the too-short actual length.

I asked for objections on Aug 7 and no one had any (except me). Based on the discussion, then, this seems like a likely accept.

bcmills commented 3 years ago

Is it possible for [4]int to have a different alignment from int?

If so, would the conversion from []int to *[4]int also panic if the slice is not properly aligned for the array type?

randall77 commented 3 years ago

Is it possible for [4]int to have a different alignment from int?

No. The alignment of an array is the same as the alignment of its element.

SlyMarbo commented 3 years ago

Would it be possible for the syntax to require an explicit slice? For example, (*[4]int)(x[2:6]). Explicit slicing will be necessary anyway when you're not using the entire slice and this way any panic would happen during the slice (as it would today), meaning the conversion itself can't fail. The compiler can assert that the slice bounds are constant expressions and produce the same range as the array length. I think this would be less surprising than the conversion panicking or returning nil, plus any tooling looking for slices that could panic will still find this.

josharian commented 3 years ago

@SlyMarbo this was discussed at some length in earlier comments

rsc commented 3 years ago

No change in consensus, so accepted. It's a bit late to do this for Go 1.16, so will milestone to Go 1.17.

dolmen commented 3 years ago

If the slice to array pointer aliasing is made to work, I will also expect array to array pointer aliasing to work too.

Details below.

This is what this proposal is about:

var x []int
y := (*[4]int)(x[0:4])

I also expect this to work:

var w [8]int
y := (*[4]int)(w[0:4])
// because this is semantically the same as:
x := w[:]
z := (*[4]int)(x[0:4])

In fact, I think the debate should be first on aliasing array to a an array of a different size (through a pointer) before considering slices.

Note: I'm using alias here (instead of convert) as 2 variables point to the same memory, and this is how the Perl community describes it. In Go convert is ambiguous as some conversions are copies (string -> []byte, string -> []rune) while some aren't (just zero cost type cast).

randall77 commented 3 years ago

@dolmen, your code should work unmodified with this proposal. w[0:4] is a slice, to which this proposal applies.

dolmen commented 3 years ago

@randall77 Fair. Looks like I missed the obvious: I was still thinking to the syntax in the title of this issue which doesn't have the slice boundaries.

josharian commented 3 years ago

I've been slowly updating CL 216424 to implement this proposal as accepted (stay tuned). Two things I've noticed so far, neither dealbreakers, but both a bit unfortunate:

(1)

We have to massage the semantics of reflect.Value.Convert a bit. Current docs:

Convert returns the value v converted to type t. If the usual Go conversion rules do not allow conversion of the value v to type t, Convert panics.

Suggested new docs:

Convert returns the value v converted to type t. If the usual Go conversion rules do not allow conversion of the value v to type t, or if converting v to type t would panic, Convert panics.

Same for reflect.Type.ConvertibleTo:

Currrent docs:

ConvertibleTo reports whether a value of the type is convertible to type u.

Suggested docs:

ConvertibleTo reports whether a value of the type is convertible to type u. Even if ConvertibleTo returns true, the conversion may still panic.

This latter addition may invalidate a common assumption made in code. Because ConvertibleTo has only types to work with, not concrete values, I don't think w e can do any better. We might want to add another Value method like CanConvert that reports whether the conversion will succeed (not panic).

(2)

It is now possible to write a top level var decl that panics:

var (
  s = make([]byte,  5)
  p = (*[10]byte)(s) // panics
)

There's no place to put a recover that'll catch that panic. In normal life, that's not a big deal, but it's an annoyance for writing compiler tests, because there's no in-process way to test that a panic occurs without fundamentally changing the program.

mdempsky commented 3 years ago

The reflect.Value.Convert questions are interesting. Changing Convert seems fine. I am a little more nervous about changing ConvertibleTo, but changing it to report whether a conversion would be allowed by the type system seems sensible and consistent. Comparable returns true for interface types, even though comparing an interface value can panic if the interface contains a value whose dynamic type is not comparable.

It is now possible to write a top level var decl that panics:

I don't think this is a new possibility? There are plenty of ways to write var declarations at package scope that panic. E.g.,

var zero int
var _ = 1 / zero

var _ = *(*int)(nil)
gopherbot commented 3 years ago

Change https://golang.org/cl/301651 mentions this issue: go/types: allow conversion from slice to array ptr

gopherbot commented 3 years ago

Change https://golang.org/cl/301650 mentions this issue: cmd/compile: allow conversion from slice to array ptr

gopherbot commented 3 years ago

Change https://golang.org/cl/301652 mentions this issue: reflect: allow conversion from slice to array ptr

josharian commented 3 years ago

Weird. GopherBot missed a CL: https://golang.org/cl/216424, for the spec. (And that's all--a stack of four.)

yaxinlx commented 3 years ago

Does the conversion Ian mentioned in https://github.com/golang/go/issues/395#issuecomment-394862319 work according the final design?

type A [4]int
var s = (*A)[]int{1, 2, 3, 4}

If the answer is not, a.k.a. the conversion must be written as (*[4]int)[]int{1, 2, 3, 4}, then it would be some verbose to write the conversion, as the [4]int part in (*[4]int) is the only choice so it is not very essential. It would be even more verbose if the the array type is an type from another package and the package needn't to be imported if the conversion is not used.

josharian commented 3 years ago

@yaxinlx it does. Although you are missing some parens in your code:

type A [4]int
var s = (*A)([]int{1, 2, 3, 4})

Feel free to try it out using the CLs above.

josharian commented 3 years ago

In the spec CL @randall77 asks whether we should compare the array length to the slice capacity or the slice length. The discussion so far as been rather ambiguous on this question. We should resolve it here.

@randall77 says:

Using capacity is safe. But I often envision the space between length and cap to be "uninitialized" storage. This lets you access those elements without reslicing first.

ianlancetaylor commented 3 years ago

I have always assumed that the length of the slice must equal the length of the pointed-to array. I don't think capacity figures in here at all. After all, converting []byte to a string converts up to the length of the slice, not the capacity of the slice.