golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.03k stars 17.67k forks source link

proposal: slices, maps: add Pointer iteration functions #69916

Closed destel closed 3 weeks ago

destel commented 3 weeks ago

Proposal Details

Problem Statement

When using a for-range loop to iterate over a slice of structs, the loop variable holds a copy of each struct. This can lead to unexpected behavior and hard-to-catch bugs when attempting to modify slice elements within the loop.

The problematic code looks like this:

users := []User{{Name: "Alice", Age: 30}, {Name: "Bob", Age: 25}}

for _, user := range users {
    user.Age++ // slice remains unchanged
}

The typical solution for this is explicit indexing. This is fine, but it can become very verbose, especially when multiple fields need modification:

for i := range users {
    users[i].Age++
}

Another solution is to take the address of a slice item at the beginning of the loop body:

for i := range users {
    user := &users[i]
    user.Age++
}

Proposal

Introduce a new function slices.Pointers that returns an iterator that yields pointers to the slice elements. This solution provides a clear and concise way to iterate over and modify slice elements in-place, reducing the likelihood of errors related to value semantics in for-range loops.

The code would look like this:

for _, user := range slices.Pointers(users) {
   user.Age++ // this now modifies the actual slice element
}

And the implementation:

func Pointers[Slice ~[]E, E any](s Slice) iter.Seq2[int, *E] {
    return func(yield func(int, *E) bool) {
        for i := range s {
            if !yield(i, &s[i]) {
                return
            }
        }
    }
}

Maps

A similar problem exists for maps. However, this part of the proposal is more controversial for the following reasons:

From a developer's perspective, the problem is the same as for the slices:

users := map[string]User{
  "userA": {Name: "Alice", Age: 30}, 
  "userB": {Name: "Bob", Age: 25},
}

for _, user := range users {
    user.Age++ // map remains unchanged
}

The typical solution here is to do a read-modify-write operation:

for k, user := range users {
    user.Age++ // map still remains unchanged
    users[k] = user // now the change is written
}

The pattern above can be encapsulated in a function maps.Pointers:

for _, user := range maps.Pointers(users) {
    user.Age += 1
}

Such function name is not 100% fair since it yields pointers to temporary variables, rather than map entries, as seen in the implementation below:

func Pointers[Map ~map[K]V, K comparable, V any](m Map) iter.Seq2[K, *V] {
    return func(yield func(K, *V) bool) {
        for k, v := range m {
            done := !yield(k, &v)
            m[k] = v
            if done {
                return
            }
        }
    }
}

Single-key map modifications

This is the most controversial part, since it abuses the iterators feature. For maps like above even single-key modifications has to be made using a read-modify-write operation:

// this does not compile
users["userA"].Age++

// this works, but looks too verbose
u := users["userA"]
u.Age++
users["userA"] = u

The idea is to introduce another function that "iterates" over a single key. Then the code above would look like

for _, user := range maps.Pointer(users, "userA") {
    user.Age += 1 // changes age for the key userA
}

Conclusion

This proposal aims to address common pain points in Go programming related to modifying elements in slices. It's aimed at improving readability and reducing errors. The proposed solutions for maps, while addressing real issues, present more complex trade-offs that, I believe, worth discussing.

gabyhelp commented 3 weeks ago

Related Issues and Documentation

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

seankhliao commented 3 weeks ago

I don't think we should add this: this is easy to misuse if you store the pointer, but the slice is later extended, you now have a pointer to the original backing array, but the slice now uses a new backing array. This goes against the loopvar changes where we made storing the iteration variable safer.

package main

import "fmt"

func main() {
    // some slice
    slice := make([]string, 2, 2)

    // ref to store the pointer
    var ref *string

    // proposed Pointers function
    for i := range slice {
        ref = &slice[i]

        *ref = "modify 1"
    }

    // show current slice
    fmt.Println(slice) // [modify 1 modify 1 c d]

    // slice is extened, contents copied to
    slice = append(slice, "c", "d")

    // another modification
    *ref = "modify 2"

    // modification doesn't show up in slice
    fmt.Println(slice) // [modify 1 modify 1 c d]
}
atdiar commented 3 weeks ago

Why wasn't a slice of pointers used? If users are supposed to be mutable values, one could expect to see []*user?

destel commented 3 weeks ago

@atdiar I agree that I picked a bad example to demonstrate the issue. It's much more logical to have a pointer to the user object. Still, there are cases where using value objects are more appropriate, while they still need to be mutated.

One example that comes to mind is array an of points

for _, p := range points {
    // this won't work
    p.X *= 2
    p.Y *= 2
    p.Z *= 2
}

Another example is an array of time ranges, where everything should be converted to another time zone

type TimeRange struct {
    Start, End time.Time
}

for _, tr := range timeRages {
    // this won't work
    tr.Start = tr.Start.UTC()
    tr.End = tr.End.UTC()
}

In the time range example, I once encountered a bug because I didn't notice it was a slice of values. Of course slices.Pointers wouldn't magically fix or prevent such bug, the root cause here was a lack of proper testing and attention to details. The intent behind this proposal is to provide a more concise and standardized syntax for a common task, potentially improving code readability in certain scenarios.

atdiar commented 3 weeks ago

I see.

For the array of points, the points are not really mutable. It's the list itself that changes. As @seankhliao mentioned, if we go for shortcuts instead of semantics, we risk getting into aliasing bugs a bit too easily.

What I would do is simply mutate it in place.

for i, p := range points { 
        p.X *= 2 
        p.Y *= 2    
    p.Z *= 2 
    points[i] = p
}

Same with the time range, the time can't change but the array/slice of i.e. the actual ranges can. I would mutate reassign as I iterate.

Seems semantically sounder.

destel commented 3 weeks ago

@seankhliao I understand your example, and you're right. It's potentially dangerous and error-prone to store pointers to slice elements. This is true regardless of method used to iterate the slice, and even true for linear code that doesn't have loops.

The only way to prevent such errors, that I know, is to avoid pointers to slice elements completely. In the scenarios I described earlier, that would mean falling back to explicit indexing. This is fine, but can become too verbose at times. I think it's all about a balance between safety and practicality.

earthboundkid commented 3 weeks ago

This is the same as https://github.com/golang/go/issues/69345 from a month ago. The same objections apply. Anyone who fails to use user := &users[i] will also fail to use slices.Pointers, and the convenience added by the function doesn't meet the bar for the standard library.

Also in general, for Go you need to decide whether you plan to take pointers to a slice's elements or not, and if you do take pointers, you must never append to the slice again or ever swap elements (e.g. to sort it). It's a lesson I've learned the hard way more than once. Having this will just encourage more people to make the same mistake instead of using []*T which is sometimes less efficient but always less subject to subtle bugs.

zigo101 commented 3 weeks ago

The slice part of the proposal is good and straightforward. The map part will cause many confusions. I don't understand why the OP of https://github.com/golang/go/issues/69345 closed that propsoal. Maybe it should be re-opened.

apparentlymart commented 3 weeks ago

I broadly agree with the objections above: this seems easy to misuse in ways that are far more subtle than the mistake of modifying a temporary that's then immediately discarded.

However, the problem of accidentally modifying a loop-local temporary in this way has tripped me up before. Perhaps the more appropriate resolution for that would be to add a go vet rule for any case where some part of a loop-local variable is written to and then not subsequently read before the end of the block?

Perhaps the message could directly suggest using a slice of pointers instead, if the for is over elements of a slice, though the reader could also choose to take a pointer to the current element despite the risk of the hazards described in earlier comments.

destel commented 3 weeks ago

@apparentlymart

add a go vet rule for any case where some part of a loop-local variable is written to and then not subsequently read

This sounds very good to me

destel commented 3 weeks ago

I am closing this in favor of #69345 and @apparentlymart suggestion