golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.3k stars 17.38k forks source link

proposal: new annotation to convert array of structs (AoS) to struct of arrays (SoA) #64926

Closed mappu closed 5 months ago

mappu commented 6 months ago

Proposal Details

Hi,

Here's a really common code example, of an array of structs (AoS):

type User struct {
    Name string
    XPos int64
    YPos int64
}

var World []User

However if the access pattern is to iterate over the same property, for each user, then for modern CPU caches, sometimes it can be more efficient to write code as a struct of arrays (SoA):

var World struct {
    UserName []string
    UserXPos []int64
    UserYPos []int64
}

This can sometimes significantly improve performance. However, it's more cumbersome to write this way, and all call sites need to change. In a large codebase, it's hard to experiment with this idea.

I propose a new annotation that can automatically convert AoS declarations to SoA:

var World []Users //go: soa

This would perform a compiler-internal transformation to use an SoA representation in memory.

Callers would still access fields as if AoS was used and the annotation were not present.

Any Go implementations not understanding the annotation would continue to treat the fields as AoS. This would affect older versions of the Go toolchain as well as TinyGo / gccgo.

One effect is each individual array element (e.g. World[3]) can no longer have its address taken to get a *User.

The benefit of this proposal is to easily support transitioning to AoS in a large existing codebase.

Future work, the compiler could recognize some patterns (e.g. short accumulation loop) and use SIMD automatically.


This is based on previous work in other programming languages:

Supporting discussion:

apparentlymart commented 6 months ago

I think this is an interesting idea, but I don't think there's any significant precedent so far for a comment to materially influence the layout of a type in memory. I do see the benefit that it could therefore be backward compatible with older Go toolchain versions and other implementations, but historically that hasn't been a significant design constraint, and recent changes to the official toolchain have moved in the direction of making it easier to upgrade rather than easier to keep running older Go versions.

That aside, this feels to me as one example of a more general idea of choosing between different machine representations of a single type at the Go level. I expect that a first decision to be made here is whether Go is the kind of language that aims to offer that sort of control over implementation details that aren't visible at the Go level of abstraction; if the answer to that is yes then I would prefer to see it specified in a way that suggests how other similar concerns could be met in the future.

For example, Rust has the idea of a repr attribute attached to a type declaration, which has evolved to meet a number of different use-cases where the Rust-level type could be usefully lowered to more than one machine-level representation. Go doesn't have something analogous to Rust's attribute syntax though, so that analogy isn't directly actionable. 🤔

egtann commented 6 months ago

As another programming language example/prior art, Zig uses this layout extensively in its compiler to great effect. Zig has a syntax that makes looping through them very nice: https://zig.news/andrewrk/multi-object-for-loops-data-oriented-design-41ob.

zephyrtronium commented 6 months ago

A //go:soa comment is not sufficient. SoA arrays are structured in a fundamentally different manner from AoS ones. If you were to declare var World []User //go:soa and then World to a function taking a []Users, that function would do the wrong thing. If you try to declare var nonsense []string //go:soa or var nonsense int //go:soa, that needs to fail. Addressability is a property of types, not values, so we can't just say, "Elements of slices and arrays are addressable, unless the variable declaration has a particular magic comment next to it."

SoA needs to be formally marked as a different kind of type. That means (or, arguably, regardless) this needs to be a language change proposal (with corresponding template). Ideally you could also precisely describe how this interacts with reflection – do we need a new reflect.Kind (probably yes)? do we need to add any other new reflection APIs (probably no)?

ajwerner commented 6 months ago

Can you help justify this as a language change? I feel like this is the sort of thing which needs some experience. go generate is a good starting point for something like this. What I imagine would be useful is a tool which can be used to generate a go file with a struct corresponding to an SoA for a structure type. That wouldn't require any sort of language change and it would be immediately useful.

Like imagine:

//go:generate go run github.com/hypothetical/soatool --type User

type User struct {
    Name string
    XPos int64
    YPos int64
}

This could generate some file user_soa.go:

type Users struct {
   Name []string
   XPos []int64
   YPos []int64 
}

func (u *Users) Append(u User) { /* ... */ }
func (u *Users) Get(i int) User { /* ... */
func (u *Users) Len() int { /* ... */ }

Then with the new ranging over functions stuff, you can imagine a method on Users which reconstructs individual user structs on the fly for iteration if you wanted them in the composed form.

ianlancetaylor commented 5 months ago

Thanks for the proposal, but as @zephyrtronium points out this is simply impossible as written. I'm going to close this proposal. Please comment if you disagree.