xiandaonancheng commented 2 months ago

Proposal Details

Background

As this link mentions, goroutines expose no unique identifier, name, or data structure to the programmer, because exposing them might discourage programmers from sharing work across multiple goroutines. But in my experience, many projects use a goroutine pool to handle tasks. I think we can reuse some heap objects to reduce the number or duration of GC cycles. Goroutine local variables may be a nice way to maintain these objects.

Design

Exposed to programmer

This proposal contains two ways to create go local variables that only have one instance for each goroutine:

1. Export a function "runtime.NewGoLocal" that creates a go local variable holder for a given key. Developers can obtain the same object holder in different scopes.

In the example below, the variables "a" and "b" refer to the same object holder within one goroutine:

func goLocalA() {
    a, _ := runtime.NewGoLocal[int]("go local", func() int {
        return 10
    })
    a.Val++
    fmt.Println(a.Val)
}

func goLocalB() {
    b, _ := runtime.NewGoLocal[int]("go local", func() int {
        return 10
    })
    b.Val++
    fmt.Println(b.Val)
}

2. Add a syntax token "go_local", similar to "thread_local" in C++. Developers declare variables with it just as they do with "var", and each declaration is a distinct variable. Each such variable is initialized only once per goroutine. Example:

func goLocalVar() {
    go_local a int = 100
    a++
    fmt.Println(a)
}

Implementation

To record goroutine local variables, add a map field to the g struct. This map is initialized either when "go func" is called or when the first go local variable is created.

type g struct { // in package runtime
    ...

    // localTable records GoLocal variables in this goroutine
    localTable map[any]unsafe.Pointer
}

The functions below, written in package runtime, create goroutine local variables and allocate heap memory for them. The function NewGoLocal and the type GoLocalHolder are exported to programmers for way 1.


type GoLocalHolder[T any] struct {
    Val T
}

type _InnerGoLocalKey[T any] struct {
    rawKey any
    v0     *T
}

// newGoLocalObject creates a go_local object and records it.
func newGoLocalObject(key any, typ *_type) (pObject unsafe.Pointer, alloc bool) {
    gp := getg()
    ptr, ok := gp.localTable[key]
    if ok {
        return ptr, false
    }
    if gp.localTable == nil {
        gp.localTable = map[any]unsafe.Pointer{}
    }
    ptr = mallocgc(typ.Size_, typ, true)
    gp.localTable[key] = ptr
    return ptr, true
}

// newGoLocalObjectForStringKey wraps newGoLocalObject for ssa calling.
func newGoLocalObjectForStringKey(key string, typ *_type) (pObject unsafe.Pointer, alloc bool) {
    return newGoLocalObject(key, typ)
}

// NewGoLocal creates a go local object for rawKey + type and returns its holder.
// The same object can be used in multiple places via the same rawKey + type.
func NewGoLocal[T any](rawKey any, initFunc func() T) (ptrHolder *GoLocalHolder[T], alloc bool) {
    key := _InnerGoLocalKey[T]{rawKey: rawKey, v0: nil}
    wrapper0 := (*GoLocalHolder[T])(nil)
    ptr, alloc := newGoLocalObject(key, abi.TypeOf(wrapper0).Elem())
    ptrHolder = (*GoLocalHolder[T])(ptr)
    if alloc && initFunc != nil {
        ptrHolder.Val = initFunc()
    }
    return ptrHolder, alloc
}
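
The keying scheme above can be tried outside the runtime. The following is a minimal user-space sketch, not the proposed runtime code: the names typedKey, holder, localTable, and getOrCreate are hypothetical, and an ordinary map passed in explicitly stands in for the per-goroutine g.localTable. It shows that the same raw key with the same type yields one shared holder, while the same raw key with a different type yields a separate one.

```go
package main

import "fmt"

// typedKey mirrors the idea of _InnerGoLocalKey: the unused *T field makes
// keys with the same rawKey but a different T compare as different map keys.
type typedKey[T any] struct {
    rawKey any
    v0     *T
}

// holder mirrors GoLocalHolder.
type holder[T any] struct {
    Val T
}

// localTable is a hypothetical stand-in for the per-goroutine table.
type localTable map[any]any

// getOrCreate returns the holder stored under rawKey + T, creating and
// initializing it on first use. The bool reports whether it was created.
func getOrCreate[T any](tbl localTable, rawKey any, initFunc func() T) (*holder[T], bool) {
    key := typedKey[T]{rawKey: rawKey}
    if v, ok := tbl[key]; ok {
        return v.(*holder[T]), false
    }
    h := &holder[T]{}
    if initFunc != nil {
        h.Val = initFunc()
    }
    tbl[key] = h
    return h, true
}

func main() {
    tbl := localTable{}
    a, createdA := getOrCreate(tbl, "go local", func() int { return 10 })
    a.Val++
    b, createdB := getOrCreate(tbl, "go local", func() int { return 10 })
    b.Val++
    s, createdS := getOrCreate(tbl, "go local", func() string { return "x" })
    fmt.Println(a == b, b.Val, createdA, createdB) // true 12 true false
    fmt.Println(s.Val, createdS, len(tbl))         // x true 2
}
```

The second lookup returns the holder created by the first, so the initializer runs once per key and type, which is the behavior the proposal describes for one goroutine.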

For way 2, we need to modify the compiler to support the syntax token "go_local".

In the parsing phase, the go_local declaration statement is parsed like a var declaration statement. In the IR construction phase, it is rewritten into its initialization statements. The rewritten statements are then implemented in the following phases (the middle-end phase and the SSA phase).

Before rewrite:

go_local a int = initA()

After rewrite:

&a, _compile_only_a_need_init := newGoLocalObjectForStringKey(a_name@a_pos, int_type)
if _compile_only_a_need_init {
    a = initA()
}

As the rewrite example above shows, we call the function newGoLocalObjectForStringKey with the unique key ("a_name@a_pos") of "a" and the type of "a". The results are assigned to the address of "a" and to a virtual bool variable indicating whether "a" needs to be initialized.

The go_local variable "a" must be forcibly marked as an escaping variable in the middle-end phase so that it can accept a heap address.

The call to newGoLocalObjectForStringKey is emitted in the SSA phase, and the resulting SSA values are assigned to the target IR name nodes (the variables "a" and "_compile_only_a_need_init").
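
To make the rewrite concrete, here is a rough user-space analogue of what the rewritten statements do. It is only an illustration under stated assumptions: lookupOrAlloc, table, and counter are hypothetical names, an explicit map stands in for the goroutine's table, and the string "a@pos" stands in for the compiler-generated unique key.

```go
package main

import "fmt"

// table is a hypothetical stand-in for one goroutine's local table.
type table map[string]any

// lookupOrAlloc plays the role of newGoLocalObjectForStringKey: it returns a
// pointer to the storage for key and reports whether it was just allocated.
func lookupOrAlloc[T any](t table, key string) (*T, bool) {
    if p, ok := t[key]; ok {
        return p.(*T), false
    }
    p := new(T)
    t[key] = p
    return p, true
}

func initA() int { return 100 }

// counter shows what a function body containing
// "go_local a int = initA(); a++" could expand to.
func counter(t table) int {
    a, needInit := lookupOrAlloc[int](t, "a@pos")
    if needInit {
        *a = initA() // runs only on the first call for this table
    }
    *a++
    return *a
}

func main() {
    worker1, worker2 := table{}, table{}
    fmt.Println(counter(worker1), counter(worker1), counter(worker2)) // 101 102 101
}
```

Each table keeps its own copy of "a", and the initializer runs once per table, which corresponds to once per goroutine in the proposal.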

Unsupported

With way 2, the "go_local" token can only define variables in function-local scopes; it cannot define variables at package scope. In this proposal, way 1 with the same key and type can be used at package scope instead of "go_local".

I think package-scope "go_local" definitions may take a lot of work, and I have no idea how to implement them. If anyone can help implement this, I would appreciate it a lot.

More Details

For more details, refer to this PR (I will open a new PR if this proposal is approved).

gabyhelp commented 2 months ago

Related Issues and Documentation

cmd/compile: support go local variables #69422 (closed)

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

adonovan commented 2 months ago

The FAQ gives a pretty solid rationale for why TLS was intentionally omitted from the design of Go: without it, the behavior of a function call is independent of the goroutine in which it executes. Giving up that invariant for a mere performance gain seems like a very high price to pay. Could you explain in more detail how goroutine-local storage would allow you to reduce the number of objects allocated?

xiandaonancheng commented 2 months ago

> The FAQ gives a pretty solid rationale for why TLS was intentionally omitted from the design of Go: without it, the behavior of a function call is independent of the goroutine in which it executes. Giving up that invariant for a mere performance gain seems like a very high price to pay. Could you explain in more detail how goroutine-local storage would allow you to reduce the number of objects allocated?

I think a goroutine pool works like the small example below (for more detail, see the ants repo, which I often use). The variable "a" or "b" is cached in the worker goroutine, so its memory is allocated only once per worker goroutine.

func main() {
    goPoolSize := 100
    reqCh := make(chan func(), goPoolSize)
    for i := 0; i < goPoolSize; i++ {
        go func() {
            for f := range reqCh {
                f()
            }
        }()
    }
    server := NewServer()
    for {
        req := server.AcceptReq()
        switch req := req.(type) {
        case Req1:
            reqCh <- func() {
                doReq1(req)
            }
        case Req2:
            reqCh <- func() {
                doReq2(req)
            }
        }
    }
}

func doReq1(req any) {
    go_local a SomeObject
    ... // do something with req and a
}

func doReq2(req any) {
    go_local b SomeObject
    ... // do something with req and b
}

Another way to reuse objects is an object pool (I think of goroutine-local storage as an object pool bound to a goroutine). I ran some benchmarks for go_local (code and results are below), but unfortunately the results show that go_local is no better than an object pool. Perhaps fetching an object from a map is slower than fetching it from a pool. What a terrible attempt, LOL 😓. The only remaining advantage of go_local may be that it reuses cached objects simply, with no manual type conversion and no need to decide where to call pool.Put.

func BenchmarkVal(b *testing.B) {
    ch, wg := goPool(goPoolSize)
    for i := 0; i < b.N; i++ {
        ch <- func() {
            var a TVal
            a.i++
            ptr = &a.i // to make "a" escape
        }
    }
    close(ch)
    wg.Wait()
}

func BenchmarkGoLocal(b *testing.B) {
    ch, wg := goPool(goPoolSize)
    for i := 0; i < b.N; i++ {
        ch <- func() {
            go_local a TVal
            a.i++
            ptr = &a.i
        }
    }
    close(ch)
    wg.Wait()
}

func BenchmarkGoLocal2(b *testing.B) {
    ch, wg := goPool(goPoolSize)
    for i := 0; i < b.N; i++ {
        ch <- func() {
            a, _ := runtime.NewGoLocal[TVal](1, func() TVal {
                return TVal{}
            })
            a.Val.i++
            ptr = &a.Val.i
        }
    }
    close(ch)
    wg.Wait()
}

func BenchmarkPoolVal(b *testing.B) {
    ch, wg := goPool(goPoolSize)
    for i := 0; i < b.N; i++ {
        ch <- func() {
            a := pool.Get().(*TVal)
            a.i++
            ptr = &a.i
            pool.Put(a)
        }
    }
    close(ch)
    wg.Wait()
}

func goPool(workers int) (chan func(), *sync.WaitGroup) {
    wg := &sync.WaitGroup{}
    ch := make(chan func(), workers)
    for j := 0; j < workers; j++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for f := range ch {
                f()
            }
        }()
    }
    return ch, wg
}

var ptr *int

type TVal struct {
    bs [1024]byte
    i  int
}

var pool = sync.Pool{New: func() interface{} {
    return &TVal{}
},
}

var goPoolSize = 100

goos: darwin
goarch: amd64
pkg: github.com/golang/go/test/golocal
cpu: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
BenchmarkVal
BenchmarkVal-8           1983124           586.0 ns/op
BenchmarkGoLocal
BenchmarkGoLocal-8       2744718           429.9 ns/op
BenchmarkGoLocal2
BenchmarkGoLocal2-8      2720476           439.8 ns/op
BenchmarkPoolVal
BenchmarkPoolVal-8       2967628           395.1 ns/op

adonovan commented 2 months ago

Why can't you create the worker state (SomeObject) explicitly in each worker goroutine and pass it to each function f?

    goPoolSize := 100
    reqCh := make(chan func(), goPoolSize)
    for i := 0; i < goPoolSize; i++ {
        go func() {
            var workerState SomeObject
            for f := range reqCh {
                f(&workerState)
            }
        }()
    }
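
A self-contained version of this suggestion might look like the sketch below. The contents of SomeObject, the runPool helper, and the task signature are assumptions made for illustration; the point is only that each worker allocates its state once and hands it to every task it runs.

```go
package main

import (
    "fmt"
    "sync"
)

// SomeObject is a placeholder for a large reusable per-worker object.
type SomeObject struct {
    buf   [1024]byte
    calls int
}

// runPool starts the given number of workers, each owning one SomeObject,
// runs all tasks on them, and returns how many tasks each worker handled.
func runPool(workers int, tasks []func(*SomeObject)) []int {
    reqCh := make(chan func(*SomeObject), workers)
    counts := make([]int, workers)
    var wg sync.WaitGroup
    for i := 0; i < workers; i++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            var workerState SomeObject // allocated once per worker
            for f := range reqCh {
                f(&workerState)
            }
            counts[id] = workerState.calls
        }(i)
    }
    for _, f := range tasks {
        reqCh <- f
    }
    close(reqCh)
    wg.Wait()
    return counts
}

func main() {
    tasks := make([]func(*SomeObject), 1000)
    for i := range tasks {
        tasks[i] = func(s *SomeObject) {
            s.buf[0]++ // reuse the worker's buffer instead of allocating
            s.calls++
        }
    }
    total := 0
    for _, n := range runPool(4, tasks) {
        total += n
    }
    fmt.Println(total) // 1000
}
```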

ianlancetaylor commented 2 months ago

I want to reiterate what @adonovan said above: the FAQ explains why we don't have goroutine-local variables. This proposal describes a way to implement goroutine-local variables. The proposal does not explain why we should do that. We made a decision long ago to not permit goroutine-local variables. We are not going to change that decision unless we have new information that suggests that we should. Thanks.

xiandaonancheng commented 2 months ago

OK, I see. Thank you for reminding me.

The goroutine local idea came from a chat with my friend (Joey), a Go and C++ programmer, who introduced me to thread-local variables in C++. In other languages, thread-local variables may mainly serve to keep the context or status of a thread, but we are more interested in reusing objects within a goroutine pool. This proposal is mainly about reusing goroutine local objects, and I give its reasons below based on that reuse example.

> Why can't you create the worker state (SomeObject) explicitly in each worker goroutine and pass it to each function f?

The way @adonovan mentioned above is a feasible way to reuse objects. But I think some points need attention:

1. If the worker goroutine serves two or more functions, we need to write the corresponding branching logic. For example, given two functions doReq1 (needs ObjectA) and doReq2 (needs ObjectB), we must create ObjectA and ObjectB explicitly and pass each to the corresponding function. Of course, we can wrap ObjectA and ObjectB into one big state variable and pass it to all functions, but I think passing an unused variable to a function is not very nice.
2. We may reuse the object in a deeply nested function, so we need to pass it through a long call chain.
3. From the perspective of business system design, the goroutine pool is in the common framework layer, while the functions doReq1 and doReq2 are in the business logic layer. I think they should be decoupled, so the goroutine pool part should not contain much business logic.
4. In my experience, business programmers are more inclined to use existing open source projects and Go features. So I would probably use the ants repo to manage the goroutine pool and sync.Pool to manage big objects.

I think goroutine local variables do well on each of these points:

1. Each business function defines only its own objects and cannot access the objects of other functions. The worker goroutine does not need to dispatch objects to functions.
2. We can define go local variables where they are used, and they are not visible outside their scopes.
3. Because the worker goroutine does not need to dispatch objects to functions, the goroutine pool part contains only simple serving logic.
4. sync.Pool requires manual type conversion and deciding where to call pool.Put. A go local variable can be declared directly with its type (just like with the var keyword) and can be accessed only within its scope.

For these reasons, I think goroutine local variables can make it easy for programmers to reuse big objects when they use a goroutine pool.
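
On the sync.Pool point specifically, the manual type assertion can be hidden behind a small generic wrapper. The sketch below uses only the standard sync.Pool API; the TypedPool type and its methods are hypothetical names, not part of the standard library. It does not remove the need to decide where Put is called.

```go
package main

import (
    "fmt"
    "sync"
)

// TypedPool wraps sync.Pool so callers never write a type assertion.
type TypedPool[T any] struct {
    p sync.Pool
}

// NewTypedPool returns a pool whose Get falls back to newFn when empty.
func NewTypedPool[T any](newFn func() *T) *TypedPool[T] {
    return &TypedPool[T]{p: sync.Pool{New: func() any { return newFn() }}}
}

// Get returns a *T, either recycled or freshly created by newFn.
func (tp *TypedPool[T]) Get() *T { return tp.p.Get().(*T) }

// Put hands v back to the pool for later reuse.
func (tp *TypedPool[T]) Put(v *T) { tp.p.Put(v) }

// TVal matches the shape of the benchmark type used earlier in the thread.
type TVal struct {
    bs [1024]byte
    i  int
}

func main() {
    pool := NewTypedPool(func() *TVal { return &TVal{} })
    a := pool.Get() // already a *TVal, no assertion at the call site
    a.i++
    fmt.Println(a.i) // 1
    pool.Put(a)
}
```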

If you find any disadvantages or errors, please feel free to point them out.

adonovan commented 2 months ago

The solution to your problem in Go is to use contexts, which are an explicit mechanism by which values may be passed down many levels of the call tree. The most important such variable is an event representing "has this task been cancelled?", but you can put additional arbitrary data in a context too; see context.WithValue.

As a rule, a function that accepts a Context should document which values it requires to be present in the Context. Nothing will stop you from (ab)using contexts as a dumping ground of hidden state, just as often happens with thread-local storage.

xiandaonancheng commented 2 months ago

Yeah, you are right. I know that context.Value doesn't have any limit, but we usually store only some metadata (trace ID, session ID, etc.) and settings (timeout, etc.) in a context. Hmm, maybe we limit ourselves...

ianlancetaylor commented 1 month ago

Based on the discussion above, and the emoji voting, this is a likely decline. Leaving open for four weeks for final comments.

ianlancetaylor commented 1 week ago

No further comments.

golang / go

proposal: spec: support goroutine local variables #69478
