sync: add OnceFunc, OnceValue, OnceValues

adg commented 2 years ago

This is a proposal for adding a generic OnceFunc function to the sync package in the standard library.

In my team's codebase we recently added this function to our private syncutil package:

// OnceFunc returns a function that invokes fn only once and returns the values
// returned by fn. The returned function may be called concurrently.
func OnceFunc[T any](fn func() (T, error)) func() (T, error)

(I put this in the (temporary) module github.com/adg/sync, if you want to try it out.)
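For anyone who wants to experiment without pulling in that module, here is a minimal sketch of how a function with this signature could be built on top of sync.Once. It is an assumption about the implementation, not the code from the syncutil package; loadConfig and calls are made-up names used only for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// OnceFunc is a sketch of the proposed helper, not the proposal's actual code.
// It returns a function that calls fn at most once and replays fn's results
// on every later call.
func OnceFunc[T any](fn func() (T, error)) func() (T, error) {
	var (
		once sync.Once
		val  T
		err  error
	)
	return func() (T, error) {
		once.Do(func() { val, err = fn() })
		return val, err
	}
}

// calls counts how often the hypothetical loader below actually runs.
var calls int

// loadConfig is a hypothetical initializer that always fails.
func loadConfig() (string, error) {
	calls++
	return "", errors.New("config not found")
}

func main() {
	get := OnceFunc(loadConfig)
	for i := 0; i < 3; i++ {
		_, err := get()
		fmt.Println(err) // the same cached error each time
	}
	fmt.Println("loader ran", calls, "time(s)")
}
```

Note that in this sketch a failed initialization is cached just like a successful one; the loader is not retried.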

This makes a common use of sync.Once, lazy initialization with error handling, more ergonomic.

For example, this Server struct that wants to lazily initialize its database connection may use sync.Once:

type Server struct {
    dbPath string

    dbInit sync.Once
    dbVal  *sql.DB
    dbErr  error
}

func NewServer(dbPath string) *Server {
    return &Server{
        dbPath: dbPath,
    }
}

func (s *Server) db() (*sql.DB, error) {
    s.dbInit.Do(func() {
        s.dbVal, s.dbErr = sql.Open("sqlite", s.dbPath)
    })
    return s.dbVal, s.dbErr
}

func (s *Server) DoSomething() error {
    db, err := s.db()
    if err != nil {
        return err
    }
    _ = db // do something with db
    return nil
}

While with OnceFunc a lot of the fuss goes away:

type Server struct {
    db func() (*sql.DB, error)
}

func NewServer(dbPath string) *Server {
    return &Server{
        db: sync.OnceFunc(func() (*sql.DB, error) {
            return sql.Open("sqlite", dbPath)
        }),
    }
}

func (s *Server) DoSomething() error {
    db, err := s.db()
    if err != nil {
        return err
    }
    _ = db // do something with db
    return nil
}

Playground links: before and after.

If there is interest in this, then I suppose it should first live in x/exp (as with the slices and maps packages) so that we can play with it.

This seems to me like a great example of how generics can be used in the standard library. I wasn't able to find an overall tracking bug for putting generics in the standard library, otherwise I'd have referenced it here.

icholy commented 2 years ago

Alternative API:

type Server struct {
    once sync.TryOnce[*sql.DB]
}

func (s *Server) db() (*sql.DB, error) {
    return s.once.Do(func() (*sql.DB, error) {
        return sql.Open("sqlite", dbPath)
    })
}
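To make the suggestion concrete, here is one way such a type might look. This is only a sketch under the assumption that TryOnce wraps a sync.Once and caches both results; no such type exists in the sync package.

```go
package main

import (
	"fmt"
	"sync"
)

// TryOnce is a hypothetical sketch of the type suggested above.
// Its zero value is ready to use.
type TryOnce[T any] struct {
	once sync.Once
	val  T
	err  error
}

// Do calls f on the first invocation only. Every call, including the first,
// returns the value and error produced by that first invocation.
func (o *TryOnce[T]) Do(f func() (T, error)) (T, error) {
	o.once.Do(func() { o.val, o.err = f() })
	return o.val, o.err
}

func main() {
	var greeting TryOnce[string]
	first, _ := greeting.Do(func() (string, error) { return "hello", nil })
	// The second function is never called; the cached result is returned.
	second, _ := greeting.Do(func() (string, error) { return "goodbye", nil })
	fmt.Println(first, second) // hello hello
}
```

One consequence of this shape is that the function passed to later Do calls is silently ignored, which is the same behavior sync.Once already has.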

earthboundkid commented 2 years ago

I have one of these in https://github.com/carlmjohnson/syncx. I think it’s suitable for the standard library, but it should probably come as part of a std-wide addition of generics, not a one-off.

cespare commented 2 years ago

I played around with this in a large codebase. It's a nice idea.

I do prefer @icholy's API suggestion. It seems that the ergonomics are about the same and also that it would be a little harder to accidentally misuse. I could see people writing

func GetFoo() (*Foo, error) {
        return OnceFunc(getFoo)
}

whereas the equivalent mistake with TryOnce seems less likely (especially if people are used to sync.Once already). And in general, it just seems a little nicer/more typical to have the shared state be a struct value rather than a function; for example, consider

var fooOnce TryOnce[*Foo]

func GetFoo() (*Foo, error) { return fooOnce.Do(getFoo) }

vs.

var getFooOnce = OnceFunc(getFoo)

func GetFoo() (*Foo, error) { return getFooOnce() }
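A sketch of what that misuse would do in practice. As written above the mistaken version would not type-check, so this uses the variant that does compile, OnceFunc(getFoo)(). The OnceFunc here is a local stand-in, and Foo, built, GetFooWrong and GetFooRight are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// OnceFunc is a minimal stand-in for the proposed helper (an assumption,
// not a real standard library API).
func OnceFunc[T any](fn func() (T, error)) func() (T, error) {
	var (
		once sync.Once
		val  T
		err  error
	)
	return func() (T, error) {
		once.Do(func() { val, err = fn() })
		return val, err
	}
}

// Foo is a hypothetical lazily created value.
type Foo struct{ ID int }

// built counts how many times getFoo has run.
var built int

func getFoo() (*Foo, error) {
	built++
	return &Foo{ID: built}, nil
}

// GetFooWrong creates a fresh once-wrapper on every call, so nothing is cached.
func GetFooWrong() (*Foo, error) { return OnceFunc(getFoo)() }

// getFooOnce is created a single time, at package initialization.
var getFooOnce = OnceFunc(getFoo)

// GetFooRight shares the single wrapper, so getFoo runs at most once through it.
func GetFooRight() (*Foo, error) { return getFooOnce() }

func main() {
	a, _ := GetFooWrong()
	b, _ := GetFooWrong()
	fmt.Println(a == b) // false: two distinct values were built

	c, _ := GetFooRight()
	d, _ := GetFooRight()
	fmt.Println(c == d) // true: the cached value is reused
}
```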

When I was looking at how sync.Once gets used in my codebase, I found that I could categorize them roughly four ways:

1. A single return value (either no possibility of an error or else the error leads to os.Exit, log.Fatal, etc).
2. Two return values: (T, error).
3. No initialization values created (just a one-time check or something).
4. Multiple values initialized / a bunch of state configured.

For (3) and (4), sync.Once seems about optimal right now. This proposal helps with (2). But I noticed that (1) is even more common than (2). So maybe having both would be best:

type OnceVal[T any] struct { /* ... */ }

func (o *OnceVal[T]) Do(f func() T) T

type OnceError[T any] struct { /* ... */ }

func (o *OnceError[T]) Do(f func() (T, error)) (T, error)

or even

type Once1[T any] struct { /* ... */ }

func (o *Once1[T]) Do(f func() T) T

type Once2[T1, T2 any] struct { /* ... */ }

func (o *Once2[T1, T2]) Do(f func() (T1, T2)) (T1, T2)

earthboundkid commented 2 years ago

FWIW, I wrote the closure version and find it much more ergonomic. It wouldn't occur to me to write

func GetFoo() (*Foo, error) {
        return OnceFunc(getFoo)
}

since it obviously should be var GetFoo = sync.OnceFunc(getFoo), but it's hard to predict what kind of error people will make en masse until it's in the wild.

earthboundkid commented 2 years ago

Also I don't think it makes sense for this to return an error for the reasons given here: https://github.com/golang/go/issues/53696#issuecomment-1176238913

I also had a situation when I wanted to make some initialization in my app lazy (because it runs on Lambda and cold start is a pain) but the initialization could potentially fail, but once that's the situation, there's no good API (at least that I've seen). The errors have to be dealt with somewhere. (If they could be ignored, regular sync.Once would work.) If the system assumes initialization has already happened, the path to deal with the error isn't there and all you can do is crash. If it doesn't make that assumption, you need to handle the error every time you interact with the object, so it's not really "initialized" just "gettable".

adg commented 2 years ago

@carlmjohnson wrote:

The errors have to be dealt with somewhere. (If they could be ignored, regular sync.Once would work.) If the system assumes initialization has already happened, the path to deal with the error isn't there and all you can do is crash. If it doesn't make that assumption, you need to handle the error every time you interact with the object, so it's not really "initialized" just "gettable".

I proposed a OnceFunc that returns (T, error) because my code, and other code I have observed in the wild, often stores an initialization error alongside the initialized value.

A few examples I quickly pulled from the Go core (there are many more, I didn't want to look exhaustively):

Your circumstances may call for a different error handling mechanism (crashing), but others prefer to handle the error every time the resource is requested. Then upstream callers can decide whether it's crash-worthy or not.

I think that without the error return value the proposed OnceFunc is not very useful. Otherwise you should just do the initialization earlier, since the program should crash if the resource isn't available anyway, or you don't care about handling the error in which case (as you say) the existing sync.Once gives you almost everything you need already.

adg commented 2 years ago

@icholy suggested:

Alternative API:

Let me just expand your suggested API to do exactly what the other examples are doing, so that it's a fair comparison:

type Server struct {
        dbPath string
        dbOnce sync.TryOnce[*sql.DB]
}

func NewServer(dbPath string) *Server {
        return &Server{dbPath: dbPath}
}

func (s *Server) db() (*sql.DB, error) {
        return s.dbOnce.Do(func() (*sql.DB, error) {
                return sql.Open("sqlite", s.dbPath)
        })
}

func (s *Server) DoSomething() error {
        db, err := s.db()
        ...
}

I like that the type has a usable zero value, which means you don't need a constructor for Server just to set up this value (but we do need something to set the dbPath field, or whatever other state goes into the closure). However, in exchange for that we need a wrapper function (the db method here), so we immediately return to equal in terms of boilerplate.

I like that baking the once-ness into the type declaration, instead of just using a plain closure, gives some indication on sight that it's a once-initialized value.

Here's a variation on what you suggest, which is arguably less boilerplatey, as we don't need to store the closure state anywhere. The NewOnceFunc function returns a *OnceFunc[T] with a Do() (T, error) method:

type Server struct {
    db *sync.OnceFunc[*sql.DB]
}

func NewServer(dbPath string) *Server {
    return &Server{
        db: sync.NewOnceFunc(func() (*sql.DB, error) {
            return sql.Open("sqlite", dbPath)
        }),
    }
}

func (s *Server) DoSomething() error {
    db, err := s.db.Do()
    ...
}

But to immediately argue against this: an advantage of baking the state (dbPath, in this example) into the once function itself is that we don't expect that changing it later will have any effect. For instance, if we changed dbPath after the first call to the db function we might expect to access a different database. Putting that state in the closure makes it harder to make that mistake.

With all that said, my main objection to these proposals (compared to my original proposal) is that they make it harder to substitute a different initialization function that doesn't use OnceFunc. A central advantage of my original proposed API is that a sync.OnceFunc can wrap any func() (T, error) transparently, so that downstream callers don't know (and shouldn't care) that they're invoking it only once. In my experience this is a valuable property.

adg commented 2 years ago

@cespare my instinct is that having fewer things is better than more things. If someone wants to call a OnceFunc and ignore the error, they could just ignore the error.

cespare commented 2 years ago

However in exchange for that we need a wrapper function (the db method here)

I do think that a db method is nicer than a function as a struct field. That strikes me as unusual-looking about your original example.

Also, the original example doesn't look like the common use cases I see for sync.Once. I most often see sync.Once used for package-level initialization. That's what most of the links you located in https://github.com/golang/go/issues/56102#issuecomment-1272643083 are as well.

So we have something like this today:

var state struct {
    once sync.Once
    val *Thing
    err error
}

func State() (*Thing, error) {
    state.once.Do(func() {
        state.val, state.err = loadState()
    })
    return state.val, state.err
}

func loadState() (*Thing, error) { /* ... */ }

With OnceFunc, it could be

var loadStateOnce = sync.OnceFunc(loadState)

func State() (*Thing, error) {
    return loadStateOnce()
}

func loadState() (*Thing, error) { /* ... */ }

and with TryOnce, it would be

var loadStateOnce sync.TryOnce[*Thing]

func State() (*Thing, error) {
    return loadStateOnce.Do(loadState)
}

func loadState() (*Thing, error) { /* ... */ }

Neither has a real advantage in terms of conciseness. Adjusting either of these to get rid of the once-ness would be trivial.

One thing I notice about OnceFunc is that it might be tempting to save a few lines and write:

var State = sync.OnceFunc(loadState)

func loadState() (*Thing, error) { /* ... */ }

I think this would be a mistake, though. So both in your original example and here, OnceFunc seems to promote the use of function values in places where we would idiomatically use methods or normal functions in Go. OnceFunc seems like a function that would be very much at home in the functional languages I use, but not Go. (For instance it is available as memoize on a 0-ary function in Clojure.)

adg commented 2 years ago

Actually the case in which I use it is globals; I chose the struct form as it's a slightly more complex context.

This is how you would use OnceFunc for your state example:

var state = sync.OnceFunc(func() (*Thing, error) {
        /* ... */
})

This IMO is way more concise than any of the alternatives.

(Sorry sent this before I had finished writing it.)

I think that function variables are fine in the context of private globals. If you wanted to export it and protect it from mutation outside the package, you could write:

func State() (*Thing, error) { return loadState() }

var loadState = sync.OnceFunc(func() (*Thing, error) {
        /* ... */
})

OnceFunc seems to promote the use of function values in places where we would idiomatically use methods or normal functions in Go.

I think this kind of argument is not very helpful in this context. We have new possibilities with generics; we should argue on the pros/cons, not by the established practices that pre-date the tools we have available today.

cespare commented 2 years ago

I think this kind of argument is not very helpful in this context. We have new possibilities with generics; we should argue on the pros/cons, not by the established practices that pre-date the tools we have available today.

Ah -- I'd been taking it as self-evident that promoting function variables over functions should be avoided. My mistake.

I think that replacing functions with variables is a poor practice on the merits:

Having multiple ways to do things is a burden on code writers and code readers. (As an occasional JavaScript writer I never know whether I should be declaring my functions using function or var.)
The fact that a var is mutable means the reader of the code doesn't know whether the value has changed without locating all references to the var.
The fact that a var is mutable opens the possibility of a data race if it is modified elsewhere.
Using vars instead of functions further encourages mutating the var for testing purposes, an ill-advised practice that interferes with test parallelization and makes debugging harder.
Invoking a function value through a var is slower than calling a function.
The stack trace you get when calling a function through a var is less helpful (it doesn't include the var location).

Therefore, I think that APIs should not encourage using vars where functions or methods would have traditionally been used, and thus I think that a struct type is a better API here than a higher-order function.

adg commented 2 years ago

I'd been taking it as self-evident that promoting function variables over functions should be avoided.

I think this is true, for most uses of global function variables, and most certainly exported ones. I'd go further to suggest that global variables as a whole are best avoided where possible.

But I don't think function variables should be avoided as a whole. For instance, look at this pitfall you describe:

Using vars instead of functions further encourages mutating the var for testing purposes, an ill-advised practice that interferes with test parallelization and makes debugging harder.

One good solution to this is to actually use a function variable, rather than mutating global state. For example, if you want to test some code that works with time, passing it a mock now func() time.Time (either as a function argument or by setting a struct field) allows you to control precisely what that function does in your tests.
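As an illustration of that pattern, here is a small sketch with an injected clock. Session, clockAt and deadline are hypothetical names invented for the example; only the idea of a now func() time.Time field comes from the paragraph above.

```go
package main

import (
	"fmt"
	"time"
)

// Session is a hypothetical type whose expiry logic depends on the current time.
type Session struct {
	expires time.Time
	now     func() time.Time // injected clock; would be time.Now in production
}

// Expired reports whether the injected clock has reached the expiry time.
func (s *Session) Expired() bool { return !s.now().Before(s.expires) }

// clockAt returns a fake clock that always reports t.
func clockAt(t time.Time) func() time.Time {
	return func() time.Time { return t }
}

// deadline is an arbitrary fixed instant used by the demonstration.
var deadline = time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC)

func main() {
	s := &Session{expires: deadline, now: clockAt(deadline.Add(-time.Minute))}
	fmt.Println(s.Expired()) // false: the fake clock is before the deadline

	s.now = clockAt(deadline.Add(time.Minute))
	fmt.Println(s.Expired()) // true: the fake clock is past the deadline
}
```

Because the clock lives in a struct field rather than a package-level variable, each test can use its own Session and the tests can still run in parallel.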

So from here I'm assuming that function variables do have some value, and should be used where appropriate.

In the context of this proposal, where OnceFunc might be used to set a global function variable it would be replacing the use of three global variables. From your example:

var state struct {
    once sync.Once
    val *Thing
    err error
}

I think, on the whole, if you're writing programs with globals like this then you're already vulnerable to most of the pitfalls you describe. If anything, OnceFunc makes it harder to make a mess of it. (This is why I chose to use the Server struct example, btw.)

Other concerns you raised:

Invoking a function value through a var is slower than calling a function.

That may be true in isolation but we'd need to benchmark the different approaches described here to make any efficiency arguments.

The stack trace you get when calling a function through a var is less helpful (it doesn't include the var location).

These stack traces seem equally helpful to me: before and after

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 1 year ago

This is clearly a very common operation that we should make easier. The only discussion seems to be whether to include the error in the function signature. So maybe there should be two forms: one with the error and one without. Generalizing "with error" to two values may make sense. But if so, what are the names? OnceVal / OnceError and Once1 / Once2 both seem a bit strange. Maybe we should find a name that's not "Once"?

sync.Lazy[T], sync.Lazy2[T] - lazy is maybe overused, or maybe it should have the function ahead of time
sync.Memo[T], sync.Memo2[T] - memoizing is usually parameterized, and this isn't
sync.Cache[T], sync.Cache2[T] - but caches can be cleared, and this can't

So lots of ideas, none of them great.

icholy commented 1 year ago

What about a single type with 2 methods:

type Memo[T any] struct {
    val  T
    err  error
    once sync.Once
}

func (m *Memo[T]) Do(f func() T) T {
    m.once.Do(func() {
        m.val = f()
    })
    return m.val
}

func (m *Memo[T]) DoErr(f func() (T, error)) (T, error) {
    m.once.Do(func() {
        m.val, m.err = f()
    })
    return m.val, m.err
}

DeedleFake commented 1 year ago

This reminds me of #37739, but implemented with generics instead of as a language change and tailored specifically for use with concurrency. Maybe it makes sense to split the concurrency support out? In other words, have a lazy value interface somewhere non-concurrency specific and a sync.Lazy struct that wraps it to make it thread-safe?

package something

type Lazy[T any] interface {
  Eval() T
}

package sync

type Lazy[T any] struct {
  lazy something.Lazy[T]
}

rsc commented 1 year ago

To try to move things forward, what do people think of

type OnceValue[T any] struct { ... }
func (*OnceValue[T]) Do(func() T) T 

type OnceValueErr[T any] struct { ... }
func (*OnceValueErr[T]) Do(func() (T, error)) (T, error)

?

It seems like the name should begin with Once so that people find it when they are looking for sync.Once.

icholy commented 1 year ago

~~I think that two types is overkill. People can easily ignore the error return if they don't need it.~~

edit: I like the two type approach the best.

earthboundkid commented 1 year ago

I prefer icholy's suggestion of one type with two methods to two types.

zephyrtronium commented 1 year ago

Having one type with two methods leaves open the possibility to use both with one OnceValue. That seems like a source of bugs. I can imagine legitimate ways to use both methods together, but the (T, error) form subsumes them.

icholy commented 1 year ago

Just to sum it up, here are the variations:

One type, one method:

type OnceValue[T any] struct { ... }
func (*OnceValue[T]) Do(func() (T, error)) (T, error)

Two types, one method:

type OnceValue[T any] struct { ... }
func (*OnceValue[T]) Do(func() T) T 

type OnceValueErr[T any] struct { ... }
func (*OnceValueErr[T]) Do(func() (T, error)) (T, error)

One type, two methods:

type OnceValue[T any] struct { ... }
func (*OnceValue[T]) Do(func() T) T 
func (*OnceValue[T]) DoErr(func() (T, error)) (T, error)

cespare commented 1 year ago

The one type, two methods version seems fine to me. To prevent misuse Do could panic if DoErr was previously called.

rsc commented 1 year ago

One type, two methods seems out-of-place, because you have to call one of the methods consistently to use it correctly. If there are goroutines racing (this is package sync) and one calls Do and the other calls DoErr, then in at least one case we have a problem: when DoErr wins, caches an error, and then Do runs and can't return the error. That suggests that any valid use should either always call Do or always call DoErr. To avoid latent bugs that only show up in production, the implementation should probably panic any time it observes both methods being used.

So really there are two kinds of OnceValue: the kind that can only use Do, and the kind that can only use DoErr. Giving them the same Go type means the compiler can't help you make sure you are using the type correctly. In contrast, what we usually do in Go is use different types for different kinds of values, and then the type system and the compiler do help you, and this possible runtime panic is eliminated at compile time.

I think that excludes "one type, two methods".
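The hazard can be reproduced without any goroutines. The sketch below reuses the Memo shape from earlier in the thread; errInit is a made-up error value for the demonstration, and the sequential calls stand in for whichever goroutine happens to win the race.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Memo reproduces the one-type, two-method shape discussed in this thread.
type Memo[T any] struct {
	val  T
	err  error
	once sync.Once
}

// Do runs f once and caches its value.
func (m *Memo[T]) Do(f func() T) T {
	m.once.Do(func() { m.val = f() })
	return m.val
}

// DoErr runs f once and caches its value and error.
func (m *Memo[T]) DoErr(f func() (T, error)) (T, error) {
	m.once.Do(func() { m.val, m.err = f() })
	return m.val, m.err
}

// errInit is a hypothetical initialization failure.
var errInit = errors.New("init failed")

func main() {
	var m Memo[int]

	_, err := m.DoErr(func() (int, error) { return 0, errInit })
	fmt.Println("DoErr:", err)

	// Do shares the same sync.Once, so this function never runs, and Do has
	// no way to report the cached error: the caller just sees a zero value.
	v := m.Do(func() int { return 42 })
	fmt.Println("Do:", v)
}
```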

"One type, one method" seems not quite right, because sometimes we will be caching things that can't possibly fail, and it's annoying to have to discard the error that can't happen anyway. Yes, code that can fail is common, but so is code that can't fail.

That leaves "two types, one method", which is why I suggested OnceValue and OnceValueErr.

ChrisHines commented 1 year ago

Adding to what @rsc says in the comment just above--which I agree with--we do sometimes want to standardize on a method signature with an error return that may always be nil in some situations. That is often the right choice when we expect or know it will be common to define an interface for that method and we expect some implementations to need to return errors even if not all implementations will need to. io.Writer is a good example. The Write method returns an error, but there are implementations in the standard library and elsewhere that we know always return nil errors (e.g. *bytes.Buffer).

Is there enough value in both types of OnceValue implementing the same interface to make it worth the cost of sometimes ignoring errors (and making sure that's the correct choice) or sometimes checking for errors that will never happen?

My intuition in this case is that, no, there isn't much value in implementing a common interface here.

adg commented 1 year ago

I implemented Russ' OnceValueErr type and updated the original example to use it:

type Server struct {
    dbPath string
    dbOnce sync.OnceValueErr[*sql.DB]
}

func NewServer(dbPath string) *Server {
    return &Server{
        dbPath: dbPath,
    }
}

func (s *Server) db() (*sql.DB, error) {
    return s.dbOnce.Do(func() (*sql.DB, error) {
        return sql.Open("sqlite", s.dbPath)
    })
}

func (s *Server) DoSomething() error {
    db, err := s.db()
    if err != nil {
        return err
    }
    _ = db // do something with db
    return nil
}

(playground)

Compared to the original proposal of a OnceFunc function that returns a function value:

type Server struct {
    db func() (*sql.DB, error)
}

func NewServer(dbPath string) *Server {
    return &Server{
        db: sync.OnceFunc(func() (*sql.DB, error) {
            return sql.Open("sqlite", dbPath)
        }),
    }
}

func (s *Server) DoSomething() error {
    db, err := s.db()
    if err != nil {
        return err
    }
    _ = db // do something with db
    return nil
}

I think that the proposed OnceValueErr type has several disadvantages over my proposed OnceFunc:

OnceValueErr requires the use of another method through which you call its Do method. With OnceFunc there's just a db function value, and that's it.
OnceValueErr requires you to specify the type T in both the OnceValueErr type spec and also in the function passed to Do. OnceFunc infers the type from the given closure.
OnceValueErr is harder than OnceFunc to mock out in tests. With OnceFunc you can just substitute a different function value. With OnceValueErr you'd need to further abstract away the invocation of the Do method.
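To illustrate the last point, here is a rough sketch of swapping the function value in a test. Conn is a hypothetical stand-in for *sql.DB so that the example runs without a database driver, and Ping and errDown are invented for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a hypothetical stand-in for *sql.DB.
type Conn struct{ name string }

// Server holds its connection getter as a plain function value. In production
// that value could be a once-wrapped initializer; callers cannot tell.
type Server struct {
	db func() (*Conn, error)
}

// Ping is a hypothetical method that needs the connection.
func (s *Server) Ping() error {
	c, err := s.db()
	if err != nil {
		return fmt.Errorf("ping: %w", err)
	}
	_ = c // a real implementation would use the connection here
	return nil
}

// errDown is a made-up failure used by the fake below.
var errDown = errors.New("database down")

func main() {
	// A test can substitute any function with the same signature.
	ok := &Server{db: func() (*Conn, error) { return &Conn{name: "fake"}, nil }}
	bad := &Server{db: func() (*Conn, error) { return nil, errDown }}

	fmt.Println(ok.Ping())
	fmt.Println(bad.Ping())
}
```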

We use the io.Reader interface so that we can easily compose code that implements or uses io.Reader, without those different pieces of code needing to know about one another. The use of this well-defined interface is a great simplifying force. With OnceFunc I am proposing that we use the most well-defined interface in Go: the function.

In short: the purpose of my proposed OnceFunc is ergonomics. I see little benefit to the proposed OnceValue/OnceValueErr over the existing sync.Once.

icholy commented 1 year ago

OnceValueErr requires the use of another method through which you call its Do method. With OnceFunc there's just a db function value, and that's it.

If you were writing this code without the "once" behavior, a db method would be the idiomatic approach.

type Server struct {
    dbPath string
}

func NewServer(dbPath string) *Server {
    return &Server{
        dbPath: dbPath,
    }
}

func (s *Server) db() (*sql.DB, error) {
    // TODO: reuse connection
    return sql.Open("sqlite", s.dbPath)
}

func (s *Server) DoSomething() error {
    db, err := s.db()
    if err != nil {
        return err
    }
    _ = db // do something with db
    return nil
}

rsc commented 1 year ago

I think we have established that we should probably support both T and (T, error) results, and that those should be different APIs, not a single one that appears to support both but only supports one at a time.

As @adg points out, his issue description was about a closure-based API, but the conversation shifted almost immediately (in the very first comment) to an object-based API. I didn't notice the shift when I started commenting, so we haven't discussed whether the API should be closure-based or object-based. My apologies for completely missing that change and not making sure we discussed that part of the proposal. Let's take a look at that dimension of the decision next.

I have a preliminary catalog of all the sync.Once uses in the main repo and will do more analysis and post the results in the morning.

rsc commented 1 year ago

There are 157 sync.Once declarations in the main repo, and there are a few different patterns that uses can be grouped into.

Pattern 1: Side Effects (25 of 157, 16%)

The simplest use of sync.Once is to cause a side effect at most once, no matter how many times the code runs.

For example, each side of an io.Pipe can be closed for reading or writing, but either way I/O is over, at that point. The implementation signals this to goroutines blocked in select by closing the p.done channel. Of course, a channel must only be closed once, so the code uses a sync.Once:

type pipe struct {
    ...
    once sync.Once
    done chan struct{}
    ...
}

func Pipe() (*PipeReader, *PipeWriter) {
    p := &pipe{
        wrCh: make(chan []byte),
        rdCh: make(chan int),
        done: make(chan struct{}),
    }
    return &PipeReader{p}, &PipeWriter{p}
}

func (p *pipe) closeRead(err error) error {
    ...
    p.once.Do(func() { close(p.done) })
    ...
}

func (p *pipe) closeWrite(err error) error {
    ...
    p.once.Do(func() { close(p.done) })
    ...
}

If there were a OnceFunc0 (which we haven't discussed, but let's just try it), this code would change to:

type pipe struct {
    ...
    cancel func()
    done chan struct{}
    ...
}

func Pipe() (*PipeReader, *PipeWriter) {
    p := &pipe{
        wrCh: make(chan []byte),
        rdCh: make(chan int),
        done: make(chan struct{}),
    }
    p.cancel = sync.OnceFunc0(func() { close(p.done) })
    return &PipeReader{p}, &PipeWriter{p}
}

func (p *pipe) closeRead(err error) error {
    ...
    p.cancel()
    ...
}

func (p *pipe) closeWrite(err error) error {
    ...
    p.cancel()
    ...
}

Some of the cleanup here would have been possible in the original by defining:

func (p *pipe) cancel() {
    p.once.Do(func() { close(p.done) })
}

instead of repeating that phrase in closeRead and closeWrite. So the fundamental difference between the cleaned-up original and the OnceFunc0 version is that the sync.Once itself and its cached data are not exposed: they are hidden inside the func.
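For reference, a runnable sketch of the idea. OnceFunc0 here is a guess at how the hypothetical helper could be written, and signal and newSignal are invented names that mimic the shape of pipe rather than the real io.Pipe code.

```go
package main

import (
	"fmt"
	"sync"
)

// OnceFunc0 is a sketch of the hypothetical helper named above: it returns
// a function that runs f at most once.
func OnceFunc0(f func()) func() {
	var once sync.Once
	return func() { once.Do(f) }
}

// signal is a hypothetical type with the same shape as the pipe example:
// a done channel that must be closed exactly once.
type signal struct {
	done   chan struct{}
	cancel func()
}

func newSignal() *signal {
	s := &signal{done: make(chan struct{})}
	s.cancel = OnceFunc0(func() { close(s.done) })
	return s
}

func main() {
	s := newSignal()
	s.cancel()
	s.cancel() // closing the channel directly a second time would panic
	<-s.done   // does not block: the channel is closed
	fmt.Println("done closed exactly once")
}
```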

We might wonder about what a hypothetical OnceValue0 would look like, but that's just sync.Once.

Pattern 2: Separate Initialization and Use (39 of 157, 25%)

Another common pattern is declaring a sync.Once next to the data it protects and then requiring users of that data to call the init function before using the data.

For example, compress/flate needs a huffman decoding table that we want to compute at runtime, to keep binary sizes down, but we also don't want to compute it at init time, to keep startup latency down. The code looks like:

var fixedOnce sync.Once
var fixedHuffmanDecoder huffmanDecoder

func fixedHuffmanDecoderInit() {
    fixedOnce.Do(func() {
        // These come from the RFC section 3.2.6.
        var bits [288]int
        for i := 0; i < 144; i++ {
            bits[i] = 8
        }
        for i := 144; i < 256; i++ {
            bits[i] = 9
        }
        for i := 256; i < 280; i++ {
            bits[i] = 7
        }
        for i := 280; i < 288; i++ {
            bits[i] = 8
        }
        fixedHuffmanDecoder.init(bits[:])
    })
}

func NewReader(r io.Reader) io.ReadCloser {
    fixedHuffmanDecoderInit()
    ...
}

func (f *decompressor) nextBlock() {
    ...
    if ... {
        // compressed, fixed Huffman tables
        f.hl = &fixedHuffmanDecoder
    }
    ...
}

If there were a OnceFunc1, this code could have been written instead like:

var fixedHuffman = sync.OnceFunc1(newFixedHuffmanDecoder)

func newFixedHuffmanDecoder() *huffmanDecoder {
    return ...
}

func NewReader(r io.Reader) io.ReadCloser {
    // DELETED: fixedHuffmanDecoderInit()
    ...
}

func (f *decompressor) nextBlock() {
    ...
    if ... {
        // compressed, fixed Huffman tables
        f.hl = fixedHuffman()
    }
    ...
}

Again the fundamental difference in the OnceFunc1 version is that the sync.Once and its cached data are not exposed. The original separated the one-time initialization from the use, which might lead to bugs where the data is accessed without the initialization step. In contrast, the OnceFunc1 makes those kinds of bugs impossible.
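A simplified, runnable version of that shape is sketched below. OnceFunc1 is an assumed implementation of the hypothetical helper, and the table is reduced to the array of code lengths from the example above rather than a real huffmanDecoder.

```go
package main

import (
	"fmt"
	"sync"
)

// OnceFunc1 is a sketch of the hypothetical helper: it returns a function
// that computes f's result once and returns the cached value afterwards.
func OnceFunc1[T any](f func() T) func() T {
	var (
		once sync.Once
		val  T
	)
	return func() T {
		once.Do(func() { val = f() })
		return val
	}
}

// fixedBits lazily builds the fixed code lengths. There is no separate
// init step to forget: calling the function is the only way to get the data.
var fixedBits = OnceFunc1(newFixedBits)

// newFixedBits fills in the same ranges as the loops in the example above.
func newFixedBits() *[288]int {
	var bits [288]int
	for i := range bits {
		switch {
		case i < 144:
			bits[i] = 8
		case i < 256:
			bits[i] = 9
		case i < 280:
			bits[i] = 7
		default:
			bits[i] = 8
		}
	}
	return &bits
}

func main() {
	b := fixedBits()
	fmt.Println(b[0], b[144], b[256], b[280]) // 8 9 7 8
	fmt.Println(b == fixedBits())             // true: same cached table
}
```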

Some of the cleanup forced by OnceFunc1 is possible in the original by declaring:

func fixedHuffman() *huffmanDecoder {
    fixedHuffmanDecoderInit()
    return &fixedHuffmanDecoder
}

and then making sure code does not use fixedHuffmanDecoderInit or fixedHuffmanDecoder otherwise.

If we used a OnceValue instead, we'd start with the original and replace

var fixedOnce sync.Once
var fixedHuffmanDecoder huffmanDecoder

with

var fixedHuffmanDecoder sync.OnceValue[*huffmanDecoder]

and

f.hl = &fixedHuffmanDecoder

with

f.hl = fixedHuffmanDecoder.Do(newFixedHuffmanDecoder)

or maybe we would introduce

func fixedHuffman() *huffmanDecoder {
    return fixedHuffmanDecoder.Do(newFixedHuffmanDecoder)
}

and use

f.hl = fixedHuffman()

again.

Note that callers still need to know what to pass to Do, or we have to introduce a wrapper function that encapsulates that detail.

The most important observation seems to be that sync.Once permits separate initialization and use while OnceFunc would force callers not to do that, and in general this seems to clean up the code.

The same pattern also happens where the sync.Once and the cached data are fields in a struct.

Pattern 3: Encapsulated Data (93/157, 59%)

The final common pattern is a variant of the previous one, where the code already has the wrappers that hide the sync.Once from calling code.

For example, internal/testenv has:

var (
    gorootOnce sync.Once
    gorootPath string
    gorootErr  error
)

func findGOROOT() (string, error) {
    gorootOnce.Do(func() {
        ... set gorootPath, gorootErr
    })
    return gorootPath, gorootErr
}

All code calls findGOROOT. No code is expected to use the global variables: they are essentially private to findGOROOT.

Again the same pattern also happens where the sync.Once and the cached data are fields in a struct.

If we used OnceValueError, this would become:

var gorootOnce sync.OnceValueError[string]

func findGOROOT() (string, error) {
    return gorootOnce.Do(func() (string, error) {
        ...
    })
}

If we used OnceFunc2, this would become:

var findGOROOT = sync.OnceFunc2(func() (string, error) {
    ...
})

The code we started with was fairly clean. The only possible complaint is that gorootOnce, gorootPath, and gorootErr are exposed and could be misused.

The OnceValueError version hides gorootPath and gorootErr but leaves gorootOnce. The OnceFunc2 version hides gorootOnce too.

The OnceFunc2 version does have the downside that there is no name for the function in the stack trace if it crashes. It might be better for debuggability to adopt an idiom like:

var findGOROOT = sync.OnceFunc2(findGOROOTUncached)

func findGOROOTUncached() (string, error) {
    ...
}

But instead of forcing such changes on users, we can also adjust the func closure name heuristics to put findGOROOT into the name in the anonymous example.
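For reference, a minimal sketch of how a OnceFunc2-style helper could be built on sync.Once, combined with the named-function idiom. The names onceFunc2, findRootUncached, findRoot, and lookups are hypothetical, and the failing lookup is invented for the demonstration.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// onceFunc2 is a hypothetical stand-in for the proposed OnceFunc2.
// Both results of the first call are cached and returned on every call.
func onceFunc2[T1, T2 any](f func() (T1, T2)) func() (T1, T2) {
	var (
		once sync.Once
		v1   T1
		v2   T2
	)
	return func() (T1, T2) {
		once.Do(func() { v1, v2 = f() })
		return v1, v2
	}
}

// lookups counts calls to the uncached function, for demonstration only.
var lookups int

// findRootUncached is a named function rather than a closure literal,
// which is the idiom suggested above for more readable stack traces.
func findRootUncached() (string, error) {
	lookups++
	return "", errors.New("root not found")
}

var findRoot = onceFunc2(findRootUncached)

func main() {
	_, err1 := findRoot()
	_, err2 := findRoot()
	fmt.Println(err1 == err2) // true: the same cached error value
	fmt.Println(lookups)      // 1
}
```

Note that in this sketch the error is cached along with the value, so a failed first call is never retried. That matches what the sync.Once version of findGOROOT does today.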

In a struct with methods, the declaration would be a little different. For example, internal/lazyregexp has:

type Regexp struct {
    str  string
    once sync.Once
    rx   *regexp.Regexp
}

func (r *Regexp) re() *regexp.Regexp {
    r.once.Do(r.build)
    return r.rx
}

func (r *Regexp) build() {
    r.rx = regexp.MustCompile(r.str)
    r.str = ""
}

func New(str string) *Regexp {
    return &Regexp{str: str}
}

func (r *Regexp) MatchString(s string) bool {
    return r.re().MatchString(s)
}

The OnceValue version would be:

type Regexp struct {
    str string
    rx  sync.OnceValue[*regexp.Regexp]
}

func (r *Regexp) re() *regexp.Regexp {
    return r.rx.Do(r.build)
}

func (r *Regexp) build() *regexp.Regexp {
    s := r.str
    r.str = ""
    return regexp.MustCompile(s)
}

func New(str string) *Regexp {
    return &Regexp{str: str}
}

func (r *Regexp) MatchString(s string) bool {
    return r.re().MatchString(s)
}

And the OnceFunc version would be:

type Regexp struct {
    re func() *regexp.Regexp
}

func New(str string) *Regexp {
    return &Regexp{
        re: sync.OnceFunc(func() *regexp.Regexp {
            return regexp.MustCompile(str)
        }),
    }
}

func (r *Regexp) MatchString(s string) bool {
    return r.re().MatchString(s)
}
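A runnable approximation of the closure-based lazyregexp, assuming a hypothetical onceValue helper in place of the proposed sync function. The compiles counter is added only to show that compilation is deferred until first use; it is not safe for concurrent use across several Regexp values.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// onceValue is a hypothetical one-result helper built on sync.Once.
func onceValue[T any](f func() T) func() T {
	var (
		once sync.Once
		v    T
	)
	return func() T {
		once.Do(func() { v = f() })
		return v
	}
}

// compiles counts compilations, for demonstration only.
var compiles int

type Regexp struct {
	re func() *regexp.Regexp
}

// New captures str in the closure, so the struct needs no str field
// and nothing is compiled until re is first called.
func New(str string) *Regexp {
	return &Regexp{
		re: onceValue(func() *regexp.Regexp {
			compiles++
			return regexp.MustCompile(str)
		}),
	}
}

func (r *Regexp) MatchString(s string) bool {
	return r.re().MatchString(s)
}

func main() {
	r := New(`^a+$`)
	fmt.Println(compiles) // 0: nothing compiled yet
	fmt.Println(r.MatchString("aaa"))
	fmt.Println(r.MatchString("b"))
	fmt.Println(compiles) // 1: compiled once, on first use
}
```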

Unusual Uses

There are a few unusual uses of sync.Once that are at least worth noting.

x/net/http2 has code like this:

type http2clientStream struct {
    ...
    abortOnce sync.Once
    abort     chan struct{} // closed to signal stream should end immediately
    abortErr  error         // set if abort is closed
    ...
}

func (cs *http2clientStream) abortStreamLocked(err error) {
    cs.abortOnce.Do(func() {
        cs.abortErr = err
        close(cs.abort)
    })
    ...
}

func (cc *http2ClientConn) RoundTrip(req *Request) (*Response, error) {
    ...
    cs := &http2clientStream{
        ...
        abort:                make(chan struct{}),
        ...
    }
    ...
}

This code is using the abortOnce for the side effect of closing cs.abort, like in the pipe example, but it is also saving the error that caused the close. I don't see an obvious way to change this code to use OnceFunc, because there's no way to pass the err into the func the first time it is called. The pipe code had the same problem but used a separate write-once abstraction to deal with the store of the error.

I suppose if we had a sync.WriteOnce[T] then the code could be written as:

type http2clientStream struct {
    ...
    abortClose func()
    abort      <-chan struct{}       // closed to signal stream should end immediately
    abortErr   sync.WriteOnce[error] // set if abort is closed
    ...
}

func (cs *http2clientStream) abortStreamLocked(err error) {
    cs.abortErr.Store(err)
    cs.abortClose()
    ...
}

func (cc *http2ClientConn) RoundTrip(req *Request) (*Response, error) {
    ...
    abort := make(chan struct{})
    cs := &http2clientStream{
        ...
        abort:      abort,
        abortClose: sync.OnceFunc0(func() { close(abort) }),
        ...
    }
    ...
}
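No sync.WriteOnce exists, so the following is only a guess at what such a type could look like: a mutex-guarded cell where the first Store wins. The Store and Load methods, the stream type, and the two sentinel errors are all hypothetical names chosen for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// WriteOnce is a hypothetical type; its methods are assumptions.
// The first Store wins and later Stores are ignored.
type WriteOnce[T any] struct {
	mu  sync.Mutex
	set bool
	val T
}

func (w *WriteOnce[T]) Store(v T) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if !w.set {
		w.val, w.set = v, true
	}
}

// Load returns the stored value and whether Store has been called.
func (w *WriteOnce[T]) Load() (T, bool) {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.val, w.set
}

// stream is a cut-down stand-in for http2clientStream.
type stream struct {
	abortClose func()
	abort      <-chan struct{}
	abortErr   WriteOnce[error]
}

func newStream() *stream {
	abort := make(chan struct{})
	var once sync.Once
	return &stream{
		abort:      abort,
		abortClose: func() { once.Do(func() { close(abort) }) },
	}
}

// abortStream records the error before closing, so anyone who sees the
// closed channel can also see the error.
func (s *stream) abortStream(err error) {
	s.abortErr.Store(err)
	s.abortClose()
}

var (
	errFirst  = errors.New("first")
	errSecond = errors.New("second")
)

func main() {
	s := newStream()
	s.abortStream(errFirst)
	s.abortStream(errSecond) // ignored, and no double-close panic
	<-s.abort
	err, _ := s.abortErr.Load()
	fmt.Println(err) // first
}
```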

As another example, cmd/go/internal/script has:

func Program(name string, cancel func(*exec.Cmd) error, waitDelay time.Duration) Cmd {
    var (
        ...
        lookPathOnce sync.Once
        path         string
        pathErr      error
    )
    if filepath.IsAbs(name) {
        lookPathOnce.Do(func() { path = filepath.Clean(name) })
        ...
    }

    return Command(
        ...,
        func(s *State, args ...string) (WaitFunc, error) {
            lookPathOnce.Do(func() {
                path, pathErr = exec.LookPath(name)
            })
            if pathErr != nil {
                return nil, pathErr
            }
            return startCommand(..., path, ...)
        })
}

This code is calling lookPathOnce.Do in two different places, with two different functions, depending on the form of the name passed to Program. The conditional call that happens when filepath.IsAbs(name) is true disables the "normal" call below.

This is a bit hard to reason about, and perhaps it would be better to write the code in a more conventional way, like this:

func Program(name string, cancel func(*exec.Cmd) error, waitDelay time.Duration) Cmd {
    var (
        ...
        lookPathOnce sync.Once
        path         string
        pathErr      error
    )
    if filepath.IsAbs(name) {
        path = filepath.Clean(name)
        ...
    }

    return Command(
        ...,
        func(s *State, args ...string) (WaitFunc, error) {
            lookPathOnce.Do(func() {
                if path == "" {
                    path, pathErr = exec.LookPath(name)
                }
            })
            if pathErr != nil {
                return nil, pathErr
            }
            return startCommand(..., path, ...)
        })
}

With OnceFunc2, the path and pathErr variables would be hidden, but the code could use different functions in the different cases:

func Program(name string, cancel func(*exec.Cmd) error, waitDelay time.Duration) Cmd {
    var lookPath func() (string, error)

    if filepath.IsAbs(name) {
        path := filepath.Clean(name)
        lookPath = func() (string, error) { return path, nil }
    } else {
        lookPath = sync.OnceFunc2(func() (string, error) { return exec.LookPath(name) })
    }

    return Command(
        ...,
        func(s *State, args ...string) (WaitFunc, error) {
            path, err := lookPath()
            if err != nil {
                return nil, err
            }
            return startCommand(..., path, ...)
        })
}

I'll post my thoughts about all these in the next comment. This comment is scoped to just presenting the data I gathered.

rsc commented 1 year ago

When I reread @adg's top comment and started thinking about the closure-based API, I was fairly skeptical. I'm a bit uncomfortable with the type of this functionality being a plain func value instead of a thing with a name. And storing what amount to methods as plain struct fields feels very JavaScripty. So I really wasn't expecting much.

Going through all the uses of sync.Once in the main repo, I was struck by how complex many of them are to reason about. The clearest code is the well-encapsulated uses (pattern 3), but not everyone knows to write the code that way. I think it even took us many years to develop that pattern. The encapsulated uses are mostly in newer code.

We should definitely do something here. 84% of the sync.Once uses are computing lazy values and would be better expressed with something more tailored. In the typical patterns, it seems to me that OnceFunc helps more than OnceValue does, and I think it makes sense to call it Lazy instead of OnceFunc, which I'll discuss more below. (I mention it now because the examples coming up are going to use Lazy.)

As noted in the previous comment, OnceValue hides the values but not the sync.Once. The best practice is still to wrap any use in a separate function or method that code calls to obtain the values. You have to discover that best practice, rather than writing v := x.v.Do(computeV) at each use.

Consider again this example from pattern 3:

var (
    gorootOnce sync.Once
    gorootPath string
    gorootErr  error
)

func findGOROOT() (string, error) {
    gorootOnce.Do(func() {
        ... set gorootPath, gorootErr
    })
    return gorootPath, gorootErr
}

If OnceValue is always used with the function wrapper pattern, then the uses end up essentially the same as Lazy's uses, except you have to write out the function wrapper each time, like this example from pattern 3:

var gorootOnce sync.OnceValueError[string]

func findGOROOT() (string, error) {
    return gorootOnce.Do(func() (string, error) {
        ...
    })
}

Lazy ends up codifying the pattern in a way that you can't avoid, ensuring clean uses without requiring everyone to learn and write the boilerplate:

var findGOROOT = sync.Lazy2(func() (string, error) {
    ...
})

If we're going to try to make uses of sync.Once shorter and less error-prone, it seems like OnceValue is only half a fix, while Lazy is the whole fix. So I'm inclined toward the function version.

OnceValue is a partial fix in a second way too: it only covers the 84% of sync.Once uses that compute a value. It doesn't cover the remaining 16% that don't compute a value, because OnceValue0 is just sync.Once. But those are still improved by using Lazy instead. For example compare:

type pipe struct {
    ...
    once sync.Once
    done chan struct{}
}

func (p *pipe) cancel() {
    p.once.Do(func() { close(p.done) })
}

func Pipe() (*PipeReader, *PipeWriter) {
    p := &pipe{
        wrCh: make(chan []byte),
        rdCh: make(chan int),
        done: make(chan struct{}),
    }
    return &PipeReader{p}, &PipeWriter{p}
}

with:

type pipe struct {
    ...
    cancel func()
    done   <-chan struct{}
}

func Pipe() (*PipeReader, *PipeWriter) {
    done := make(chan struct{})
    p := &pipe{
        wrCh:   make(chan []byte),
        rdCh:   make(chan int),
        cancel: sync.Lazy(func() { close(done) }),
        done:   done,
    }
    return &PipeReader{p}, &PipeWriter{p}
}

The Lazy version is shorter and lets the struct field done change to be a <-chan, to prevent misuse. It seems strictly better than the version with sync.Once. So Lazy would let us clean up a larger fraction of sync.Once instances than OnceValue would.
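A small self-contained sketch of the same idea, assuming a hypothetical onceFunc helper in place of the proposed zero-result function. The cut-down pipe type, newPipe, and isDone are also invented for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// onceFunc is a hypothetical stand-in for the zero-result variant.
func onceFunc(f func()) func() {
	var once sync.Once
	return func() { once.Do(f) }
}

type pipe struct {
	cancel func()
	done   <-chan struct{} // receive-only: code holding p cannot close it
}

func newPipe() *pipe {
	done := make(chan struct{})
	return &pipe{
		cancel: onceFunc(func() { close(done) }),
		done:   done,
	}
}

// isDone reports whether the done channel has been closed.
func isDone(p *pipe) bool {
	select {
	case <-p.done:
		return true
	default:
		return false
	}
}

func main() {
	p := newPipe()
	fmt.Println(isDone(p)) // false
	p.cancel()
	p.cancel() // a no-op rather than a double-close panic
	fmt.Println(isDone(p)) // true
}
```

Because the only bidirectional reference to the channel lives inside the closure, the compiler rejects any attempt to close p.done directly.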

Back in https://github.com/golang/go/issues/56102#issuecomment-1285943596 we said that Lazy wasn't a good name because it should be used for something that has already captured the code that runs. Indeed, Lazy would be a bad name for OnceValue, but it seems like a good name for the closure-based version:

func Lazy(f func()) func()
func Lazy1[T any](f func() T) func() T
func Lazy2[T1, T2 any](f func() (T1, T2)) func() (T1, T2)

Then we'd have code like:

p.cancel = sync.Lazy(func() { close(p.done) })

var fixedHuffman = sync.Lazy1(newFixedHuffmanDecoder)

var findGOROOT = sync.Lazy2(func() (string, error) {
    ...
})

return &Regexp{
    re: sync.Lazy1(func() *regexp.Regexp {
        return regexp.MustCompile(str)
    }),
}

if filepath.IsAbs(name) {
    path := filepath.Clean(name)
    lookPath = func() (string, error) { return path, nil }
} else {
    lookPath = sync.Lazy2(func() (string, error) { return exec.LookPath(name) })
}

These look clear to me, and also far less bug-prone than what we're doing today.

So I'm in favor of taking the func path and using the name sync.Lazy. I think we can stop at 2 results (no Lazy3).

rsc commented 1 year ago

One final note: a few people have mentioned that objects with methods are more "idiomatic" in Go than functions, but we do from time to time learn better ways to do things. For example

if i := strings.Index(s, ":"); i >= 0 {
    k, v := s[:i], s[i+1:]
    ...
}

used to be idiomatic in Go, but now

if k, v, ok := strings.Cut(s, ":"); ok {
    ...
}

is instead. One could have argued against strings.Cut by saying that explicit indexing is idiomatic. Idioms evolve.

hherman1 commented 1 year ago

I find this analysis compelling; it seems like the func version is better. The name Lazy throws me a bit, as I usually expect Lazy to indicate a value T that will be computed when it’s needed. This is accurate for most uses, but not for the zero-result use in Pipe.

DeedleFake commented 1 year ago

I notice that the issue of needing two variants of generic things, one for T and one for (T, error), keeps coming up in numerous places. Maybe it's time to reevaluate tuples again? If not, maybe it would make sense to put a type Result[T any] struct { Val T; Err error } type into the errors package just so that it can be standardized to be used by generic data structures and implementations?

I can write this up as a separate proposal if you want, but I wanted to mention it here because I feel like jumping straight to Lazy + Lazy2 might be a bit quick given that that problem seems, to me at least, to need solving more generally.

rsc commented 1 year ago

At this point I don't think it will help to file a proposal about tuples. It's too soon to be redesigning generics. We need to use them for longer first.

cespare commented 1 year ago

The OnceValueError version does have the downside that there is no name for the function in the stack trace if it crashes.

I believe that should be "OnceFunc2", not "OnceValueError".

ianlancetaylor commented 1 year ago

cancel: sync.Lazy(func() { close(done) }),

I'm sure I could get used to this, but at first glance it's kind of weird. Using sync.Once is clear: the function is run once. Here we have a function that should be run once, but we're calling it a lazy function. To me a lazy function is something that computes a value when that value is needed. But close doesn't compute a value at all. It's strange to call it lazily.

earthboundkid commented 1 year ago

Instead of sync.Lazy / sync.Lazy1 / sync.Lazy2, the names could be sync.OnceFunc / sync.OnceValue / sync.OncePair. It also helps discoverability as you type sync.Once in your IDE and the Func/Value/Pair autocompletions pop up.

cespare commented 1 year ago

@rsc thanks for the analysis. Personally I don't find the examples compelling -- the OnceFunc versions are certainly clever, but they do not seem clearer or less prone to misuse to me. Yes, some internal state is hidden inside a closure, but that function value itself is now a var or struct field whereas previously it would have been an immutable function or method.

It might be better for debuggability to adopt an idiom like:
var findGOROOT = sync.OnceFunc2(findGOROOTUncached)

func findGOROOTUncached() (string, error) {
  ...
}

Well, if we did that, wouldn't it negate most of the claims about misuse resistance and boilerplate avoidance that OnceFunc has in the first place? (Because now you can call the wrong function -- the one you are supposed to call is not the func, but the var! -- and you are back to declaring one var and one func per usage.)

Also, whenever the OnceFunc argument is not a function literal, as in the above case or as in

var fixedHuffman = sync.OnceFunc1(newFixedHuffmanDecoder)

the type at play isn't mentioned at the call site. This is a nice property of

var fixedHuffmanDecoder sync.OnceValue[*huffmanDecoder]

rsc commented 1 year ago

Well, if we did that, ...

As I noted later, we can also fix the compiler's naming heuristic.

the type at play isn't mentioned at the call site.

I am assuming that

var fixedHuffman = sync.OnceFunc1(newFixedHuffmanDecoder)

would appear next to the definition of func newFixedHuffmanDecoder.

But if you really wanted to see the type on that line, you could write

var fixedHuffman = sync.OnceFunc1[*huffmanDecoder](newFixedHuffmanDecoder)

adg commented 1 year ago

Thanks for the thorough analysis @rsc! That is very helpful.

And I appreciate your observation, which I hadn't considered:

Lazy ends up codifying the pattern in a way that you can't avoid, ensuring clean uses without requiring everyone to learn and write the boilerplate:

I think it's easy for us experienced Go programmers to assume people know the patterns. Making the pattern "use this function" greatly simplifies things.

I also appreciate the suggestion of the OnceFunc that returns a closure that doesn't return any values. That's a nice touch.

adg commented 1 year ago

WRT naming I share the concerns raised by @hherman1 and @ianlancetaylor. I think the name Once is more precise and less overloaded than Lazy. I propose a variation on @carlmjohnson's suggestion:

func OnceFunc(func()) func()
func OnceValue[T any](func() T) func() T
func OnceValues[T1, T2 any](func() (T1, T2)) func() (T1, T2)

hherman1 commented 1 year ago

Why are we making the pair version generic over both return values? It raises the question of why we stop at 2 and don't have a three-return variant, whereas I think OnceErr makes it clearer why we stopped at two.

adg commented 1 year ago

@hherman1 I can think of two reasons to use T2 instead of error:

1. While Go functions that return (T, error) are among the most common, the other very common return value pairing is (T, bool). Supporting an arbitrary type makes this possible.
2. Because we can: it's easy to make it an arbitrary type, so why not?
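To illustrate the (T, bool) case, here is a sketch with a hypothetical onceValues helper that is generic over both results. The settings map, the mode variable, and the lookups counter are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// onceValues is a hypothetical two-result helper built on sync.Once.
// T2 is unconstrained, so it serves (T, error) and (T, bool) alike.
func onceValues[T1, T2 any](f func() (T1, T2)) func() (T1, T2) {
	var (
		once sync.Once
		v1   T1
		v2   T2
	)
	return func() (T1, T2) {
		once.Do(func() { v1, v2 = f() })
		return v1, v2
	}
}

// lookups counts map lookups, for demonstration only.
var lookups int

var settings = map[string]string{"mode": "fast"}

// mode caches a comma-ok lookup; T2 is inferred as bool.
var mode = onceValues(func() (string, bool) {
	lookups++
	v, ok := settings["mode"]
	return v, ok
})

func main() {
	v, ok := mode()
	fmt.Println(v, ok) // fast true
	mode()
	fmt.Println(lookups) // 1
}
```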

hherman1 commented 1 year ago

I forgot about the bool variant. OK, I’m convinced. But I wish there were a better name than OnceValues 🤔 Values plural sounds like a slice, not a pair… but I can’t think of a better one.

ChrisHines commented 1 year ago

WRT naming I share the concerns raised by @hherman1 and @ianlancetaylor. I think the name Once is more precise and less overloaded than Lazy. I propose a variation on @carlmjohnson's suggestion:
func OnceFunc(func()) func()
func OnceValue[T any](func() T) func() T
func OnceValues[T1, T2 any](func() (T1, T2)) func() (T1, T2)

Choosing OnceValues for the third choice doesn't leave much room for other plural forms and is also pretty subtle. We might want to pick a name for the two-value function that has a more natural progression to higher counts if we need it. Even if we don't expect to add it to the standard library, some people may need it locally, and leaving them a choice of names that fit well with the standard library names might be nice.

rsc commented 1 year ago

I kind of like OnceValues precisely because it closes the door. But OnceFunc, OnceValue, OnceValue2 sound fine too.

DeedleFake commented 1 year ago

I doubt it's possible to find names that everyone will like. I'm personally partial to OnceFunc, Lazy, and Lazy2, with each one's documentation referencing the others, but I'd much prefer finding a way to not need Lazy2 at all.

qingtao commented 1 year ago

func OnceFunc[T any](fn func() (T, error)) func() (T, error)

I just want it.

rsc commented 1 year ago

After much discussion, it sounds like people are generally happy with:

func OnceFunc(f func()) func()
func OnceValue[T any](f func() T) func() T
func OnceValue2[T1, T2 any](f func() (T1, T2)) func() (T1, T2)

Do I have that right?

earthboundkid commented 1 year ago

I am fine with that. I think it's worth letting the bikeshed debate go on slightly longer than usual because what we do in the standard library will set a standard that packages outside the standard library will follow also… But the -2 convention is probably as good as any.

icholy commented 1 year ago

The ratio of API surface area to value provided seems off. Perhaps I'm underestimating how often this will get used.
