golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.86k stars 17.52k forks source link

weak: new package providing weak pointers #67552

Open mknyszek opened 3 months ago

mknyszek commented 3 months ago

What is a weak pointer?

Weak pointers (or weak references, as they are referred to in other languages) allow developers to reference memory without preventing the garbage collector from reclaiming it. To prevent visible dangling references, weak pointers become nil when the referenced memory is collected. Weak pointers can be converted to regular ("strong") pointers which do prevent the garbage collector from reclaiming memory, and allow typical use and access to that memory.

Weak pointers are typically a bit harder to work with than regular pointers because they can become nil at any time. Virtually every weak-to-strong conversion must be guarded by a nil check. Often, weak pointers will become nil at unexpected times.

Nonetheless, weak pointers exist in many languages because they are useful. The main use-cases for weak pointers are related to efficient memory management and reclamation. For example, efficiently managing memory for canonicalization maps, or for memory whose lifetime is bound to the lifetime of another object (akin to JavaScript's WeakMap). Another good use case for weak pointers is for hinting to the GC that it's OK to drop some resources because they're cheap to reconstruct later, especially if those resources have a large memory footprint.

Proposal

I propose adding a new weak package to the standard library with the following API.

// Pointer is a weak pointer to a value of type T.
//
// Two Pointer values compare equal if the pointers
// that they were created from compare equal. This property is retained even
// after the object referenced by the pointer used to create a weak reference
// is reclaimed.
//
// If multiple weak pointers are made to different offsets within same object
// (for example, pointers to different fields of the same struct), those pointers
// will not compare equal.
// If a weak pointer is created from an object that becomes unreachable, but is
// then resurrected due to a finalizer, that weak pointer will not compare equal
// with weak pointers created after resurrection.
//
// Calling Make with a nil pointer returns a weak pointer whose Value method
// always returns nil. The zero value of a Pointer behaves as if it was created
// by passing nil to Make and compares equal with such pointers.
type Pointer[T any] struct { ... }

// Make creates a weak pointer from a pointer to a value of type T.
func Make[T any](ptr *T) Pointer[T] { ... }

// Value returns the original pointer used to create the weak pointer.
// It returns nil if the value pointed to by the original pointer was reclaimed by
// the garbage collector.
// If a weak pointer points to an object with a finalizer, then Value will
// return nil as soon as the object's finalizer is queued for execution.
func (p Pointer[T]) Value() *T { ... }

Design discussion

Representation

The weak pointer we propose would be represented via an indirection object under the hood. This means that each pointer that has any weak pointers associated with it also has an associated 8 byte allocation that contains a single pointer. This pointer is the actual weak reference. This representation is very CPU efficient, since the GC can atomically set a single pointer to make all weak references to an object nil.

This 8 byte allocation is rarely a problem in practice since weak pointers are most often useful for memory efficiency anyway. For example, canonicalization maps are deduplicating memory already, so the extra 8 bytes per canonical copy is relatively cheap. However, it may be worth calling out this detail in the documentation as the C# language does.

This representation has other benefits. One benefit is that the API's equality semantics fall out very naturally. Weak pointers created from the same pointer remain equal even after that pointer no longer exists. This makes it possible to build maps keyed by weak pointers. Another benefit is its simplicity: very little of the GC has to change to support weak pointers.

Avoiding object resurrection

A subtle but important property of the weak pointer implementation is that it must not enable object resurrection. This would create many complications, including the fact that weakly-referenced objects would take 2 GC cycles to reclaim; and they could not be involved in cycles, or they would never be reclaimed (much like finalizers).

A simple way to implement weak references that maintains this property is to guard weak-to-strong conversions with ensuring the object is swept. A swept object cannot be resurrected, because it is always definitely reachable or unreachable, nothing in between. This means the weak-to-strong conversion path may need to sweep the object, but this happens at most once per GC cycle, and the Go runtime's sweeper is quite fast, so the cost shouldn't be noticeable in practice.

Why this specific interaction with finalizers?

The interaction between weak pointers and finalizers that resurrect objects is subtle. There are two choices: the weak pointer can go nil before the finalizer runs, or it can track the object's resurrection, and only go nil once the object is well and truly dead. In C# jargon, this is the difference between "short" weak reference and a "long" weak reference.

At first glance, the latter seems clearer since the weak pointer really only goes nil once nothing can reference the object. However, this semantics has a big problem, and it's subtle.

Today, the decision to resurrect an object in a finalizer is internal to an API’s implementation. If we allowed weak-to-strong conversions after finalization, then clients of an API could resurrect an object without the cooperation of its implementation, at which point the object's internals may be in an inconsistent state. This creates the potential for a race that is not immediately apparent to either the API authors or the client. Worse still, these races will be very rare and unlikely to be caught by testing, even with the race detector enabled. If the weak pointer goes nil before the finalizer runs, then such races are impossible.

It is telling that other garbage collected languages either make "short" weak references the default and do not recommend their "long" form (C#), or only offer the "short" form to begin with (Java).

Risks

Breaking the GC abstraction

The unique.Handle proposal explicitly rejected weak pointers as an alternative on the grounds that they are difficult to use. Fundamentally, this is because they break the GC abstraction by revealing the timing with which memory is actually reclaimed, and this is mainly a problem because that timing is unpredictable. The GC abstraction of "infinite memory" (only allocation, no apparent free) is important in general for composability.

While it is true that weak pointers are more difficult to use than regular pointers, we believe that we have picked an API (building on the experiences of other languages) that minimizes surprises, maintains composability, and has a simple implementation. This was discovered during the implementation of unique.Handle and it was compelling enough that it led to this proposal.

Although weak pointers will be misused in some cases, providing documentation, examples, and advice in the GC guide will go a long way to mitigating that.

thediveo commented 3 months ago

glad this is being resurrected *scnr*

cespare commented 3 months ago

How does this interact with the unique package (#62483)? Isn't it possible for a user to implement unique (or at least solve the same problems that unique solves) using weak pointers? If we add weak, will we regret having both weak and unique?

mknyszek commented 3 months ago

@cespare I don't think we'll regret having both. The main reason to have unique in std is that canonicalization maps benefit from being as global as possible, both for performance (bigger potential for sharing data) and composability (handles are always comparable, regardless of their origin). That alone makes it worth it, I think.

A secondary reason is that it's not actually possible to safely implement the concurrent map underlying unique outside of the standard library today. #54670 (maphash.Comparable) would change that. While I think maphash.Comparable is still important for the ecosystem, it's not going to be easy to make that as efficient as the hashing the std maps use. (See the issue for a breakdown of the problem by Austin.)

Weak pointers are still useful for lots of other cases where globalness is less useful and maybe even not what you want at all. For example, implementing weak-keyed maps, which are generally used for attaching data to memory you don't own the lifecycle of. There's also plenty more reasonable ad-hoc use-cases that don't neatly fall into something std would support out of the box.

Merovius commented 3 months ago

What does Make((*T)(nil)) do? Useless and the answer is probably "return a Pointer that always returns nil", but just to make sure.

mknyszek commented 3 months ago

@Merovius Yep, that's right. Thanks! Updated the proposal.

apparentlymart commented 3 months ago

Although a weak.Pointer that is immediately nil is a little strange, I've written programs with similar features in other languages where it's been helpful to represent an "optional weak pointer" without adding another level of indirection. I probably wouldn't complain too much if it weren't possible, but it has enough marginal use that I don't think it's worth overcomplicating things to forbid it.

(For example, a Rust analog would be std::rc::Weak::new(), which returns a Weak where upgrade always returns None, which is directly analogous to a weak.Pointer where Value always returns nil.)

Is the zero value of weak.Pointer[T] equivalent to what you'd get from passing nil to weak.Make?

mknyszek commented 3 months ago

Is the zero value of weak.Pointer[T] equivalent to what you'd get from passing nil to weak.Make?

From my perspective, I believe that's what the semantics would have to imply. In the implementation, yes, I'd probably just have it return the zero value of the weak pointer.

EDIT: Oh, it occurs to me that passing nil for a specific type could have a different value than the zero value. (Some per-type sentinel value.) I don't think we should do that. I think the zero value of a weak.Pointer[T] should compare equal with the result of weak.Make[T](nil). Updated the proposal.

LukeXeon commented 3 months ago

https://github.com/golang/go/blob/377646589d5fb0224014683e0d1f1db35e60c3ac/src/internal/weak/pointer.go#L51 I see "weak pointer" under the "internal" package. Is there any progress on this proposal currently?

mknyszek commented 3 months ago

@LukeXeon This proposal is awaiting review. If accepted, the internal/weak package already contains an implementation that mostly implements the semantics laid out here. It could just be moved out of internal with a few minor tweaks. Note that, if the proposal is accepted, this would land in Go 1.24, not the upcoming Go 1.23 (it's already too late for Go 1.23).

eihigh commented 3 months ago

Unlike server programs that allocate and release memory for each request and persist it in the DB, a weak pointer is essential for client programs that hold multiple containers or partially release items. I feel that Go lacks support for such programs.

(As an aside, #64777 is another example of a feature needed for client programs)

rsc commented 2 months ago

In our discussion yesterday, @bradfitz asked if weak pointers made it possible to maintain a map of expensive derived information about something without preventing the GC of that something. Here is a worked example of how weak pointer enables that:

var cache struct {
    mu sync.Mutex
    m map[weak.Pointer[Foo]]*derived
}

func init() {
    cache.m = make(map[weak.Pointer[Foo]]*derived)
}

func cached(f *Foo) *derived {
    cache.mu.Lock()
    defer cache.mu.Unlock()

    p := weak.Make(f)
    d := cache.m[p]
    if d != nil {
        return d
    }
    d = expensiveComputation(f)
    cache.m[p] = d
    runtime.AddCleanup(f, deleteP, p)
}

func deleteP(p weak.Pointer[Foo]) {
    cache.mu.Lock()
    defer cache.mu.Unlock()

    delete(cache.m, p)
}

I am assuming runtime.AddCleanup from #67535, which is guaranteed to work with any pointer f. (SetFinalizer might break things if f already had a finalizer, or if f was inside another object.)

thediveo commented 2 months ago

@rsc can this please find its way into the official documentation or alternatively the Tour of Go? Such things really need to be easily accessible, and not buried only in an issue comment.

rsc commented 2 months ago

I think it would make sense for this to be an example in the weak package, if the proposal is accepted. Putting it in the Go tour seems TMI.

thediveo commented 2 months ago

Yeah, it could otherwise become a "Tor-Tour of Go" (sorry, couldn't resist)

rsc commented 2 months ago

Following up on my comment from last week, here is a potential weak cache abstraction that avoids holding the lock during expensiveComputation and also wraps up the idioms nicely so you don't have to retype them. I am not proposing to add it to the package (others can argue for that if they want) but it's here as an example, and we can include it as an example in the package too.

type Cache[K any, V any] struct {
    f func(*K) V
    m atomic.Map[weak.Pointer[K], func() V]
}

func NewCache[K comparable, V any](f func(*K)V) *Cache[K, V] {
    return &Cache[K, V]{f: f}
}

func (c *Cache[K, V]) Get(k *K) V {
    kw := weak.Make(k)
    vf, ok := c.m.Load(kw)
    if ok {
        return vf()
    }
    vf = sync.OnceValue(func() V { return c.f(k) })
    vf, loaded := c.m.LoadOrStore(kw)
    if !loaded {
        // Stored kw→vf to c.m; add the cleanup.
        runtime.AddCleanup(k, c.cleanup, kw)
    }
    return vf()
}

func (c *Cache[K, V]) cleanup(kw weak.Pointer[K]) {
    c.m.Delete(kw)
}

var cached = NewCache(expensiveComputation)
DmitriyMV commented 2 months ago

@rsc

Since the cache doesn't do weak -> strong pointer, and with current GC this should also work, no?

type Cache[K any, V any] struct {
    f func(*K) V
    m atomic.Map[uintptr, func() V]
}

func NewCache[K comparable, V any](f func(*K)V) *Cache[K, V] {
    return &Cache[K, V]{f: f}
}

func (c *Cache[K, V]) Get(k *K) V {
    kw := uintptr(unsafe.Pointer((k))
    vf, ok := c.m.Load(kw)
    if ok {
        return vf()
    }
    vf = sync.OnceValue(func() V { return c.f(k) })
    vf, loaded := c.m.LoadOrStore(kw)
    if !loaded {
        // Stored kw→vf to c.m; add the cleanup.
        runtime.AddCleanup(k, c.cleanup, kw)
    }
    return vf()
}

func (c *Cache[K, V]) cleanup(kw uintptr) {
    c.m.Delete(kw)
}

var cached = NewCache(expensiveComputation)
rsc commented 2 months ago

@DmitriyMV There is a race in that code. It could theoretically happen that the GC identifies k as garbage, puts it back on a free list for future allocation, queues the cleanup, and then the program allocates a new k, gets the recycled address, and reuses the stale value because the cleanup has not gotten a chance to run yet. (It's also not guaranteed to work in an implementation that moves objects, but objects aren't moving today except on the stack, and k escapes to the heap so it won't be a stack object in the current implementation.)

rsc commented 1 month ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 1 month ago

Have all remaining concerns about this proposal been addressed?

The details are in the top comment under the "Proposal" heading.

ChrisHines commented 1 month ago

I am trying to figure out what the atomic.Map[weak.Pointer[K], func() V] refers to in @rsc's example in https://github.com/golang/go/issues/67552#issuecomment-2200755798.

Is that the sync.MapOf (or maybe it is sync/v2.Map now) proposed in https://github.com/golang/go/issues/47657 (which I think is on hold)?

While searching for that I also came across another related proposal I didn't see referenced here yet: https://github.com/golang/go/issues/43615.

Russ's example also uses runtime.AddCleanup from https://github.com/golang/go/issues/67535 which is not approved yet.

Hopefully these cross-refs are useful to someone, but also it would seem we cannot use that example in the weak package docs if the v2/sync.Map or AddCleanup are not available at the time.

Merovius commented 1 month ago

@ChrisHines You can't use the example as written, but you can use sync.Map with type-assertions instead. Or one of the third-party generic concurrency-safe map packages.

RLH commented 4 weeks ago

I once came to the conclusion that ephemorons ( https://dl.acm.org/doi/pdf/10.1145/263698.263733) were better than weak pointers and starting with .NET many other post Java languages have come to the same conclusion and implemented them (see https://en.wikipedia.org/wiki/Ephemeron for a list). Ephemerons solve some thorny key/value problems where reachability of key should be independent of the value structure thus allowing value to reference key without keeping key alive.

The GC trace implementation is 3 pass instead of the current 2 pass and complexity is O(ND) where N is the number of unreachable ephemerons and D is the depth of the structures needing to the traced from them in pass 3. I suspect if we do weak pointers there will be a request for ephemerons once users run into the problems.

I have not thought through the concurrency issues beyond noticing that Go would be the first to deal with such issues. Implementing iterators that can reference keys implicitly are another problem I haven't thought through.

In any case if Go doesn't implement ephemorons this issue should track the discussion..

On Thu, Aug 15, 2024 at 1:20 AM Axel Wagner @.***> wrote:

@ChrisHines https://github.com/ChrisHines You can't use the example as written, but you can use sync.Map with type-assertions instead.

— Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/67552#issuecomment-2290649565, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHNNH4F64GQSXLL2HH3FFDZRQ3AZAVCNFSM6AAAAABICGOSGKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJQGY2DSNJWGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>

mknyszek commented 4 weeks ago

Hi Rick! I'm glad you brought up ephemerons -- they resolve a real issue when working with weak pairs (like in maps) and this proposal certainly should mention them somewhere. This issue will almost certainly come up in the future.

TL;DR: My gut feeling is we won't regret the weak pointers as described in this proposal.

And now, the long version...

In general, my main gripe with ephemerons is the value side. The question on my mind is "which value?" Having only one value for each possible key runs into composability problems like SetFinalizer, so I think that's a no-go.

.NET 4.0 introduced the ConditionalWeakTable which you have to use to specify which value you mean, but I think it ends up being insufficient for some use-cases (like the unique package; that clearly exists now, so I'm thinking any such class of problems). .NET 6.0 later made the underlying abstraction, DependentHandle, public. Maybe that's something we'd want? Some explicit handle representing the dependent association?

EDIT: A previous version of this comment contained an example that wasn't totally coherent. I've replaced it below with something much clearer.

Like, I think in the spirit of Russ' cache example, given both DependentHandle and ConditionalWeakTable, one might express the map behind such a cache as:

ConditionalWeakTable[K, func() DependentHandle[K, V]]

... in order to avoid any leak issues in case the value-generating closure or the cached value itself containing a strong reference to the key. (Which seems unlikely, but is of course still possible.) Since ConditionalWeakTable is implemented using DependentHandle, in theory DependentHandle is all you need to be provided by the runtime for this use-case.

I haven't fully worked out the details, but here's what I think DependentHandle would look like:

package weak

// DependentHandle is an association between a pointer to a value of type K, the key,
// and some value of type V, the value. The key pointer is treated as a weak pointer, and
// the value is only considered reachable if the key is reachable.
type DependentHandle[K, V] struct {
    // unexported fields
}
func NewDependentHandle[K, V](key *K, value V) DependentHandle[K, V]
func (d DependentHandle[K, V]) Key() *K
func (d DependentHandle[K, V]) Value() (V, bool)

DependentHandle is interesting, because I think you can define a weak pointer as:

package weak

type Pointer[T] struct {
    handle DependentHandle[T, *T]
}

func Make[T any](t *T) Pointer[T] {
    return Pointer[T]{NewDependentHandle[T, *T](t, t)}
}

func (p Pointer[T]) Value() *T {
    v, ok := p.handle.Value()
    if ok {
        return v
    }
    return nil
}

This is not quite a perfect mapping. What's lost is the equality semantics that this proposal has for weak pointers. In fact, there's a general problem here with dependent handles and maps: how does one define the key type? A simple map[DependentHandle[K, *K]]DependentHandle[K, V] doesn't quite work, because DependentHandle aren't forgeable in the same way that weak pointers are forgeable. You can build a custom map type utilizing DependentHandle, but that's a bit disappointing. If we defined both a weak.Pointer and weak.DependentHandle, then you could write map[weak.Pointer[K]]DependentHandle[K, V] and I believe the right semantics fall out nicely.

So... my gut feeling is that we won't regret weak pointers, and we may just end up with both. I think this state of things makes sense to me, but I'm happy to be proven wrong. The weak pointers as described in this proposal are fairly simple and straightforward (both in implementation and usage), and have clean semantics that fall out nicely from the API.

On a final note, the implementation complexity of ephemerons/DependentHandle is also a bit unfortunate, what with the traditional "extra GC phase" implementation strategy. I think it might be possible to simplify this by handling ephemerons while greying objects:

  1. Hold ephemeron values weakly via specials.
  2. When an object is greyed, cheaply check some metadata (we're already in the guts of the mspan at that point, so we could probably have an extra bit next to the mark bits or put a bit in the allocation header to avoid another cache miss).
  3. If there's at least one ephemeron, do a slower lookup for the ephemeron specials, iterate over them, and queue the weakly-held values with the GC.

Of course, the downside here is the extra check on the mark hot path. As I alluded to in (2) above, I suspect we can hide this cost, but not without some engineering work.

(As an additional tangential note, I think Java never got ephemerons, but I'm not sure why.)

mknyszek commented 4 weeks ago

I definitely got some things wrong in my previous comment earlier -- I've corrected them now. I've also worked out an example of how ephemerons might be exposed in Go, but there are some difficulties with composability and defining comparability on an ephemeron, such that it actually can be used in all the same ways. I think on a theoretical level you can write your code in most (maybe all?) use-cases to use an ephemeron instead, but it's more complex and may require you to reimplement a lot more than you bargained for (like, a new special map implementation).

So, I think there's still some utility to keeping weak pointers around, and I also don't think we should ignore the complexity (and possible performance difficulties) introduced by an ephemeron implementation. I'm also happy to hear other opinions.

RLH commented 3 weeks ago

Section 10 of this paper (https://dl.acm.org/doi/pdf/10.1145/3226225) is an interesting read. The work is based on the same Sapphire work that the Go GC is based on so things match up pretty well with the Go GC. It argues why deletion barriers and atomic phase changes work. Their CLEAR phase is similar to Go's sweep phase and thus atomic for similar reasons. I am a bit concerned about making a weak ref Get fast (inlined) and atomic.

Earlier in the paper there are some warnings about how common weak refs were in some of the DeCapo benchmarks. This made me concerned about sweep storms caused by weak reference niling. In addition nobody seems to have a good answer to clearing stale weak entries in large caches. Perhaps we will end up yet again talking about building some sort of runtime notification scheme.

WRT Ephermerons - I have no implementation but the big O and complexity of adding another GC phase feels like the cost / benefit will be too high and we have no user demand or requests.

None of this is intended to be negative about the proposal as is, I just wanted to get issues and literature on the record.

On Sat, Aug 17, 2024 at 3:33 AM Michael Knyszek @.***> wrote:

I definitely got some things wrong in my previous comment earlier -- I've corrected them now. I've also worked out an example of how ephemerons might be exposed in Go, but there are some difficulties with composability and defining comparability on an ephemeron, such that it actually can be used in all the same ways. I think on a practical level you can write your code in most (maybe all?) use-cases to use an ephemeron instead, but it's more complex and may require you to reimplement a lot more than you bargained for (like, a new special map implementation).

So, I think there's still some utility to keeping weak pointers around, and I also don't think we should ignore the complexity (and possible performance difficulties) introduced by an ephemeron implementation. I'm also happy to hear other opinions.

— Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/67552#issuecomment-2294746516, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHNNH646LWVKABYUOIFHIDZR34EJAVCNFSM6AAAAABICGOSGKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJUG42DMNJRGY . You are receiving this because you commented.Message ID: @.***>

RLH commented 2 weeks ago

The use of weak pointers and AddCleanup in the example above points out the big footgun of both SetFinalizer and AddCleanup. Both result in a user defined function being run asynchronously in undefined order by the GC. Perhaps AddNotification would be a better API. Instead of passing AddCleanup a func to call, AddNotification would be passed a channel. Instead of calling the function the GC would clear the referent and then put the weak pointer on the channel. The channel would connect to a user defined goroutine and do whatever is needed at a time and manner completely under user control. In the example above the user routine would read from the channel and delete the cache entry. Reducing non-determinism is usually a win.

On Fri, Aug 23, 2024 at 12:03 PM Rick Hudson @.***> wrote:

Section 10 of this paper (https://dl.acm.org/doi/pdf/10.1145/3226225) is an interesting read. The work is based on the same Sapphire work that the Go GC is based on so things match up pretty well with the Go GC. It argues why deletion barriers and atomic phase changes work. Their CLEAR phase is similar to Go's sweep phase and thus atomic for similar reasons. I am a bit concerned about making a weak ref Get fast (inlined) and atomic.

Earlier in the paper there are some warnings about how common weak refs were in some of the DeCapo benchmarks. This made me concerned about sweep storms caused by weak reference niling. In addition nobody seems to have a good answer to clearing stale weak entries in large caches. Perhaps we will end up yet again talking about building some sort of runtime notification scheme.

WRT Ephermerons - I have no implementation but the big O and complexity of adding another GC phase feels like the cost / benefit will be too high and we have no user demand or requests.

None of this is intended to be negative about the proposal as is, I just wanted to get issues and literature on the record.

On Sat, Aug 17, 2024 at 3:33 AM Michael Knyszek @.***> wrote:

I definitely got some things wrong in my previous comment earlier -- I've corrected them now. I've also worked out an example of how ephemerons might be exposed in Go, but there are some difficulties with composability and defining comparability on an ephemeron, such that it actually can be used in all the same ways. I think on a practical level you can write your code in most (maybe all?) use-cases to use an ephemeron instead, but it's more complex and may require you to reimplement a lot more than you bargained for (like, a new special map implementation).

So, I think there's still some utility to keeping weak pointers around, and I also don't think we should ignore the complexity (and possible performance difficulties) introduced by an ephemeron implementation. I'm also happy to hear other opinions.

— Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/67552#issuecomment-2294746516, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAHNNH646LWVKABYUOIFHIDZR34EJAVCNFSM6AAAAABICGOSGKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOJUG42DMNJRGY . You are receiving this because you commented.Message ID: @.***>

mknyszek commented 2 weeks ago

We'd considered using a channel for AddCleanup, but channels have a few problems. The biggest one is that packages that are unaware of each other must, at minimum, spin up at least one goroutine per package to handle all the package's cleanups. That seems like too much overhead to incentivize moving away from SetFinalizer. Either that, or the community standardizes around some shared package that exposes an API similar to AddCleanup, in which case we've gained little by exposing a channel-based API.

I get your point about the hazards of an API that says "I'll run this function asynchronously in the future" in general, but executing a function asynchronously is always going to be something you can write yourself. And things like time.AfterFunc exist. I don't really see how that's much different from this case. It's not like the GC worker goroutines are executing the functions themselves, they're always executed on a separate goroutine.

FWIW, I think this part of the discussion would be better suited to the AddCleanup issue (#67535). I'll copy the context over.

rsc commented 2 weeks ago

Have all remaining concerns about this proposal been addressed?

The details are in the top comment under the "Proposal" heading.

valyala commented 2 weeks ago

It would be great analyzing top production use cases for weak pointers in other programming languages before deciding on how to implement them in Go.

Probably, it would be better providing higher-level solutions for these practical use cases in standard library instead of exposing lower-level weak pointers.

oneumyvakin commented 2 weeks ago

@mknyszek could you please add to your proposal 2-3 examples of practical problems could not be solved now without weak pointers?

rsc commented 1 week ago

Here's one example use: https://github.com/golang/go/issues/67552#issuecomment-2200755798.

rsc commented 1 week ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

The details are in the top comment under the "Proposal" heading.

valyala commented 1 week ago

Here's one example use: #67552 (comment).

I'm unaware of other practical cases for weak pointers. Probably, it is better providing generic thread-safe cache implementation inside sync package instead of exposing weak pointers, which will be used mostly for re-creating the same cache implementations in various codebases with various degree of subtle bugs and non-optimal performance.

The API of this cache may look like the following (feel free making it generic - I tried to do this, but the result wasn't good):

// Cache is goroutine-safe cache with automatic cleanup on memory shortage.
//
// TODO: define what does "memory shortage" mean.
// Ideally the cache shouldn't be cleared on every GC cycle
// if there is enough free memory and the memory usage is stable over time.
//
// It is OK to use zero cache.
type Cache struct {
  // Retention is optional retention for the cached items.
  // If it is set to positive value, then items without
  // access for the given duration are automatically deleted.
  //
  // Do not change this field after the first use of the cache.
  Retention time.Duration

  // Free is optional callback, which is called when the cached item is deleted.
  //
  // Do not change this field after the first use of the cache.
  Free func(k, v any)
}

// Get returns value for the given key k stored in c.
//
// Nil is returned if the cache doesn't contain value for the given key.
func (c *Cache) Get(k any) any

// Set stores the given value v under the given key k in  c.
func (c *Cache) Set(k, v any)

According to my practice the Retention field must be set to some positive value most of the time in order to keep the cache size under control while maintaining good cache hit rate when the set of cached items changes over time.

mknyszek commented 1 week ago

The proposal names three general use-cases:

  1. Canonicalization maps (where unique isn't quite what you want, or doesn't quite work).
  2. Binding the lifetime of one value to another (typically, map[weak.Pointer[K]]*V).
  3. Some kinds of caches (as Russ already gave an example of).

First off, let me say that weak pointers are very niche. I think it's easy to write them off if you've never run into a case where they'd really come in handy. Also, apologies for not walking through concrete examples in the original proposals. Because weak pointers (and their higher-level abstractions) are so niche, it's often a distraction to try to talk about specific examples in detail. But I'll admit that the broader internet is... frighteningly bad at providing concrete examples.

So, let me give it a shot with something I've worked on lately.

golang.org/x/exp/trace.Stack or internal/trace.Stack is an example of where a custom canonicalization map would be useful. Stacks found in execution traces are not currently canonicalized, but they could be, which would make the API easier to use. Canonicalized stacks are useful for building, for example, profiles out of execution traces, as go tool trace does. Making them comparable makes building maps with them much easier.

The thing is, stacks are usually best represented as slices of PCs. But slices cannot be used with unique because of issues with defining equality on slices in general, but with weak.Pointer, we can create a simple canonicalization map purpose-built for stacks.

As an example where binding lifetimes together is useful, we can take the previous example a step further. Every stack consists of a bunch of PCs which have some additional information associated with them. We want to keep that data around as long as that PC is still in use. One thing we can do is canonicalize each PC, turning it into a pointer:

var canonPCs map[uint64]weak.Pointer[uint64] // PC to canonical PC copy.

And then use this canonical pointer to look up the rest of the stack information. (And we get the added benefit of deduplicating it along the way.)

var pcData map[weak.Pointer[uint64]]StackFrame

(EDIT: Entries in the map would be cleared by calling runtime.AddCleanup on the canonical *uint64 values to remove them from the map. This also suggests this map should probably be a concurrent map of some kind, or at least have a lock around it.)

Why store it separately? Because hashing a StackFrame is pointless and wasteful when a PC is really sufficient for determining location identity. We can also reduce our storage footprint by not also storing the entire StackFrame as a key in the map.

Is this the best design for this? I don't know -- there's certainly a space of designs here and maybe one of them is far and away the best. But I think it illustrates that with weak pointers, we can explore the space of designs to achieve what we want.

Hopefully I didn't totally butcher this example and it makes sense to others as well.

Coming back to the general applicability of weak pointers, I think it's fair to say that weak pointers are used with maps most of the time. Seebs said as much in the OP to #43615. But, I think that providing the building blocks will capture far more use-cases for a lower maintenance and complexity burden for the Go project.

Today we have proposals for:

These are three powerful, orthogonal, and also surprisingly simple building blocks that can be used cover most of the "I need a weak something" design space.

And I do want to stress that these things are fairly simple in their own right, including weak.Pointer. We would not have a weak.Pointer proposal if it hadn't turned out to be much simpler than expected to implement with nice semantics. In particular, I think getting the equality/comparability semantics right is extremely important to be able to do this well.

(As an additional footnote, I don't foresee the semantics described in this proposal as being a problem for possible future GC implementations either. If anyone has insights on that though, it would be good to hear them now.)

To be clear, I do I think it is important to ask why not just try and provide higher-level abstractions as necessary. But I also want to point out that that's been the unstated status quo for years, and we haven't gotten very far with that. I think part of the problem may be that few of the higher-level abstractions really cover enough use-cases to be worth the time to design a really good API. We have sync.Pool, which is great and kinda like a weak cache under the hood, but not much else. I think we may have assumed for a long time that supporting weak pointers would be really complicated. But it turns out that they're not so bad, and paired with some other relatively simple APIs, they're useful when you need them.

Anyway, that's my take.

ianlancetaylor commented 1 week ago

var pcData map[weak.Pointer[uint64]]StackFrame

I'm not sure I understand this. The map will always refer to the weak.Pointer value, so it will never go away. We would need some other mechanism to remove those entries from the map. So why not just use map[uint64]StackFrame? Do I misunderstand?

mknyszek commented 1 week ago

We would need some other mechanism to remove those entries from the map.

That's correct. Sorry about that; I elided the fact that one would also need to use runtime.AddCleanup to remove entries from the map.

EDIT: Thanks. I updated my comment above to include that detail, and also that once we have runtime.AddCleanup we need some way to synchronize access to the map.

qiulaidongfeng commented 1 week ago

What is the relationship between maphash.Comparable and weak? See CL 609761, maphash.Comparable does not require a weak package.

bdbyrne commented 5 days ago

A couple clarifying questions.

Two Pointer values compare equal if the pointers that they were created from compare equal.

Is it possible for two pointers to compare equal if they were created from distinct objects? E.g. the first was reclaimed and its memory subsequently reused.

It returns nil if the value pointed to by the original pointer was reclaimed by the garbage collector.

Although the language is unambiguous, can it be strengthened to be "if and only if the original pointer was reclaimed"? That is, are we guaranteed that a weak pointer's Value() will never return nil as long as a normal reference is retained to the object?

Merovius commented 5 days ago

@bdbyrne

Is it possible for two pointers to compare equal if they were created from distinct objects? E.g. the first was reclaimed and its memory subsequently reused.

For an object to be reclaimed, there can't be any pointers to it anymore. So, no.

Although the language is unambiguous, can it be strengthened to be "if and only if the original pointer was reclaimed"?

"If and only if" is not true under the proposed API. It also returns nil if the pointer passed to Make is nil (last paragraph is the doc string of Make).

DeedleFake commented 5 days ago

For an object to be reclaimed, there can't be any pointers to it anymore. So, no.

I think the scenario in question is one where you have a weak pointer to a and all other references are gone so a gets collected. Then, later, b is allocated with the same address that a had previously. Will a new weak pointer to b compare equal to any leftover weak pointers to a?

Merovius commented 5 days ago

Ah, that makes sense. The doc comment says

Two Pointer values compare equal if the pointers that they were created from compare equal. This property is retained even after the object referenced by the pointer used to create a weak reference is reclaimed.

I would read that as "yes, in that scenario, the weak pointers are equal", which does seem strange. Especially if it would mean that (by a strict reading) you would now have two weak pointers comparing equal, with one returning nil from Value() and one returning a non-nil pointer. The only way I could reconcile that is by the runtime ruling out that an address is re-used as long as it is still stored in a weak pointer.

So, really "I don't know" and I'll step back from this question, sorry for the noise :)

mknyszek commented 5 days ago

Is it possible for two pointers to compare equal if they were created from distinct objects? E.g. the first was reclaimed and its memory subsequently reused.

No, that is not possible. Note that this can't be true for things like using weak pointers in maps to be useful.

I would read that as "yes, in that scenario, the weak pointers are equal", which does seem strange. Especially if it would mean that (by a strict reading) you would now have two weak pointers comparing equal, with one returning nil from Value() and one returning a non-nil pointer.

Perhaps I didn't explain this well in the original proposal, but weak pointers should be considered to be updated atomically, globally. It's not possible to have any skew, where one simultaneously returns nil and another does not.

The only way I could reconcile that is by the runtime ruling out that an address is re-used as long as it is still stored in a weak pointer.

This is kind of quarantining of memory not necessary, and it would be leaky. The implementation part of the proposal describes how this would work. In short, weak pointers point are themselves objects which are mapped 1:1 with some offset in some other object. The address of each "weak pointer object" at determines its identity.

bdbyrne commented 4 days ago

No, that is not possible. Note that this can't be true for things like using weak pointers in maps to be useful.

Great! Could this also be strengthened in the documentation, such as "Two Pointer values compare equal if and only if the pointers..."? It's not clear to intuit this property in the context of pointers to reclaimed objects.

Edit: removed the second portion, I believe I realized my mistake.

rsc commented 2 days ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

The details are in the top comment under the "Proposal" heading.