golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.07k stars 17.68k forks source link

unique: new package with unique.Handle #62483

Closed mknyszek closed 6 months ago

mknyszek commented 1 year ago

Proposal: unique package

Updated: 10 April 2024

Oct 4th EDIT: Changed package name from "intern" to "unique" and renamed "Symbol" to "Handle" based on feedback. Apr 10th EDIT: Updated the proposal to describe the actual implementation. Notably, map entries can be dropped after just a single GC cycle instead of two, and there's no more object resurrection, which has several well-known problems.

CC @golang/runtime

Motivation

The lack of runtime support for interning in Go represents a gap with other languages. Demand in the Go community for a weak map and/or string interning has shown up in a few GitHub issues over the years. Although the section of the community requesting these features is quite small, we see the consequences of not having built-in support for this functionality in the form of the go4.org/intern package.

go4.org/intern is a package that implements a global intern pool. Its goals are two-fold: intern data to deduplicate copies, and provide fast comparisons for such interned data. Functionally, it returns a unique global identity for every value placed into it. That value is weakly held by the pool's internals, to avoid letting the pool accumulate in an unbounded manner. This implementation follows in the footsteps of Lisp: interning a value produces a "symbol" that may be used for a fast direct comparison and provides a method through which the interned value may be recovered.

Although it has only 4 direct importers, it is transitively used quite widely (estimated 0.1% of modules) thanks to use by inet.af/netaddr (reverse module deps), which uses it to deduplicate strings and reduce the cost of their comparisons. This has caused friction within the ecosystem because go4.org/intern makes assumptions about the implementation of Go. In particular, it imports the package go4.org/unsafe/assume-no-moving-gc to validate those assumptions. That package, by design, needs to be updated with every release of Go.

inet.af/netaddr's functionality has also recently been merged into the standard library as the net/netip package, bringing along with it a copy of go4.org/intern as the internal/intern package. Furthermore, go4.org/unsafe/assume-no-moving-gc now references an internal Go runtime variable to avoid having to update every release.

Although the immediate issue has been mitigated, the gap in functionality remains, and we now have a copy of go4.org/intern in the standard library anyway.

Goals

The goal of this proposal is to tidy up internal/intern's interface with generics, integrate it more tightly into the runtime (for efficiency and future-proofing), and expose it as a standard library package.

This proposal will also motivate why we should move forward with an API that's similar go4.org/intern and not some other direct or indirect ways of enabling efficient value interning.

Design

We more-or-less propose to move the go4.org/intern package API into the standard library, but with a few tweaks. The biggest difference is that go4.org/intern uses an interface to represent the "key" part of the mapping, with a special case for strings (GetByString) to avoid an additional allocation for the string header. Using generics, we can avoid these additional allocations while also improving type-safety and ergonomics.

Here's a sketch of the revised API, which will live in a new package called "unique":

package unique

// Make returns a globally unique handle for a value of type T. Handles
// are equal if and only if the values used to produce them are equal.
func Make[T comparable](value T) Handle[T] {
    ...
}

// Handle is a globally unique identity for some value of type T.
//
// It is trivially and efficiently comparable with other Handle[T] for the same T.
//
// If the value used to create the symbol is the same, the symbols are guaranteed
// to be equal.
type Handle[T comparable] struct {
    value *T
}

// Value returns a shallow copy of the T value that produced the Handle.
func (h Handle[T]) Value() T {
    return *h.value
}

The unique package must maintain an internal mapping of values to globally unique symbols, which risks growing without bound. However, once no copies of symbols produced by Make are reachable, the program loses access to that globally unique identity. The intern package is then free to produce a new one without the program noticing. If it can produce a new one, it can just as well delete the old one during that time period in which no copy of a symbol exists for a given value.

In practice, integration with the garbage collector is necessary to achieve this, introducing the risk of leaking details about the garbage collector's implementation.

A key property to consider with functionality that relies on specific behavior from the garbage collector is whether it preserves the illusion that the executing program has infinite memory (with a GC there's a malloc but no free). The utility of such an illusion is that it makes programs significantly simpler to reason about. When programs are able to observe the fact that the garbage collector reclaimed memory, it is possible for programs to rely on non-deterministic properties of that reclamation. Finalizers are a good example of a feature that breaks the illusion and are difficult to use correctly.

Luckily, it's not possible for programs to notice when New returns a different Handle[T] precisely because it can produce new symbols without the program noticing. Therefore, it's not possible for a program to observe when memory is reclaimed. (Someone could use the unsafe package to write down the Handle[T]'s internal value as a uintptr, but that specifically requires the use of unsafe. One also can't do much with that uintptr; casting that back to a pointer violates the unsafe.Pointer rules.)

Implementation

The core data structure is approximately a map[any]*T, where the *T is a weak reference. The runtime fully controls the *T created here, so it would attach a special that represents a handle to the value that can go nil once the object is reclaimed. Once the *T is no longer referenced, the GC will clear the handle. Later, a background goroutine will clean up map entries with nil handles.

Note that user code here is racing with the garbage collector. It's possible that in between the garbage collector noticing that there are no more references to the *T and the value being collected, the program may read the value out and produce a new strong pointer to the *T. To alleviate this, the reader must synchronize with the garbage collector. To avoid issues with object resurrection and to allow for immediate reclamation of memory, the reader can simply ensure the span containing the T is always swept before accessing the weak pointer handle. The only costs here are thus the highly-optimized span lookup and, once per GC, a single goroutine may need to sweep the span, which is generally quite fast. Note that if a value is being lookup frequently, then only the first lookup in each GC cycle will need to sweep; otherwise it'll return quickly.

This is quite different to the implementation of go4.org/intern, and is only possible by accessing runtime internals. It allows us to reclaim memory in just a single GC cycle instead of the 3 that are currently required by the intern package.

Next is the map implementation itself. Because the map will be global, the underlying data structure will need to be thread-safe. Unfortunately there are far too many choices for a concurrent map data structure, so we need to cut them down by setting some goals and requirements. Here's a list:

A traditional bucketed hash map is not terribly difficult to make concurrently under these circumstances, but has the downside that incremental growth and shrinking of such a map is quite complicated. Trying to extend the existing map implementation with concurrency features is also likely to be complicated.

Another option is to use something like an adaptive radix tree over 8-byte hash values. The growth and shrinking of such a tree comes naturally, and making them concurrent within the bounds of our requirements isn't too complicated. The downside is poor locality because of the tree structure. I suspect that at least to begin with, something like this will provide good enough performance.

For the actual implementation, we pick a simplified form of the adaptive radix tree specialized for uintptr values (hashes), essentially forming a hash-trie. It's straightforward to make reads out of this data structure perfectly scalable. Insertions and deletions will be performed using fine-grained locking techniques, reducing contention.

As an additional note on performance, calls to Make should always avoid allocating in the case where the provided value is already in the map. Meanwhile they should explicitly clone the value when adding it to the map. (This means if T is a struct with pointers in it, the values those pointers point to are almost always going to be forced to escape to the heap, like with regular Go maps.)

Risks

API design impact

The fact that interned values are represented by a type Handle[T] means that details around interning may encourage two versions of APIs, one supporting interned values and the other not. Contrast this with an "opaque" alternative (see "Opaque string interning" in the alternatives section) that makes no distinction between interned and non-interned values in the type system.

I believe this is a legitimate concern, but suspect that it will mostly be mitigated in practice. Interning is a somewhat niche technique that helps performance dramatically in certain cases, but only in those certain cases that clearly benefit from data deduplication and/or fast comparisons of data. Elsewhere it's more clearly just cumbersome and simple microbenchmarks should reveal the slowdown.

Thus, I think "polluting" APIs in the cases where it's useful is likely worth the tradeoff, and it's sufficiently cumbersome to use that where it's not necessary it will simply be ignored. I believe the fact that go4.org/intern has relatively few direct importers supports this hypothesis.

Poor performance

One situation we absolutely want to avoid is performance issues like with Java's String.intern. My best understanding of the situation (as of when the linked blog post was written) is that:

I believe we avoid all three of these issues in the implementation described above. The third one is less obvious if you're not familiar with the existing Go GC implementation. Briefly, the global intern map would only be a GC root insofar as any global variable is a GC root (i.e. the entire map would not be part of the root set, just the reference to it). And even if it was fully part of the root set, the Go GC already shards and scans the root set concurrently with the mutator executing. Hence, no pause-time impact.

Disclaimer: I don't know if this is still an issue in the Java world; I didn't look into this too deeply. If it's not, then that's great to hear. Nevertheless, it's still worthwhile to learn from past attempts at interning.

Alternatives considered

Opaque string interning

One alternative is interning just for strings, which is a common case. For instance:

package strings

// Intern takes a string and returns a string with equivalent contents,
// though possibly one that shares memory with other strings. This can
// be used to save memory in applications with many duplicate string
// values. If a string is successfully interned, a copy is always made.
func Intern(s string) string

Although such an API is temptingly simple, it's not really useful for any other kind of data structure. Only strings are properly immutable in the language.

Furthermore, because the interning is opaque, we don't get the full benefit of cheap comparisons out-of-the-box. The string comparison fast path would be taken more frequently, but when there's no pointer equality the string data would still need to be compared. To get the full benefit, there would need to be some additional runtime and/or compiler support to identify when two interned strings are being compared, which may end up slowing down non-interned string operations as well.

This is still worth considering for the future, but doesn't properly address the use-cases this proposal intends to address.

Background string deduplication

Another alternative is to just have the runtime deduplicate strings in the background. For instance, the JVM G1 garbage collector has a flag for such a feature (off by default). The advantage of this approach is the programmer has to set one flag and they get to save on memory costs.

However, like the previous alternative, this only really applies to strings again, because only strings are properly immutable at the language level. The other problem is that this feature would need to be opt-in, requiring a new top-level runtime knob. (Presumably this feature isn't on by default because it's not always worth the CPU cost in the garbage collector.) It's also substantially more complex to implement this in the Go garbage collector, because it doesn't currently know the type of anything in the heap.

Weak references

An even more general alternative to the proposed API is to just add support for weak references to the standard library and/or language. After all, the proposed implementation conceptually just uses a weak reference in its implementation anyway.

The main issue with weak references is that they're very hard to use in a system with tracing garbage collection, since they can turn out to be nil at very surprising times (or possibly never). Fundamentally, they break the aforementioned infinite memory illusion, because they reveal when memory is reclaimed.

The bar is extremely high for adding anything this difficult to use, and I believe we should prefer easier-to-use abstractions as much as possible.

mitar commented 1 year ago

(Potentially bikeshedding.)

What about unique.Obtain(v) instead of unique.Make(v)?

bradfitz commented 1 year ago

Obtain is a long way to spell Get :)

mitar commented 1 year ago

unique.Get sounds good, too.

cherrymui commented 1 year ago

The doc comments in https://github.com/golang/go/issues/62483#issuecomment-1782053117 still says Symbol. They probably should be changed to Handle?

mknyszek commented 1 year ago

Do the handles need to be global? Most applications I know process the data in some context so holding to on an object wouldn't be too hard. Without understanding the runtime - could users splitting up a-priori known disjoint types into explicit (smaller) namespaces potentially help runtime in some way?

They don't need to be, but the ability to explicitly create separate local namespaces creates the potential for hard-to-debug issues. I would not be surprised to see bugs filed where two equal values are placed into different namespaces, but end up getting compared and compare false as a result. To catch that at runtime (with say, a panic) would require also carrying around the namespace the handle came from, which would be substantially less efficient (2x larger memory footprint for handles, comparisons aren't quite as cheap anymore and need to be a method on the handle). With a single global registry, this can never happen. If the handles compare equal for some type, they are always going to be equal.

It also doesn't help the runtime much to make this part of the API. If we care about keeping separate types in separate internal maps for efficiency, that's an internal implementation detail and we could certainly do that.

rabbbit commented 1 year ago

Edit: to be clear I'm in favor of the proposal. If it was available I'd start using it tomorrow :) - even if I would slightly prefer to avoid using global state. I was asking to make sure all the noisy neighbors implications were considered - and it looks like they were.

Do the handles need to be global? Most applications I know process the data in some context so holding to on an object wouldn't be too hard. Without understanding the runtime - could users splitting up a-priori known disjoint types into explicit (smaller) namespaces potentially help runtime in some way?

They don't need to be, but the ability to explicitly create separate local namespaces creates the potential for hard-to-debug issues. I would not be surprised to see bugs filed where two equal values are placed into different namespaces, but end up getting compared and compare false as a result. To catch that at runtime (with say, a panic) would require also carrying around the namespace the handle came from, which would be substantially less efficient (2x larger memory footprint for handles, comparisons aren't quite as cheap anymore and need to be a method on the handle). With a single global registry, this can never happen. If the handles compare equal for some type, they are always going to be equal.

All true. Enforcing comparison via the namespace object could (?) slightly reduce the risk of this happening but not make the recovery behavior easier. Automatically namespacing per package could help too, but defeat some potential wins.

The global approach can have unexpected side effects though, right? Imagine an app that uses this package to dedup a small, practically frozen set of strings (countries? timezones?) from a single goroutine. I then import another component (or multiple ones) that also dedup strings, but something of a much higher cardinality and much more dynamic (hostports? email addresses? or even something silly like timestamps) - from multiple gorutines. The imports start affecting each other, even if the user knows that the categories of strings can never overlap.

It also doesn't help the runtime much to make this part of the API. If we care about keeping separate types in separate internal maps for efficiency, that's an internal implementation detail and we could certainly do that.

Would the solution be to type an alias for each of the categories? Like

type email string
var b []byte = ...
s := unique.Handle(email(b)).Value()

This seems feasible but also potentially easy to forget. (would this also not allocate?)

Merovius commented 1 year ago

@rabbit I'm really not sure what you think the negative side-effects would be. Note that the fundamental guarantee of the package - that a Handle[T] compares equal to another Handle[T] if and only if they represent the same T - remains true, no matter what. I fail to see how anything would break when using a Hande[T] as proposed, that wouldn't also break if those packages used a T directly. Like, in your example, if those packages used string for "countries? timezones?" and "hostports? email addresses?" (no snark intended, just trying to reflect the example) respectively, exactly the same things would compare as equal as if they used Handle[string]s.

On the contrary, semantically it seems extremely brittle to have them use different Handle[T]s, because now if they would move from string to Handle[string], things would change how they compare to each other. I actually think not having that happen and preserving the semantic guarantee that (unique.Make(a) == unique.Make(b)) == (a == b) is an extremely important property for correctness.

And yes, as you note, if you want to explicitly have different "namespaces", you can declare new defined types (nit: not "aliases"), which then compare as unequal (when stored in interfaces) even if they represent the same value. That is a) the same as it is with current types and b) already something we are somewhat used to, with context.Context keys.

For performance it might have subtle implications, but a) I find it very hard to make any kind of educated guess in what direction that performance difference would swing and b) if it does, we can always decide to use different stores for different types after the fact, without changing everything about the API - the package has access to T, as the types and functions are parametric.

Oh and one last thing: Yes, you are "relying on global state" but much in the same way that every Go program relies on global state - in the form of the GC heap and the schedulers Gs, Ms and Ps. It's a transparent part of the runtime.

rabbbit commented 1 year ago

@Merovius for negative side effects, I wasn't talking about comparing the handles, which I understand is cheap, but rather generating them. I understand some kind of global map would be needed (mentioned in the proposal, assumed to be read heavy) - thus one misbehaving package can others, right?

I didn't mean that I want separate namespaces, just that some applications might not need the global state here, (net/ip doesn't care about another package deduping email addresses). Relaxing this constraint might allow for increased isolation.

You don't see the global map as a concern - I understand this now. That's fine - I understand the tradeoffs @mknyszek listed. Perhaps I'm just being overly defensive.

rsc commented 1 year ago

understand some kind of global map would be needed (mentioned in the proposal, assumed to be read heavy) - thus one misbehaving package can others, right?

Not really - the semantics are such that no one can tell what anyone else is doing as far as handles, and yes one package could make the map big, but big maps are still fast.

dsnet commented 1 year ago

thus one misbehaving package can [affect] others, right?

I don't see how this is much different than the fact a misbehaving package today can OOM the entire Go process.

Also, it seems that the implementation of unique could shard the global map based on the Go type passed to Make, thus providing functional isolation if necessary. What's nice about this is that it's entirely an implementation detail that's not exposed in the API.

rsc commented 1 year ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

New package with this API:

// Package unique provides a way to construct globally unique identities
// for comparable values, to make storage and comparison of those 
// identities more efficient.
package unique

// Make returns a globally unique handle for a value of type T. Handles
// are equal if and only if the values used to produce them are equal.
func Make[T comparable](value T) Handle[T] {
    ...
}

// A Handle is a globally unique identity for some value of type T.
// It is trivially and efficiently comparable with other Handle[T] for the same T.
// Handles are equal if and only if the values used to produce them are equal.
type Handle[T comparable] struct {
    value *T
}

// Value returns a shallow copy of the T value that produced the Handle.
func (h Handle[T]) Value() T {
    return *h.value
}
rabbbit commented 1 year ago

Not trying to argue or to change anyone's mind, but just to close the discussion (thank you for your comments @rsc and @dsnet):

and yes one package could make the map big, but big maps are still fast.

Rather than size (large size implies it's a valid use-case for handles), I was more worried about handle generation. A package could ("incorrectly"), create A LOT of short-lived handles that get created and removed all the time.

Also, it seems that the implementation of unique could shard the global map based on the Go type passed to Make

I was talking about different disjoint "categories" of the same (Go) type. A string could be an email, IP address, or HTTP tag - but not simultaneously. The application knows this and can shard, thus reducing the need for runtime sharding. Like above, the primary worry here was part of an application that (unnecessarily) creates a lot of short-lived handles.

Like I said above, perhaps I'm being too pessimistic about the potential misuse of the package.

randall77 commented 1 year ago

Random tangentially-related thought ahead:

Can we put all the runtime._types in one of these weak maps? So when no more pointers to that runtime._type exist, the entry itself can be garbage collected.

This is an issue with any of the reflect type-building functions, e.g. reflect.StructOf. Once built, those types live forever. It would be great if we could GC those types from the "uniquification map" as soon as no one references them.

This is very close to the problem that prevents us from GCing plugin invocations. If we could determine that no types from a plugin are still accessible from anywhere, we could unload the plugin (and all of its code). A long outstanding issue #20461. (There are more difficulties than just this one, but it is a big one.)

Perhaps we could start with using this implementation for runtime._types (aka internal/abi.Types). It would require no new ABI so we could use it as a "first customer" experiment that involves no new API.

cherrymui commented 1 year ago

Can we put all the runtime._types in one of these weak maps? So when no more pointers to that runtime._type exist, the entry itself can be garbage collected.

Sounds like an interesting idea, probably worth trying.

One issue is that currently we don't always track runtime._type references (and assuming they always live), e.g. they may be referenced from non-heap memory, and in some places in the compiler we treat the first word of an interface header as non-pointer. Tracking them is probably possible but may incur some overhead.

hauntedness commented 9 months ago

looks this is a similar implementation to unique value + pool. so why not directly put it into package sync? Not a native english speaker,but to me Make is worse name to Intern. I believe sync.Unique is a nice name.

snglagolev commented 9 months ago

@rsc is the unique.Handle[T] a kind of weak reference or not?

Looks like Handle holds pointer to value and Handle.value cannot be collected by GC.

mknyszek commented 9 months ago

@hauntedness The proposal is accepted, so barring any serious concerns about the name, I doubt we'll be changing the name at this point.

@snglagolev unique.Handle[T] is not a weak reference. For the proposal to work, it must explicitly not be weak reference, since the whole point is to keep the canonical value being referenced live. The internal map for deduplicating values of type T will retain a pointer to the canonical value via something like a weak reference, but this is not exposed in the interface. (The deduplicating behavior and the fact that stale references get cleaned up in the map is guaranteed though, since that's necessary for the API to be useful.)

snglagolev commented 9 months ago

The internal map for deduplicating values of type T will retain a pointer to the canonical value via something like a weak reference, but this is not exposed in the interface.

something like a weak reference will not be accessible outside standard library?

mknyszek commented 9 months ago

Please read the proposal in the original post; weak references in the standard library and/or language are discussed briefly as an alternative.

snglagolev commented 9 months ago

I just want to understand...

h1 := unique.Make([1024]byte{})
h2 := unique.Make([1024]byte{})
v1 := h1.Value()
v2 := h2.Value()

In this case we have:

Is this expected behavior for unique?

Merovius commented 9 months ago

@snglagolev Yes, I definitely would expect that. Handle.Value returns a copy of the contained value. In this case, the contained value is a [1024]byte. So that gets copied into two new variables v1 and v2 and they obviously have different addresses. Note that for arrays, &x[0] is the same as &x. That is different from strings or slices, in which the value contains a reference.

It should still be noted that you end up with three copies - one in v1, one in v2 and one on the heap (that is in the uniqe storage). And in particular, if v1 and v2 don't escape, while you do incur a copy, you don't incur additional allocations - there will only be one copy on the heap, even though you obtain with h1 and h2 two handles from two arrays. And that h1 == h2, but they won't be as large as a [1024]byte, so comparing them is significantly cheaper.

gopherbot commented 7 months ago

Change https://go.dev/cl/573956 mentions this issue: internal/concurrent: add HashTrieMap

gopherbot commented 7 months ago

Change https://go.dev/cl/574355 mentions this issue: unique: add unique package and implement Make/Handle

gopherbot commented 7 months ago

Change https://go.dev/cl/576297 mentions this issue: internal/weak: add package implementing weak pointers

gopherbot commented 7 months ago

Change https://go.dev/cl/573955 mentions this issue: internal/abi: define EmptyInterface, TypeOf, and NoEscape

Rican7 commented 5 months ago

I know this has been closed and already accepted, and I'm not opposed to having small packages or anything, but I'm a bit surprised that this didn't end up in the cmp package. It seems like a natural place for it.

Seb-C commented 5 months ago

Apologies if my question does not make sense, but I am still a bit confused as to where benefits can be expected (or not) with this feature.

Would there be any runtime performance benefits in using a handle on a hardcoded string? For example:

var data = map[unique.Handle[string]]int{
    unique.Make("the_first_key"):  1,
    unique.Make("a_second_key"):   2,
    unique.Make("some_third_key"): 3,
}

//go:noinline
func getUsingUserInput(key string) int {
    return data[unique.Make(key)]
}

//go:noinline
func getSecond() int {
    return data[unique.Make("a_second_key")] // Hardcoded string
}

func main() {
    userInput := "some_third_key" // Assume it's not hardcoded
    fmt.Println(getUsingUserInput(userInput))
    fmt.Println(getSecond())
}

In the first case (getUsingUserInput), my understanding is that since the string has been created at runtime, and thus gets an arbitrary internal memory address, the Handle would have to search for this string into it's internal table anyway, so there should be no theoretical improvement (maybe even slower than a standard map[string]). Is that correct?

In the second (getSecond) case, where the string is hardcoded, what should I expect? Is there some compiler optimization that would pre-resolve the handle, giving me full performance benefits, or would the string be looked-up in it's internal table every time the function is called? Or maybe the string gets the same internal address for every call, and the Handle can somehow optimize this process by comparing the pointers?

I also realize that a hashmap is not the best example to illustrate this question, but as the goal is to understand the internals, I would have the same questions for any other kind of sorted map (red black tree for example...)

Merovius commented 5 months ago

@Seb-C In both of your cases, there would be an added cost with little benefit, yes. But the reason has little to do with whether or not the strings are constant and everything to do with the fact that you don't keep them around and only create the handle once and only do so for lookup.

The benefit comes if you don't pass string but Handle[string] around as values/fields in your code, because then you might save some allocations and if you do many comparisons, they'll be cheaper.

The prototypical example is if you have a parser and return tokens from that. If you use Handle[string] in the token instead of storing them as a plain string, you only need to keep the memory for each identifier or keyword around once, instead of allocating them for every token using them.

mknyszek commented 5 months ago

+1 to @Merovius' comment. For another example, see the net/netip package's implementation, which uses the unique package internally for IPv6 zone names.