Closed mknyszek closed 6 months ago
Oooh, I like it.
I note that I've wanted both weak-keys and weak-value maps, to solve different problems. A weakly-keyed map[*T]metadata would be used in cases where, if a given T still exists, I want to retain information related to it, but if the item ceases to exist, I don't need to keep the metadata anymore, and I don't want my map to keep the T existing.
You could sort of envision the symbol mapping working this way, except that it solves a slightly different problem. In particular, a key point of the interning is that if no one still has a given symbol, it Could Have Been Anything, so it's fine if the next time someone asks, they get a new one. With a hypothetical weakly-keyed-map approach, the existence of the value-the-symbol-denotes would potentially prevent this.
Not sure whether it makes more sense to try to view these as fundamentally different questions, or as inherently-related questions, because I think weak-reference maps very close to entirely solve the interning problem.
@seebs We went deep down the weak-keyed maps (specifically, ephemerons, which preserve the GC abstraction) rabbit hole as a solution for interning. I think we concluded that it's insufficient for interning, though they are useful for other purposes (as you note, map[*T]metadata). I actually have a rough draft of a proposal for that somewhere, but we focused on this because it seems more immediately important. (EDIT: I have to look back in my notes and chat history to recall why weak-keyed maps were insufficient for interning.)
RE: Weak-value maps, I believe I came to the conclusion that those are basically weak references, and thus suffer from the same issues.
String interning with the proposed API would be equivalent to this right?
var b []byte = ...
s := intern.New(string(b)).Value()
where the string(b) conversion does not allocate, but intern.New may allocate if the string is not found.
IIUC, this doesn't solve the global type cache problem because you can't attach a value to the key.
Quite a few packages have a global map that caches a complex data structure for a Go type: https://github.com/golang/go/blob/e844d72421fb34b57eddf2653b33ed5ebf146b64/src/encoding/json/encode.go#L336
Today, there's a memory leak in "json" where dynamically constructed types with reflect.New can never be reclaimed because json.encoderCache is never cleared.
String interning with the proposed API would be equivalent to this right?
That's basically right, but in this design you're encouraged to keep the Symbol[string] instead of immediately getting the string header back out. As soon as symbols disappear, the internal map entry for them becomes eligible for cleanup. Of course, it won't get cleaned up until at least the next GC cycle, so subsequent lookups will still probably be a hit in the map. But you'll likely get better behavior out of keeping the symbol. Plus, you get a cheap comparison.
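To make the two usage styles concrete, here is a minimal sketch. It assumes Go 1.23 or later, where this package shipped under the name unique (with Make and Handle) rather than the intern.New and Symbol names used at this point in the thread. The helper names internOnce and handlesEqual are made up for illustration.

```go
package main

import (
	"fmt"
	"unique"
)

// internOnce mirrors the one-liner from the question above: intern the
// string and immediately unwrap it. The handle is dropped right away, so
// nothing keeps the internal map entry alive past a later GC cycle.
func internOnce(b []byte) string {
	return unique.Make(string(b)).Value()
}

// handlesEqual reports whether two strings map to the same handle.
// Comparing handles is a pointer-sized comparison rather than a
// byte-by-byte string comparison.
func handlesEqual(a, b string) bool {
	return unique.Make(a) == unique.Make(b)
}

func main() {
	// Keeping the handle around is the style the design encourages.
	h := unique.Make("hello")

	fmt.Println(handlesEqual("hello", string([]byte("hello"))))
	fmt.Println(internOnce([]byte("hello")) == h.Value())
}
```

Both lines print true. The difference between the two styles is only in how long the canonical copy is guaranteed to stay shared, not in the values observed.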
you're encouraged to keep the Symbol[string] instead of immediately getting the string header back out
That seems a bit unfortunate. The "json" package when unmarshaling would be storing a string into the user type. There's no place to hold onto a Symbol[string] for long, storing it somewhere leads us back to the same problem this is trying to solve. That said, there is still benefit since we will share string values within a GC cycle.
EDIT: I have to look back in my notes and chat history to recall why weak-keyed maps were insufficient for interning.
I think the reason weak-keyed maps weren't a great solution is because if you allow anything comparable except a pointer, you end up being able to break the GC abstraction. Heap pointers are special because they are an unforgeable identity. So type WeakMap[K any, PK interface{ *K }, V any] struct { ... } is fine, but allowing all comparable types as keys is not because you can construct a value such that you observe when a map entry is reclaimed.
Notably, for the pointers-only case, it's also surprisingly easy to implement, because the Go runtime already has a way to associate arbitrary data with a pointer. But, you probably also want that weak map to be cyclic. That is, WeakMap[*T, *T] should still be allowed to collect the map entry where the same *T exists as a value. That's more complicated but is certainly possible to do. It's just going to impact the mark path, so it needs to be designed carefully to be fast.
IIUC, this doesn't solve the global type cache problem because you can't attach a value to the key.
It doesn't. A WeakMap would, but you'd have to unwrap the reflect.Type into the underlying *internal/abi.Type, which is at least fine for std. This particular case might be common enough to warrant a TypeMap or something.
That seems a bit unfortunate. The "json" package when unmarshaling would be storing a string into the user type. There's no place to hold onto a Symbol[string] for long, storing it somewhere leads us back to the same problem this is trying to solve. That said, there is still benefit since we will share string values within a GC cycle.
I believe the only thing that would fully satisfy this particular use-case is the "opaque string interning" alternative. It may be worth considering in the future as it's somewhat orthogonal to this.
I'm pessimistic that there's a single API that will solve all these problems, but I think this package is a decent starting point.
@mknyszek was it your intention to remove the proposal label, or was it a 'fat-finger' moment?
you're encouraged to keep the Symbol[string] instead of immediately getting the string header back out
That seems a bit unfortunate. The "json" package when unmarshaling would be storing a string into the user type. There's no place to hold onto a Symbol[string] for long, storing it somewhere leads us back to the same problem this is trying to solve. That said, there is still benefit since we will share string values within a GC cycle.
FWIW there is also @josharian's string interning library, which works somewhat as you've described by keeping a sync.Pool of map[string]string around such that strings are mostly interned until the pool gets rid of its entries: https://pkg.go.dev/github.com/josharian/intern
It seems like it'd be possible to create a sync.Pool of map[Symbol[string]]string instead such that the Symbol[string]s get kept around a little longer... but I guess maybe this isn't an improvement because someone's still going to have to hash the string and this just shifts the work elsewhere?
(I do agree that it's unfortunate that the Symbol has to be kept around to ensure that it gets reused.)
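For readers unfamiliar with the pool-of-maps idea mentioned above, the following is a rough sketch of how such best-effort interning can be written. It is an illustration of the technique only, not a copy of the linked library; the names pool and internString are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// pool hands out maps that remember previously seen strings. The runtime
// may drop pooled maps at any time, so interning here is best-effort.
var pool = sync.Pool{
	New: func() any { return make(map[string]string) },
}

// internString returns a string equal to s. If an equal string was seen
// before and its map is still in the pool, the earlier copy is returned
// so that callers end up sharing one backing store.
func internString(s string) string {
	m := pool.Get().(map[string]string)
	defer pool.Put(m)
	if canonical, ok := m[s]; ok {
		return canonical
	}
	m[s] = s
	return s
}

func main() {
	a := internString(string([]byte("example")))
	b := internString(string([]byte("example")))
	fmt.Println(a == b)
}
```

Note that the lookup still hashes the string on every call, which is the cost the comment above points out.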
(I do agree that it's unfortunate that the Symbol has to be kept around to ensure that it gets reused.)
I suppose we could make an exception for strings and have the Symbol refer directly to the backing store for the string, though now we need to carry around a length in each Symbol as well. (It would just be zero for all other types.) So, the cheap comparison becomes slightly less cheap. Maybe that's fine?
FWIW, I think strings are a reasonable exception given that they're the only type that is defined as a reference yet behaves like a value type.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
I suppose we could make an exception for strings and have the Symbol refer directly to the backing store for the string, though now we need to carry around a length in each Symbol as well. (It would just be zero for all other types.) So, the cheap comparison becomes slightly less cheap. Maybe that's fine?
It occurs to me that if we're going to force a copy and a new allocation for an insertion, then strings are entirely uniquely identifiable via their data pointer (regardless of length). The Value method is slightly more complicated since we'd have to store the length next to the data, but it should actually be doable to properly intern strings and also enable using Symbol[string] for cheap (single pointer) comparisons. The downside of this approach is it prevents optimizations like deduplicating strings in the map by prefix. But maybe that's fine.
It took me three-four reads to understand, but it sounds like a useful addition to the standard library.
What I'm wondering is if there's an automatic way in the profiling tooling to recommend "hey, you should intern this value" as part of the output. Otherwise I can't really see the usage of intern becoming more ubiquitous, and then the value of adding it to the stdlib is diminished (integrating it more tightly into the runtime is not without its costs!)
Hi all, I'm still learning about Golang (I'm the group manager for Java and now Golang @ MSFT), so take all of this with that in mind :-).
String Dedup - I wanted to add a quick note that in Java, String Deduplication is accessible to all of the GCs but it is still a runtime flag to enable. https://malloc.se/blog/zgc-jdk18 for the extra detail there. As you can imagine, since it's not on by default its usage is pretty low, not sure 'we' got the user experience right there.
String intern perf - Although I know you have a different design in mind, I (or more likely one of our real engineers ;-)) will take a look at Java's intern performance and see if the concerns raised above still hold true or if that was patched or a new design put in place. All good things to learn either way I guess. I'll post back here if there's anything interesting.
public + internal? - Is there a use case where keeping an internal version of this for the runtime that can evolve more rapidly for internal-only needs, as well as an external public-facing package, makes sense? I know in Java they started doing this a bunch with certain com.sun packages within the new module system.
weak references - Your note about using weak references resonates! Speaking from Java-land, its memory management challenges are usually confined to the realms of issues with strongly referenced memory leaks or whatever, but on the odd occasion when WeakReferences go wrong then it's an order of magnitude harder to debug with tooling etc. That might just be more a reflection on their design or gaps in o11y tooling but for me it's always a case of this is where some Dragons live.
'intern' may not be the best name for this package. It requires knowledge of other systems (particularly Lisp) that not everyone has. Also intern here is not the same meaning as intern string in Java. The package is about creating unique identities for values, so what about package unique? Then the bikeshed becomes what the name of the type is.
'Symbol' has the right meaning for people with a Lisp or compiler background, but not to others. Symbol has many other meanings in many other parts of CS.
unique.Handle seems like it strikes the right notes: the thing is a Handle, and it aligns with cgo.Handle too (although this one uses generics but cgo predated them).
package unique

// New returns a globally unique symbol for a value of type T. Symbols
// are equal if and only if the values used to produce them are equal.
func New[T comparable](value T) Handle[T] {
	...
}

// Symbol is a globally unique identity for some value of type T.
//
// It is trivially and efficiently comparable with other Symbol[T] for the same T.
//
// If the value used to create the symbol is the same, the symbols are guaranteed
// to be equal.
type Handle[T comparable] struct {
	value *T
}

// Value returns a shallow copy of the T value that produced the Symbol.
func (h Handle[T]) Value() T {
	return *h.value
}
I don't feel strongly about the name. I also think both unique and Handle are reasonable. Happy to update the proposal if that's the consensus.
While we are talking names: I dislike New. 1. New generally is connoted with returning pointers, 2. unique.New reads strangely and 3. it doesn't actually return a new Handle/Symbol, (hopefully) most of the time. I think Make is slightly (though maybe imperceptibly) better. I can't really come up with a significantly better one.
[edit] Maybe I like unique.Intern(v) 🤔 [/edit]
Some JavaScript-inspired bikeshedding:
package symbol

func For[T comparable](v T) Of[T]

The type Of[T comparable] is the part I'm the least enthused by. It's a little weird inside the package, but should hopefully read nicely when actually used, and it's pretty short, which is nice. There's obviously no JavaScript equivalent for that one, though.
If we want more raw name fodder, I believe that in Elixir (something like) this is called an atom.
It might be worth using the term "identity" somewhere here, since that's really what's being created. Putting that together with the verb "make" and the package name "unique", you get:
package unique
func Make[T comparable](value T) Identity[T]
This roughly corresponds to "make a unique identity for this value." In usage:
id := unique.Make(v)
🤷
To contribute something not naming-related as well: Do we have examples of use-cases for this besides strings?
The API constrains on comparable. But, thinking about some comparable types, we have primitives like integer/floating point types, for which interning doesn't make sense. We have pointers (and channels), which already represent a unique identity (that is, two pointers compare as equal only if they are the same allocation - pointers are not compared by pointee). Functions, slices and maps are not comparable.
ISTM that what makes strings meaningful here, is that they are implemented as references - thus representing non-trivial/dynamic amounts of memory - but compared by the referred value. So the only other kind of type I can think of where this would be useful are linked lists based on interfaces, which is esoteric to say the least. And of course structs and arrays with string fields/elements. Do we have examples of that? And wouldn't they also replace the string-fields with unique.Handle fields?
I'm asking because if we really ~only expect this to be used with strings, that would change the calculus of a string-only, transparent API for me, at least. And it might also suggest something about the naming.
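The distinction drawn in this comment (pointers compare by identity, strings and structs containing strings compare by content) can be checked with a few lines of plain Go. The type endpoint and the function names below are hypothetical and exist only for this illustration.

```go
package main

import "fmt"

// endpoint is a hypothetical comparable struct with a string field.
type endpoint struct {
	host string
	port int
}

// pointersEqual compares two distinct allocations that hold the same
// pointee value. Pointers compare by identity, so this reports false.
func pointersEqual() bool {
	a, b := new(int), new(int)
	return a == b
}

// stringsEqual compares two separately allocated strings with the same
// bytes. Strings compare by content, so this reports true.
func stringsEqual() bool {
	a := string([]byte("example.org"))
	b := string([]byte("example.org"))
	return a == b
}

// structsEqual compares two structs field by field, which in turn
// compares the string field by content.
func structsEqual() bool {
	a := endpoint{host: string([]byte("example.org")), port: 443}
	b := endpoint{host: string([]byte("example.org")), port: 443}
	return a == b
}

func main() {
	fmt.Println(pointersEqual(), stringsEqual(), structsEqual())
}
```

This prints false true true. The content comparisons are the ones whose cost grows with the size of the data, which is where a pointer-sized handle comparison would help.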
Also, if it’s only strings, maybe it would belong in the strings package
My thought is that structs and arrays built using primitives (though not necessarily with string fields) are also useful to intern. One could imagine wanting a cheap comparison (and hash) of a [8]int or something. However, I confess that I don't have a great example off the top of my head.
Focusing on strings is an option, but a solution that doesn't make interned values explicit probably means some kind of overhead when not interning at all. You need a bit of information somewhere indicating that the string has been interned so that the comparison slow path can be skipped and gain the full benefit. That bit could be propagated in the string header, but then that means a mask on access to the length or data pointer. We could also allocate an extra byte for each string to contain this information, but that's just memory overhead on all strings. (If someone has a clever trick up their sleeve that I haven't thought of, I'd love to hear it.)
Thus, my logic is:
And if some form of tuples is added, I could see direct struct comparisons becoming more common, too.
One possibility is to allocate string backing store in a separate region of memory. Then when comparing strings, if both string pointers are in that region, just do == of those pointers instead of comparing contents. One useful optimization is that we only need to check the containing memory region if the string size is large. We can, for example, check the contents unconditionally if the length is <= 16. Only for longer strings is it worth looking up the region of the backing store pointers. Probably "region" is just a special mark on the containing span?
Ah, good point. I didn't consider just having a differently-classified memory region.
The straightforward implementation of that has some negative effects like increasing the size of mcaches since we'd have to duplicate our size classes to get good memory overheads. We could have a new type of memory layout for these spans, but that's also gonna be somewhat complex (Immix style? Something else?). At least it's noscan so that limits the complexity a bit (no need to scan, but you still need to be able to mark and release the relevant memory). It's also a little unfortunate that we can't just have every interned string be a cheap pointer compare in this design. I do like that this doesn't have any overheads if you're not interning at all.
Or maybe...this should be a part of runtime package?
String interning happens only during runtime, runtime.Intern(s string) string (or similar) sounds very simple.
@cristaloleg The runtime package is a bit of a kitchen sink of random things that don't all make sense together. I'd prefer not to keep filling it. 😅 If we go the string-only interning route, though, I would argue strongly for strings.Intern.
FWIW, I still prefer something along the lines of the original proposal. Having an explicit "interned-ness" of a value is easier to reason about in terms of performance in my opinion. It's easier to tell at-a-glance whether a comparison will be cheap or not. Also, as mentioned above, we can still have full interning for strings without holding onto the Symbol[T] by stuffing the string's data pointer directly into the Symbol[T] value.
I agree with @mknyszek. My opinion is that string interning is subtle, whereas the proposed package (under whatever name) is a bit complicated but is clear. Also I find it easy to imagine wanting to intern struct values. I prefer the proposed package.
Is the lifetime of the deduplication tied to the lifetime of the Handle, or the lifetime of the allocation? That is, would I have to keep a Handle around to benefit from interning, or would it be enough to keep the interned string value around?
Because if it is the former, it seems you have to expose it in your API and the prototypical use-cases of encoding/json unmarshalling into a map[string]any, or encoding/xml filling up xml.Name with the same namespace/tag names would benefit less. Or at least I don't think I understand the mechanic of how repeated Unmarshal calls would share interned strings.
If it is the latter, I think I'm sufficiently satisfied that interning can be an implementation detail like sync.Pool, instead of leaking into APIs like arenas would.
Regarding limiting the package to strings, that seems unnecessarily restrictive. @bradfitz points out that a program might be juggling many SHA256 hashes or some other comparable type, and deduplicating those down to handles can be a significant memory and CPU savings. It's not just strings.
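As an illustration of the SHA256 use case, here is a small sketch that assumes Go 1.23 or later, where the package is available as unique. The alias Digest and the helper digestOf are hypothetical names chosen for this example.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"unique"
)

// Digest is a handle to a canonical copy of a SHA-256 sum. Each Digest
// is pointer-sized, while the 32-byte array is stored once per distinct
// value for as long as some handle to it is reachable.
type Digest = unique.Handle[[32]byte]

// digestOf hashes data and returns the handle for the resulting sum.
func digestOf(data []byte) Digest {
	return unique.Make(sha256.Sum256(data))
}

func main() {
	a := digestOf([]byte("payload"))
	b := digestOf([]byte("payload"))
	c := digestOf([]byte("other"))

	// Handle comparison replaces a 32-byte array comparison.
	fmt.Println(a == b, a == c)
}
```

This prints true false. A program that stores many copies of the same hash in maps or structs would hold one pointer per copy instead of 32 bytes per copy.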
We're avoiding atom and symbol because they are overloaded terms. Package unique seems to have good support.
The only question left seems to be the name of the constructor. It's true that New doesn't always create a new handle, nor does it return a pointer. (Also cgo.NewHandle does create a new handle on each call.)
Make is perhaps better because it does not imply a pointer is returned, although some people might still think it "makes" a new thing each time. Documentation will have to be the answer there.
package unique

// Make returns a globally unique symbol for a value of type T. Symbols
// are equal if and only if the values used to produce them are equal.
func Make[T comparable](value T) Handle[T] {
	...
}

// Symbol is a globally unique identity for some value of type T.
//
// It is trivially and efficiently comparable with other Symbol[T] for the same T.
//
// If the value used to create the symbol is the same, the symbols are guaranteed
// to be equal.
type Handle[T comparable] struct {
	value *T
}

// Value returns a shallow copy of the T value that produced the Symbol.
func (h Handle[T]) Value() T {
	return *h.value
}
The only question left seems to be the name of the constructor.
LookUp? Fetch? Get? Value?
For?
Please pick a constructor name that allows adding more features to the package later without regretting earlier choices. For example I think we did better with that design element in math/rand/v2 by giving each source constructor a more specific name.
Also, what are the proposed package level docs? I'm curious how we describe this package and how well that description fits with the package name.
@Merovius
Is the lifetime of the deduplication tied to the lifetime of the Handle, or the lifetime of the allocation?
The former (at least in my understanding of the package). So, yes, if you want to guarantee deduplication, you have to expose Handle in the API. That said, the most likely implementation is still a pool. If all the handles to a given object go away, and then before a GC runs you ask for another handle to an identical object, you'll get the one in the pool (and, of course, we won't necessarily discard all unreferenced objects in the pool at every GC any more than we do for sync.Pool, I don't know what the right policy should be). I don't currently see a reason why we would expose Handle in the API of packages like encoding/json or encoding/xml.
The former (at least in my understanding of the package).
We can actually have the latter behavior (i.e. tied to the lifetime of the allocation) for just strings. (This is not in the original proposal, but was mentioned earlier in the discussion.) Because strings are reference types that behave like value types, I don't think it would be unreasonable to stuff the string data pointer into Handle. If we do that, then the interning really is bound to the lifetime of the string itself, and callers could just do:
s = unique.Make(s).Value()
as a replacement for the alternative:
s = strings.Intern(s)
As I mentioned above, there are some subtleties with reconstructing the string value, but there are also reasonable workarounds to that. This doesn't really make sense for other comparable types because they really are just plain values and thus a copy must be made out of the Handle.
Though, maybe this will be a sufficiently surprising special case for strings that we shouldn't do this. I don't have a good grasp on what could go wrong.
As an aside, we could also extend the encoding packages to understand unique.Handles. For example:
type myJSON struct {
	Name unique.Handle[string] `json:"name"`
}
...
var msg myJSON
json.Unmarshal(data, &msg)
This would unmarshal the string and then call unique.Make on it for you.
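The comment above describes a possible extension to the encoding packages, not existing behavior. A similar effect can be approximated today with a wrapper type that implements json.Unmarshaler. The sketch below assumes Go 1.23 or later for the unique package; InternedString and decode are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unique"
)

// InternedString is a hypothetical wrapper that interns its contents
// while being decoded from JSON.
type InternedString struct {
	h unique.Handle[string]
}

// UnmarshalJSON decodes a JSON string and stores its handle.
func (s *InternedString) UnmarshalJSON(b []byte) error {
	var raw string
	if err := json.Unmarshal(b, &raw); err != nil {
		return err
	}
	s.h = unique.Make(raw)
	return nil
}

// String returns the interned value, or "" if nothing was decoded.
func (s InternedString) String() string {
	if s.h == (unique.Handle[string]{}) {
		return ""
	}
	return s.h.Value()
}

type myJSON struct {
	Name InternedString `json:"name"`
}

// decode unmarshals one message.
func decode(data []byte) (myJSON, error) {
	var msg myJSON
	err := json.Unmarshal(data, &msg)
	return msg, err
}

func main() {
	a, _ := decode([]byte(`{"name":"gopher"}`))
	b, _ := decode([]byte(`{"name":"gopher"}`))
	fmt.Println(a.Name.h == b.Name.h, a.Name.String())
}
```

This prints true gopher. Because the struct retains the handle, the two decoded messages share one canonical copy of the name for as long as either message is reachable.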
I think unique.Make(s).Value() will be flagged in most code reviews and require discussion to understand. It seems better to have a complementary API implemented that way, e.g. unique.String, so that it's clear to everyone this is a blessed use.
I think adding something best-effort for just strings (which could just be return unique.Make(s).Value()) is reasonable, but it won't have the O(1) guaranteed comparison like a Handle. Hence, I think it makes sense to have both levels, especially if one can be trivially implemented in terms of the other.
It still seems like unique.Handle, Handle.Value, and unique.Make are the best names we have.
Have all remaining concerns about this proposal been addressed?
unique.MakeHandle leaves room for other Make* functions if needed later and, at least in my opinion, is clearer at the call site about what it makes.
How best effort would string -> string interning using unique.Make(s).Value() be? For example, if the caller uniques s in GC cycle 1, and s survives into cycle 2, but no Handle for s survives into cycle 2 (because we're immediately throwing away the Handle), and then in cycle 2, the caller uniques a string == to s again, do they get s or a new copy of s? Either way, the strings would still be ==, but if the point of doing this is to save space, it doesn't save as much as it could.
The weak map[string]*string approach suggests you would get a new string in cycle 2. While the backing store of s is still live in cycle 2, the weakness is attached to some copy of the string header, which will get thrown out with the Handle.
I feel like we shouldn't mix up this idea with string interning. Personally I think this idea is solid and useful by itself. String interning is really a different idea that solves a different problem.
I agree that we should not mix this up with string deduplication.
The weak map[any]*any must be weak on its values (meaning if the value is GC'ed, the map entry is deleted) and not weak on its keys, because in general the keys may not be GC-able at all. For example we've discussed using unique.Make on [32]byte values to reduce SHA256 hashes to unique Handles for easier comparison. For the entries to be cleaned up, they have to be cleaned once the *[32]bytes contained in handles are all GC'ed. It doesn't make sense to GC the key. Same for the less useful case of unique.Make on int values.
If the app keeps the Handles around, then that does have the effect of deduplicating memory, and that's fine. But if you want that, you need to keep the Handles, so s = unique.Make(s).Value() is more of a no-op than a useful optimization.
All that said, we could also provide unique.String(s) returning string, for which the string itself is both the value and the "handle" for GC purposes. If we did this, that would be a separate value-weak map[string]string map, not a map[string]*string as in the general case. However, in that case the strings in the map really need to be separate, GC'able objects - in general strings might have been constructed using unsafe from mmap'ed memory or other sources. That somewhat implies that unique.String(s) needs to make a copy of s the first time it is called with a given s, which is fine in apps where there are many copies already and use of unique.String is letting us GC them. But it might still surprise people.
@ChrisHines, I see your comments about MakeHandle, but the intention of this package is that there is nothing more than unique.Handle here. unique.MakeHandle ends up redundant, and unique.Make is fine.
Even if we added unique.String later, it would still be very clear which was which, since the signature of Make is:
func Make[T comparable](val T) Handle[T]
@rsc What about the readability at call sites though?
I suspect that will usually look like this.
h := unique.Make(v)
Granted that is shorter than
h := unique.MakeHandle(v)
But the first feels like an incomplete sentence. What are we making?
With the builtin make the type parameter fills that gap: s := make([]int, n), which reads as "make a slice of int". I don't think unique.Make communicates enough to the reader.
What about unique.Obtain(v)?
Also, docstrings in examples above still talk about symbols, should those be updated to handles as well? (E.g., "Make returns a globally unique symbol for a value of type T.")
But the first feels like an incomplete sentence. What are we making?
FWIW, I read unique.Make(v) as "make v unique". In that sense it's not an incomplete sentence, just a sentence with out-of-order grammar (from an English speaker's perspective). One could imagine Make being called MakeUnique but it seems redundant with the package name.
Also, docstrings in examples above still talk about symbols, should those be updated to handles as well? (E.g., "Make returns a globally unique symbol for a value of type T.")
Yes, thank you. Fixed.
Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group
The proposal details are as follows.
New package with this API:
// TODO
package unique

// Make returns a globally unique symbol for a value of type T. Symbols
// are equal if and only if the values used to produce them are equal.
func Make[T comparable](value T) Handle[T] {
	...
}

// Symbol is a globally unique identity for some value of type T.
//
// It is trivially and efficiently comparable with other Symbol[T] for the same T.
//
// If the value used to create the symbol is the same, the symbols are guaranteed
// to be equal.
type Handle[T comparable] struct {
	value *T
}

// Value returns a shallow copy of the T value that produced the Symbol.
func (h Handle[T]) Value() T {
	return *h.value
}
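To make the contract of this API concrete, the following is a deliberately naive sketch of its observable semantics. It is not the proposed implementation: it uses an ordinary mutex-protected map that holds entries strongly, so nothing is ever reclaimed, whereas the proposal relies on garbage collector integration to drop entries once no handle is reachable. The variables mu and table are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// Handle wraps a pointer to the canonical copy of a value.
type Handle[T comparable] struct {
	value *T
}

// Value returns a shallow copy of the canonical value.
func (h Handle[T]) Value() T {
	return *h.value
}

var (
	mu sync.Mutex
	// table maps each value (boxed as an interface, so the dynamic type
	// is part of the key) to the pointer holding its canonical copy.
	// Entries are never removed in this sketch.
	table = map[any]any{}
)

// Make returns the handle for value, creating the canonical copy on
// first use. Equal values of the same type always yield equal handles.
func Make[T comparable](value T) Handle[T] {
	mu.Lock()
	defer mu.Unlock()
	if p, ok := table[value]; ok {
		return Handle[T]{value: p.(*T)}
	}
	p := new(T)
	*p = value
	table[value] = p
	return Handle[T]{value: p}
}

func main() {
	a := Make("go")
	b := Make("go")
	c := Make("rust")
	fmt.Println(a == b, a == c, a.Value())
}
```

This prints true false go. Comparing two handles compares a single pointer, which is what makes the comparison cheap regardless of the size of T.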
If the value used to create the symbol is the same, the symbols are guaranteed to be equal.
Should this be iff?
Apologies for a late comment - I checked and didn't see this discussed above.
Do the handles need to be global? Most applications I know process the data in some context, so holding onto an object wouldn't be too hard. Without understanding the runtime - could splitting up a priori known disjoint type categories into explicit (smaller) namespaces potentially help the runtime?
A bit stretched argument could be that namespaces could be more forward compatible and allow for different configurations - for example, if my application already protects for concurrent access in some way, maybe I could configure some lighter concurrency checks. Or, to configure the aggressiveness of GC removing the elements (?) in the json example above. Or to explicitly clear a namespace (? not sure how useful practically this could be)
Or is the goal to allow for global object sharing from the start?
Edit: thinking of the netaddr example, today, it effectively lives in a namespace.
Proposal: unique package
Updated: 10 April 2024
Oct 4th EDIT: Changed package name from "intern" to "unique" and renamed "Symbol" to "Handle" based on feedback.
Apr 10th EDIT: Updated the proposal to describe the actual implementation. Notably, map entries can be dropped after just a single GC cycle instead of two, and there's no more object resurrection, which has several well-known problems.
CC @golang/runtime
Motivation
The lack of runtime support for interning in Go represents a gap with other languages. Demand in the Go community for a weak map and/or string interning has shown up in a few GitHub issues over the years. Although the section of the community requesting these features is quite small, we see the consequences of not having built-in support for this functionality in the form of the go4.org/intern package.
go4.org/intern is a package that implements a global intern pool. Its goals are two-fold: intern data to deduplicate copies, and provide fast comparisons for such interned data. Functionally, it returns a unique global identity for every value placed into it. That value is weakly held by the pool's internals, to avoid letting the pool accumulate in an unbounded manner. This implementation follows in the footsteps of Lisp: interning a value produces a "symbol" that may be used for a fast direct comparison and provides a method through which the interned value may be recovered.
Although it has only 4 direct importers, it is transitively used quite widely (estimated 0.1% of modules) thanks to use by inet.af/netaddr (reverse module deps), which uses it to deduplicate strings and reduce the cost of their comparisons. This has caused friction within the ecosystem because go4.org/intern makes assumptions about the implementation of Go. In particular, it imports the package go4.org/unsafe/assume-no-moving-gc to validate those assumptions. That package, by design, needs to be updated with every release of Go.
inet.af/netaddr's functionality has also recently been merged into the standard library as the net/netip package, bringing along with it a copy of go4.org/intern as the internal/intern package. Furthermore, go4.org/unsafe/assume-no-moving-gc now references an internal Go runtime variable to avoid having to update every release.
Although the immediate issue has been mitigated, the gap in functionality remains, and we now have a copy of go4.org/intern in the standard library anyway.
Goals
The goal of this proposal is to tidy up internal/intern's interface with generics, integrate it more tightly into the runtime (for efficiency and future-proofing), and expose it as a standard library package.
This proposal will also motivate why we should move forward with an API that's similar to go4.org/intern and not some other direct or indirect way of enabling efficient value interning.
Design
We more-or-less propose to move the go4.org/intern package API into the standard library, but with a few tweaks. The biggest difference is that go4.org/intern uses an interface to represent the "key" part of the mapping, with a special case for strings (GetByString) to avoid an additional allocation for the string header. Using generics, we can avoid these additional allocations while also improving type-safety and ergonomics.
Here's a sketch of the revised API, which will live in a new package called "unique":
The unique package must maintain an internal mapping of values to globally unique symbols, which risks growing without bound. However, once no copies of symbols produced by Make are reachable, the program loses access to that globally unique identity. The unique package is then free to produce a new one without the program noticing. If it can produce a new one, it can just as well delete the old one during the period in which no copy of a symbol exists for a given value.
In practice, integration with the garbage collector is necessary to achieve this, introducing the risk of leaking details about the garbage collector's implementation.
A key property to consider with functionality that relies on specific behavior from the garbage collector is whether it preserves the illusion that the executing program has infinite memory (with a GC there's a malloc but no free). The utility of such an illusion is that it makes programs significantly simpler to reason about. When programs are able to observe the fact that the garbage collector reclaimed memory, it is possible for programs to rely on non-deterministic properties of that reclamation. Finalizers are a good example: they break the illusion and are difficult to use correctly.
Luckily, it's not possible for programs to notice when Make returns a different Handle[T], precisely because it can produce new symbols without the program noticing. Therefore, it's not possible for a program to observe when memory is reclaimed. (Someone could use the unsafe package to write down the Handle[T]'s internal value as a uintptr, but that specifically requires the use of unsafe. One also can't do much with that uintptr; casting it back to a pointer violates the unsafe.Pointer rules.)
Implementation
The core data structure is approximately a map[any]*T, where the *T is a weak reference. The runtime fully controls the *T created here, so it would attach a "special" that represents a handle to the value that can go nil once the object is reclaimed. Once the *T is no longer referenced, the GC will clear the handle. Later, a background goroutine will clean up map entries with nil handles.
Note that user code here is racing with the garbage collector. It's possible that in between the garbage collector noticing that there are no more references to the *T and the value being collected, the program may read the value out and produce a new strong pointer to the *T. To alleviate this, the reader must synchronize with the garbage collector. To avoid issues with object resurrection and to allow for immediate reclamation of memory, the reader can simply ensure the span containing the T is always swept before accessing the weak pointer handle. The only costs here are thus the highly-optimized span lookup and, once per GC cycle, a single goroutine possibly needing to sweep the span, which is generally quite fast. Note that if a value is being looked up frequently, then only the first lookup in each GC cycle will need to sweep; otherwise it'll return quickly.
This is quite different from the implementation of go4.org/intern, and is only possible by accessing runtime internals. It allows us to reclaim memory in just a single GC cycle instead of the 3 that are currently required by the intern package.
Next is the map implementation itself. Because the map will be global, the underlying data structure will need to be thread-safe. Unfortunately there are far too many choices for a concurrent map data structure, so we need to cut them down by setting some goals and requirements. Here's a list:
A traditional bucketed hash map is not terribly difficult to make concurrent under these circumstances, but has the downside that incremental growth and shrinking of such a map is quite complicated. Trying to extend the existing map implementation with concurrency features is also likely to be complicated.
Another option is to use something like an adaptive radix tree over 8-byte hash values. The growth and shrinking of such a tree comes naturally, and making it concurrent within the bounds of our requirements isn't too complicated. The downside is poor locality because of the tree structure. I suspect that at least to begin with, something like this will provide good enough performance.
For the actual implementation, we pick a simplified form of the adaptive radix tree specialized for uintptr values (hashes), essentially forming a hash-trie. It's straightforward to make reads out of this data structure perfectly scalable. Insertions and deletions will be performed using fine-grained locking techniques, reducing contention.
As an additional note on performance, calls to Make should always avoid allocating in the case where the provided value is already in the map. Meanwhile they should explicitly clone the value when adding it to the map. (This means if T is a struct with pointers in it, the values those pointers point to are almost always going to be forced to escape to the heap, like with regular Go maps.)
Risks
API design impact
The fact that interned values are represented by a type Handle[T] means that details around interning may encourage two versions of APIs, one supporting interned values and the other not. Contrast this with an "opaque" alternative (see "Opaque string interning" in the alternatives section) that makes no distinction between interned and non-interned values in the type system.
I believe this is a legitimate concern, but suspect that it will mostly be mitigated in practice. Interning is a somewhat niche technique that helps performance dramatically in certain cases, but only in those cases that clearly benefit from data deduplication and/or fast comparisons of data. Elsewhere it's more clearly just cumbersome, and simple microbenchmarks should reveal the slowdown.
Thus, I think "polluting" APIs in the cases where it's useful is likely worth the tradeoff, and it's sufficiently cumbersome to use that where it's not necessary it will simply be ignored. I believe the fact that go4.org/intern has relatively few direct importers supports this hypothesis.
Poor performance
One situation we absolutely want to avoid is performance issues like with Java's String.intern. My best understanding of the situation (as of when the linked blog post was written) is that:
I believe we avoid all three of these issues in the implementation described above. The third one is less obvious if you're not familiar with the existing Go GC implementation. Briefly, the global intern map would only be a GC root insofar as any global variable is a GC root (i.e. the entire map would not be part of the root set, just the reference to it). And even if it were fully part of the root set, the Go GC already shards and scans the root set concurrently with the mutator executing. Hence, no pause-time impact.
Disclaimer: I don't know if this is still an issue in the Java world; I didn't look into this too deeply. If it's not, then that's great to hear. Nevertheless, it's still worthwhile to learn from past attempts at interning.
Alternatives considered
Opaque string interning
One alternative is interning just for strings, which is a common case. For instance:
Although such an API is temptingly simple, it's not really useful for any other kind of data structure. Only strings are properly immutable in the language.
Furthermore, because the interning is opaque, we don't get the full benefit of cheap comparisons out-of-the-box. The string comparison fast path would be taken more frequently, but when there's no pointer equality the string data would still need to be compared. To get the full benefit, there would need to be some additional runtime and/or compiler support to identify when two interned strings are being compared, which may end up slowing down non-interned string operations as well.
This is still worth considering for the future, but doesn't properly address the use-cases this proposal intends to address.
Background string deduplication
Another alternative is to just have the runtime deduplicate strings in the background. For instance, the JVM G1 garbage collector has a flag for such a feature (off by default). The advantage of this approach is the programmer has to set one flag and they get to save on memory costs.
However, like the previous alternative, this only really applies to strings again, because only strings are properly immutable at the language level. The other problem is that this feature would need to be opt-in, requiring a new top-level runtime knob. (Presumably this feature isn't on by default because it's not always worth the CPU cost in the garbage collector.) It's also substantially more complex to implement this in the Go garbage collector, because it doesn't currently know the type of anything in the heap.
Weak references
An even more general alternative to the proposed API is to just add support for weak references to the standard library and/or language. After all, the proposed implementation conceptually just uses a weak reference in its implementation anyway.
The main issue with weak references is that they're very hard to use in a system with tracing garbage collection, since they can turn out to be nil at very surprising times (or possibly never). Fundamentally, they break the aforementioned infinite memory illusion, because they reveal when memory is reclaimed.
The bar is extremely high for adding anything this difficult to use, and I believe we should prefer easier-to-use abstractions as much as possible.