Open danscales opened 2 years ago
Hi @zephyrtronium, the proposal includes:
Also, it is not intended that arenas be shared across goroutines. Each arena has a lock to protect against simultaneous allocations by multiple goroutines, but it would be very inefficient to actually use the same arena for multiple goroutines. Of course, that would rarely make sense anyway, since the lifetimes of objects allocated in different goroutines are likely to be quite different.
@zephyrtronium I think you missed my point as well :) The pointers returned by arenas as proposed might seem special. But they are not that special, really. There are plenty of examples in the Go language where functions return pointers with caveats on when and how they can be used, lest your program panics, be it that they can't be modified, the pointed to values can't be copied or that they can't be retained after a callback returns. All of these conditions are error-prone, but we live with them just fine.
They are also set apart from the other examples you mention (unsafe, cgo and syscall) in that they are still safe. Your program might panic, if you violate their invariants, but it won't have undefined effects.
It might be reasonable to make the difference in "memory safety" more concrete, now that we have generics. We could have arenas work with an
arena.Pointer[T]
type instead of plain pointers, with only aDeref() T
method (and maybe aHeapCopy()
*T). This makes it much clearer where arena pointers are used, and it makes documentation on the dangers more accessible.
IMO that is a bad idea. It would make it all but impossible to use them without polluting your API with arena usage. For example, the flagship use-case of protobuf encoding would now require all nested messages to be this new, special pointer type. As I said above, I don't think we should, generally, let arena usage cross API boundaries.
There are plenty of examples in the Go language where functions return pointers with caveats on when and how they can be used
For a specific example, consider io.Reader
and io.Writer
. The Read
and Write
methods are explicitly not allowed to retain the argument slice after the call has returned, because the caller may mutate it concurrently and any access would be racy.
All of these conditions are error-prone, but we live with them just fine.
That's not a reason to add even more edge cases to the language :).
Your program might panic, if you violate their invariants, but it won't have undefined effects.
That's technically true. But it makes reasoning about pointer lifetimes trickier. One of the appeals of GC'd languages if that you don't have to worry about lifetimes most of the time. This sounds like a decision that shouldn't be made lightly.
It would make it all but impossible to use them without polluting your API with arena usage.
That is not necessarily bad. Unsafe pointers are special, and they are marked as such. Arena pointers are special, and they should be marked as such if arenas make it into the language. But again, the proposal as it stands sounds like a heavy price to pay for the gains outlined.
One of the appeals of GC'd languages if that you don't have to worry about lifetimes most of the time.
I don't think that's actually true, especially given mutability and concurrency. Go programmers do have to worry about lifetimes today, because they have to worry about aliasing bugs, data races, and memory leaks (especially via goroutine leaks and global variables).
The major question for me is: how much additional worry about lifetimes would this proposal introduce? For types other than string
, I think the answer is almost none: a Go library that today is free of data races, aliasing bugs, and memory leaks, would still be free of those classes of bugs if passed arena-allocated arguments.
Should runtime (or compiler / go vet when possible to deduce statically) then just outright panic when using arena-allocated string
in map
s that are heap-allocated? If so it would solve the main issue I personally have with arena allocation being transparent for users.
There is still a question about e.g. how append
would work (even if arena usage is transparent) given that it reallocates underlying data every time the slice grows. There is a lot of code that doesn't expect append
without initial capacity to be very expensive. Not sure how big of a deal it is, but it might be in some cases.
I don't think that's actually true, especially given mutability and concurrency. Go programmers do have to worry about lifetimes today, because they have to worry about aliasing bugs, data races, and memory leaks (especially via goroutine leaks and global variables).
Fair point, mixing concurrency and mutability is one of the areas where you do have to worry about lifetimes. And memory leaks do happen in GC'd languages. But these are somewhat "special" scenarios: programmers have been admonished over and over not to mix mutable state and concurrency. Mutable global variables are a smell which most people know about. But making arena pointers just regular values adds a whole new area of uncertainty. Any pointer dereference could now theoretically panic.
[A] Go library that today is free of data races, aliasing bugs, and memory leaks, would still be free of those classes of bugs if passed arena-allocated arguments.
Yes, it will still be free of those classes of bugs, but prone to a whole new class: use after free on arena pointer that it doesn't know about.
Yes, it will still be free of those classes of bugs, but prone to a whole new class: use after free on arena pointer that it doesn't know about.
How so? As far as I can see, essentially every use-after-free bug that would occur with arenas has a corresponding (existing) aliasing bug or data race without arenas.
In both cases, the bug is caused by either unexpectedly accessing data retained after a call, or accessing data unexpectedly mutated during a call.
So, we already have this, sort of! It's called mmap
and using the mmapped space as backing store (usually via unsafe). And we do indeed have to do defensive copying of things accessed through such a thing, and it's a pain. That said, it can still be better than not having that.
I am having vague thoughts that this design might want to be changed in some way if we had immutability as a trait we could express in the type system.
@bcmills say you implement a global cache where you store pointers to arenas (nevermind if that's a good idea or not). Is it not possible to accidentaly pass an entry from the cache to a library function after the backing arena has been freed? And would that not result in a panic where it wouldn't have before? Or am I missing the point you are trying to make?
@muscar, a global cache of pointers to arenas has the same failure modes as a global cache of, say, heap-allocated []byte
.
If you do a map lookup like:
func lookup(b []byte) {
v, ok := m[string(b)]
…
}
then your program can panic or even crash if b
is mutated during the map access.
Similarly, if you have a function like this:
type S struct{ b []byte }
func deref(s *S, f func()) int {
if len(s.b) > 0 {
f()
return s.b[0]
}
return 0
}
then your program can panic if f
mutates the struct pointed to by s
.
A race on freeing an arena is conceptually not that different from any other data race.
A use-after-free bug with an arena is conceptually not that different from any other aliasing bug. (It has an especially deep connection to the aliasing bugs that arise with sync.Pool
today!)
Thanks for the examples @bcmills. I see what you mean, and I agree with the point you made. My example was indeed an instance of an aliasing bug.
One of the things that makes me uneasy about this proposal is the potential for this kind of problems to become more pervasive. The examples you provided are well chosen, and they illustrate the current problems well.
Both of them involve []byte
which is different from a pointer, at least from the type system's perspective. A Go programmer who sees []byte
will hopefully be more vigilant. Yes, it's a somewhat weak argument. But at least the language gives you a sort of heads up that things could get tricky.
But making arena pointers indistinguishable from GC'd pointers could get hairy because now you have to be extra careful when you see pointers because you don't know where they come from. That's why marking them, e.g. like C#'s Memory<T>
type, has some benefits. In addition to it being an explicit marker that the programmer is dealing with non GC'd memory, it also collaborates with the GC, e.g. by Pin()
-ing objects.
But making arena pointers indistinguishable from GC'd pointers could get hairy because now you have to be extra careful when you see pointers because you don't know where they come from. That's why marking them, e.g. like C#'s Memory
type, has some benefits. In addition to it being an explicit marker that the programmer is dealing with non GC'd memory, it also collaborates with the GC, e.g. by Pin()-ing objects.
I believe that these concerns are unnecessary. One of the key benefits of the arena pattern is how easy it is to avoid the sorts of problems you're describing. Imagine you are the engineer writing the code that creates & frees the arena- what sorts of mistakes would you have to make for another engineer's actions to cause freed memory to be dereferenced? Given that you are in a position to know what data is being allocated from the arena, and also to control the flow of that data, how would such a thing even be possible if you act correctly?
The problem, then, is that the engineer who creates the arena must be in the driver's seat, and that does require arenas to be accepted from callers in most situations. As Merovius observed above- the json/encoding library can't use arenas "under the hood" because the caller has to be the one who creates & frees the arena so that there's no "magic" silently producing transient pointers. (That said, @Merovius I think you're right that you can just make the arena an optional property of the decoder and not even have to pass the arena in for any particular method.)
@CannibalVox
Imagine you are the engineer writing the code that creates & frees the arena- what sorts of mistakes would you have to make for another engineer's actions to cause freed memory to be dereferenced?
This exercise in itself is interesting. The burden of this exercise will now lie on the shoulders of implementers who use arenas to some extent: "How can the users of my library misues it? What do I have to warn them about? What do I have to document?" And this last point, relying on docs instead of automated checks is key. Computers are good at following rules. Humans not so much. How did documenting ownership and lifetime details work for C or C++?
Given that you are in a position to know what data is being allocated from the arena, and also to control the flow of that data
Am I? If so then that somewhat circumscribes and restricts what can be done with arenas. If you want to use them as an internal implementation detail of a serialisation library or something similar then fine. You have control over a lot of it, and you don't have to even expose that you're using arenas to your library's clients. But that begs the question if such a change is worth the runtime and language support.
how would such a thing even be possible if you act correctly?
That's a pretty big assumption to make, and I'm happy for whatever help compilers can give me to help me act correctly.
The problem, then, is that the engineer who creates the arena must be in the driver's seat, and that does require arenas to be accepted from callers in most situations. As Merovius observed above- the json/encoding library can't use arenas "under the hood" because the caller has to be the one who creates & frees the arena so that there's no "magic" silently producing transient pointers.
Cool. So we come back to the point of arenas being used in a very specific case, as a performance optimisation. If the measurements support it then by all means go for an arena. But again, the changes it would require from the runtime, and the language would open the door to some nastier issues in my view.
This exercise in itself is interesting. The burden of this exercise will now lie on the shoulders of implementers who use arenas to some extent: "How can the users of my library misues it? What do I have to warn them about? What do I have to document?" And this last point, relying on docs instead of automated checks is key. Computers are good at following rules. Humans not so much. How did documenting ownership and lifetime details work for C or C++?
Arenas were developed as a pattern in C & C++ to deal with the exact problem you're describing because they don't have those issues. The correct usage of a memory arenas is ultimately that it's allocated at the start of a method, it's always freed at the end of that same method, and all allocations made between those two points happen in the arena. Arenas are not intended for long-lived objects, or really any objects that escape that method at all.
If so then that somewhat circumscribes and restricts what can be done with arenas.
Yes, absolutely! Arenas are an allocation pattern intended for a strict subset of use cases that, in C, dramatically improves allocation performance & reduces bookkeeping in those use cases. It does increase bookkeeping for us, because we normally use garbage collection, but it also improves performance by even more than usual. This is not intended, I don't think, for every use case. There's a specific hole in go's memory performance story, and it's not exactly arena-shaped, but arenas do fill most of the space, I think.
I don't see/ctrl+f any proposed behavior for the zero Arena value. It seems to me that having a nil Arena always allocate from the heap would make opt-in usage very easy.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group
@CannibalVox I know what arenas are. And how they are used in C or C++. What I was getting at (in a very roundabout way) is that they are a very specific pattern only useful in some scenarios. We seem to agree on that point :).
The point I was trying to make is that going straight to adding language level support for a very specific pattern seems a bit rushed. Once you add something to a language it's hard to go back. And you have to write docs for it, make sure newcomers to the language don't misuse it, etc.
That's why a good strategy is to try to implement it as a library. You can always add features to the language later, when you understand the use cases better, if it's warranted.
Also, extending the language in the proposed way would only benefit this one specific case. If you really need language support it makes sense to add smaller, more generic features that other users could use, independently of the one specific use case that prompted them. In this case you could add raw memory handles that the language and GC know about. Allow the memory region to be pinned, and let the GC know that. This would serve as a base for the arena library as well as other scenarios.
What is the cost benefit analysis on making this change to the language? You get a 15%-20% speedup on some very specific scenarios on x86-64 linux at the cost of added complexity for every user of the language irrespective of them using the feature or not.
Hi @danscales
If we have a package that is allocating lots of memory (often by doing decoding of wire format) and we want to use arenas with it, then we will have to plumb in a way to send down an optional allocator function that it should use if enabled (in our case, the allocator function would use an arena).
Is there a rough estimate of how many new APIs this might translate to in the standard library, for example?
No, there is no estimate, since this is just a proposal and its possible use would be completely optional for any packages, of course. I expect there would be lots of consideration and experimentation before deciding that it may be worth adding optional support for arenas in a particular package. (Maybe a little similar to being very cautious and slowly gaining experience before adding adding any generic APIs to existing packages, now that generics is just arriving)
I’m generally in fair of this but have two concerns.
“If this optimization is used, the application is allowed to proceed normally if an object is accessed after the arena containing it is freed, as long as the memory of the object is still available and correct“ I feel like this will only make things worse by hiding genuine errors. Why do we need this? I feel like much better is to immediately panic after free is called regardless of whether arena is really freed or not.
Second is the fact that this library would return regular Go objects. Go is very much about reducing cognitive load and as others already pointed out arena allocations will force developers to always keep in mind that some memory might be arena allocated and they can’t simply store it for later use. Given Go lacks immutability developers already has to think about that all the time and hope documentation covers that. We don’t need to add more cognitive load.
Given that arena allocation is very niche usecase why don’t we also make it very explicit. Let API return special object that returns empty interface and has to be type asserted every time you need to access the underlying object. At least that way we are very explicit about arena allocation and developers will think twice when trying to store underlying object somewhere else.
The point I was trying to make is that going straight to adding language level support for a very specific pattern seems a bit rushed.
That is not the plan. The plan is to add a new package. That's a library-change, not a language change.
What is the cost benefit analysis on making this change to the language? You get a 15%-20% speedup on some very specific scenarios on x86-64 linux
I don't think the scenarios are that specific. Approximately every request-based server could benefit from it without too much intrusiveness. i.e. ~every HTTP or (g)RPC server. That's a large slice of the Go ecosystem.
They are, however, specific enough to IMO justify that the problems aren't that likely. It is relatively easy to document "the Handler is not allowed to retain the request message" and keeping to that - and detecting if you don't via tests.
Yes, that is true. It is possible there is another implementation that could deal with that many arenas simultaneously. But I will note that if you have 100s of requests in flight at one time, then they may not be allocating that much new data while serving each individual request, so arenas may not really be required in that case. Arenas are really more useful/appropriate for unmarshalling large data objects for which you may do a lot of processing. sync.Pool may be more appropriate if the allocated data per request is small (since there may also be only a limited number of known object types).
What message sizes/throughputs was the prototype tested at? Many small requests/responses is certainly the norm in my background, but I know google was looking hard (I believe?) a Vitess blog post that mainly dealt with very large data replication messages. Certainly if proto performance gets worse than status quo at high throughput levels, many people wouldn't be thrilled with that.
As mentioned, the performance gains are expected for dealing with large messages that are being unmarshaled or building large data structures during processing that can all be freed at once. Even if a package (such as protobuf) adds support for an arena allocator, its use is entirely optional. For a program that is dealing with all small requests and message sizes, the program would just not specify an arena allocator when unmarshaling, so the normal heap allocation be used, so there would be no reduction in performance. Arenas would only be used after experimentation to determine if they are noticeably improving performance.
@creker
as others already pointed out arena allocations will force developers to always keep in mind that some memory might be arena allocated and they can’t simply store it for later use.
As others have pointed out as well, you already have to keep that in mind. It is very common to have requirements like this. It is simply not true that you can assume that any pointer is safe to store in a global data-structure or otherwise to retain (which is the main concern).
This discussion does make me consider one serious concern: strings.
In the above proposal, downstream code that is not aware of the arena (or may have been written before the existence of arenas!) seems to need to be aware that arenas may exist in order to avoid doing completely normal things with arena-allocated strings.
All of this goes back to my understanding of how arenas should work, which is that the user who allocates the arena is responsible for controlling what downstreams they send them to. The string thing seems like such a huge surprise though that I don't think you'll be able to adequately prepare anybody for that behavior even if the arena author is aware of their repsonsibilities.
It may be that there does need to be some language-level support. An ArenaPointer[T]
type that the compiler considers interchangeable with T- that is, you can call all the same methods and access all the same fields as T, but there is a type-safe check that everyone understands what is going on, and an ArenaString
that is unambiguously different. Methods to convert a pointer to a heap copy and an unsafe
method that will give you the underlying T/string. That seems like a howitzer compared to the proposal, but the string thing really does spook me. That seems like it is going to cause unexpected problems almost constantly.
[I]f you have 100s of requests in flight at one time, then they may not be allocating that much new data while serving each individual request, so arenas may not really be required in that case. Arenas are really more useful/appropriate for unmarshalling large data objects for which you may do a lot of processing.
This seems to restrict the subset of apps that would benefit from this incarnation of arenas somewhat significantly. The added focus on protobufs makes me think that the authors have a very specific set of applications in mind.
Is such an intrusive change worth the 15% speed gain for a subset of apps of unknown size?
It's definitely reasonable to question how useful arenas will be vs how "intrusive". Their use is entirely optional, and the only user-visible change would likely be the appearance of a new arena package with its API (no language change). So, for most users they shouldn't be "intrusive". But it is good to discuss if arenas puts a burden on package/library writers, and/or if there are rules about usage of arenas that can minimize that burden.
What is the cost benefit analysis on making this change to the language? You get a 15%-20% speedup on some very specific scenarios on x86-64 linux at the cost of added complexity for every user of the language irrespective of them using the feature or not.
My argument is that a (properly implemented) arena only increases complexity for those who choose to use it. There's a lot of stdlib features I've never personally interacted with that exist for situational use. I think that there's a strong argument that arenas CANNOT be satisfactorily implemented outside golang/go and it's very much a "put it here or not at all" thing.
@Merovius it’s true that you can’t assume any pointer is safe to store but it is in most cases unless documented otherwise which is quiet rare in my practice. Arena essentially puts a huge burden on library writers. Either they have to painstakingly document every function or make sure any public API returns non arena allocated copies. Without arena allocation that’s relatively easy - you just forget the object and let it go. With arena you have to carefully examine whole execution path to find where the object come from and whether it’s arena allocated. Yes, you can imagine using special naming scheme for arena allocated variables or cases where even letting go of some object is dangerous because some internal structure might still retain it. But it still feels like arena allocation makes it even more complex and dangerous for no good reason. And at least with wrappers there’s some additional barrier. Either that or get rid of Free and let GC worry about when arena is safe to deallocated. But then the question is, will it be performant enough to warrant the complexity.
and/or if there are rules about usage of arenas that can minimize that burden.
I'm wondering if there's an opportunity to provide a (separately packaged?) complementary API that nudges towards the less pathological patterns of concurrency using arenas. Is this something where 80% or 90% of the use cases are going to be solving the same problems?
Either they have to painstakingly document every function or make sure any public API returns non arena allocated copies.
I would describe this as the correct use of arenas to begin with. Arenas should not last beyond the stack they're allocated in, and stuff allocated via the arena should not escape the call graph beneath that function.
Maybe that's the answer? Require arena allocation on the stack, if an arena pointer escapes a method where the arena that allocated it is on the stack, move it to the heap, ban assigning arena pointers to anything on the heap (or auto-heapify it if you try to assign it to something on the heap)
Would this provide as significant an improvement with a generational GC? I'm sure it would still be faster, but would it still be enough to be worth it?
@CannibalVox yes but the point I’m getting at, who’s gonna enforce it? Or is it another case of “assume developers understand all the caveats”. There’s already too many cases of that in Go and this adds another. That’s why I feel like there needs to be either some barrier in convenience or make it smarter so that incorrect usage is simply impossible. Automatically making objects escape sounds like a great idea.
@CannibalVox yes but the point I’m getting at, who’s gonna enforce it? Or is it another case of “assume developers understand all the caveats”. There’s already too many cases of that in Go and this adds another. That’s why I feel like there needs to be either some barrier in convenience or make it smarter so that incorrect usage is simply impossible. Automatically making objects escape sounds like a great idea.
I would like the auto-escape, I feel like the compiler already understands the situations where an arena object need to evaluate whether it needs to move to the heap, so hopefully that's possible.
However, I don't think "don't return objects that you, personally, have just freed on the previous line of code" is that tough of a sell. I'm definitely more concerned with situations where an engineer did not expect arena pointers to be persisted, like basically anything involving strings.
@creker I get the impression you are overestimating how common usages of arenas are going to be. There are a handful of use-cases for them and you are only affected by them, as a library author, if you decide to do so. And even then, their use will be limited to a couple of places.
I think it's instructive to look at a concrete use-case: An RPC server. In that case, the documentation will be "a handler might not retain a pointer to the request message". And the pointer to keep track of (for the framework author) will be that single request message. That's not a huge burden.
Apart from that, yes, as a library user you need to read documentation that says "don't retain such and such pointers", but again, it's fairly common that you have to do that anyways.
FWIW I've been trying to come up with places where I might use arenas. I can't think of any. And that's not for lack of worry about doing lots of small allocations - it's that arenas are just not the right tool for the jobs I have that problem with.
@Merovius that’s only if we assume that developers themselves understand that they should almost never use arenas. But if we look at sync.Pool which is everywhere I don’t think that would hold true. Especially if people are coming from other languages where arenas are very popular like C++. Eventually arenas would get into many packages regardless of actual need for it and Go ecosystem would suffer if arenas are as fragile as proposed. The notion that allocations are a bane of existence of every Go developer is very much popular.
seems like a hassle with little benefit. where is the simplicity? the line between spec and implementation is blurring. Where is the CL?
@creker Your concern was "it places a huge burden on library authors". ISTM that you are now saying that burden is mostly self-inflicted by people using it where they shouldn't. So, maybe that burden is a bug, not a feature of the design.
FWIW that there are a couple APIs that interact with the GC, which are similarly dangerous. runtime.SetFinalizer
, for example. Or the accepted proposal for pointer-pinning. And sync.Map
also tends to be abused by people (but TBF is rarely ever harmful). I don't think we should necessarily let us deter from adding useful APIs, to protect people from themselves.
@danscales In case you missed it, I would like to push my earlier question: Why does the design rely on the intent for users to call Free
? I think if we can rely on finalizers, a lot of the pushback in this issue would be resolved, as most usages can be made fully transparent. That also would make them more generally useful, as the need to call Free
makes them prohibitive for many applications.
@danscales In case you missed it, I would like to push my earlier question: Why does the design rely on the intent for users to call
Free
? I think if we can rely on finalizers, a lot of the pushback in this issue would be resolved, as most usages can be made fully transparent. That also would make them more generally useful, as the need to callFree
makes them prohibitive for many applications.
The design does include a finalizer for arenas, but I don't see how that changes anything: there are plenty of situations where the arena can be GC'd but its allocations are glued to something.
@Merovius See the section in the proposal "Removing arena free" in the "Rationale" section. We find that arenas do not give any improvement in performance on (and can make things worse) if arenas don't have an explicit arena.Free
and instead are only freed when the garbage collectors notices that there are no more pointers to the arena or the objects in the arena. One of the big wins of arenas (where they can be applied) is that the memory can be almost immediately reused at the point of the arena free, rather than waiting for a complete GC cycle. And, if most memory is allocated and freed via arenas, you may actually decrease how often the garbage collector needs to be called.
The conversation seems to be going back and forth between:
Maybe there's a compromise here where the API is designed in such a way as to indicate that it's a temporary workspace. Some languages have a with
or with-x-y-z
pattern that takes a block, or lambda and cleans up the managed thing afterwards (transaction, file, etc):
var (
result Something
err error
)
arena.With(func(a arena.Page) {
var ptrT *T
a.New(&ptrT)
... go about your business
})
Yes, it allocates a closure. But, we're already used to the semantics of calling functions and having the stack space be cleaned up. That's essentially what we're trying to achieve with arenas. Perhaps unfortunately, the fact that Go does escape analysis for stack allocations won't work the same for arena allocations, and there will certainly be bugs associated with it. But maybe that's lintable?
The conversation seems to be going back and forth between:
- "Arenas aren't generally useful and puts too much burden on the programmer to ensure no heap objects point into an arena allocation."
- "Arenas are really useful for specific use cases, and in those use cases there's no need to have heap objects point into an arena allocation, therefore there's no problem."
Maybe there's a compromise here where the API is designed in such a way as to indicate that it's a temporary workspace. Some languages have a
with
orwith-x-y-z
pattern that takes a block, or lambda and cleans up the managed thing afterwards (transaction, file, etc):var ( result Something err error ) arena.With(func(a arena.Page) { var ptrT *T a.New(&ptrT) ... go about your business })
Yes, it allocates a closure. But, we're already used to the semantics of calling functions and having the stack space be cleaned up. That's essentially what we're trying to achieve with arenas. Perhaps unfortunately, the fact that Go does escape analysis for stack allocations won't work the same for arena allocations, and there will certainly be bugs associated with it. But maybe that's lintable?
Two things:
Just to clarify, @bcmills said
Arena-allocated string values, which are functionally more like slices: they are no longer safe to read after a call returns, and not guaranteed to have stable contents over time.
They are guaranteed to have stable contents over time. They either have their originally allocated contents, or they crash the program when accessed. In no situation would they successfully look like a different string than their original contents.
Similarly, arenas are better than the sync.Pool
allocation scheme that some have proposed as an alternative. If you use a sync.Pool
to do allocation, you run the risk of use-after-free bugs ("free" = return to pool) that are undetectable. With arenas, a use-after-free either still gives you the correct contents, or crashes the program. In no case will you mistakenly read (or be otherwise aliased to) the contents of some unrelated allocation as you would with a sync.Pool
implementation.
On Feb 23, 2022, at 16:59, Stephen Baynham @.***> wrote: Do we actually, really, truly, foresee engineers not being able to wrap their minds around what an arena is and what it does? I understand that you personally are not taking a position on this question in your comment here, but a lot of the discussion around the difficulty of using arenas seems to focus on the idea that somebody might just not really understand that when they call arena.Free(), the memory they have allocated by the arena... will be freed. That seems completely incredible to me. Mind boggling, I know!
This idea does not actually prevent anyone from escaping arena-allocated data, nor does it make it especially harder to do so by accident. I assume that in any case where there was a problem, there's still a
Yup! Fully acknowledged that there are problems with this idea. The obvious advantage is the isomorphism with stack allocation. But, with this being a library, and not, say, a new type of language level construct, is there actually a way to prevent these types of problems? I think a defensive API and clear documentation are the best bets.
On Feb 23, 2022, at 17:05, Keith Randall @.***> wrote:
With arenas, a use-after-free either still gives you the correct contents, or crashes the program.
Arguably, Free
should zero the underlying memory to ensure a crash as early as possible…
Arguably,
Free
should zero the underlying memory to ensure a crash as early as possible…
The proposal uses unmapping as the mechanism to ensure crashes. Because unmapping is somewhat expensive, the proposal batches them up to lower overhead, but that of course leaves a window where an object is freed but still accessible. Perhaps there should be a debug mode where unmapping is immediate. (Maybe when running with -race
? That's kind of the grab-bag that we throw expensive debugging things into, but it might be appropriate here.)
I don't think just zeroing is a good idea. It may cause quick crashes on pointer data, but may just cause incorrect computation (especially on nonpointer data, but potentially on pointer data as well).
@CannibalVox
... a lot of the discussion around the difficulty of using arenas seems to focus on the idea that somebody might just not really understand that when they call arena.Free(), the memory they have allocated by the arena... will be freed. That seems completely incredible to me.
Speaking as someone who spends a fair amount of time teaching new Go programmers, I don't think that's particularly incredible at all. Ideally, nothing in Go makes you think about memory being freed. That's a C concept. In Go, Java, Python, &c. it only applies in edge cases, and then only as an implementation detail. Hence, my reservations about this proposal come from the pedagogical perspective.
[Arena-allocated strings] are guaranteed to have stable contents over time. They either have their originally allocated contents, or they crash the program when accessed. In no situation would they successfully look like a different string than their original contents.
That's better than it could be, but I still wouldn't describe a thing that decays into a poison-pill that crashes your program as “stable contents”. 😅
Also, it is not intended that arenas be shared across goroutines. Each arena has a lock to protect against simultaneous allocations by multiple goroutines, but it would be very inefficient to actually use the same arena for multiple goroutines. Of course, that would rarely make sense anyway, since the lifetimes of objects allocated in different goroutines are likely to be quite different.
We can just add heap-less functions that allocate all objects on stacks and panic on heap-escaping since arenas are bounded to goroutines.
@danscales Darn, don't know how I overlooked that. That makes me feel a lot worse about this proposal, as it really requires exposing arenas in the API to be useful. I feel like performance optimizations like this should be implementation details of packages. But okay, that's that, then.
Also, it is not intended that arenas be shared across goroutines. Each arena has a lock to protect against simultaneous allocations by multiple goroutines, but it would be very inefficient to actually use the same arena for multiple goroutines. Of course, that would rarely make sense anyway, since the lifetimes of objects allocated in different goroutines are likely to be quite different.
We can just add heap-less functions that allocate all objects on stacks and panic on heap-escaping since arenas are bounded to goroutines.
Arena allocated objects need to be able to be returned back to the arena root without issue which is escape from the method that allocated it (and anyway, prevents it from living on the stack)
Note, 2023-01-17. This proposal is on hold indefinitely due to serious API concerns. The GOEXPERIMENT=arena code may be changed incompatibly or removed at any time, and we do not recommend its use in production.
Proposal: arena: new package providing memory arenas
Author(s): Dan Scales (with input from many others)
Last updated: 2022-2-22
Discussion at https://golang.org/issue/51317
Abstract
We propose implementing memory arenas for Go. An arena is a way to allocate a set of memory objects all from a contiguous region of memory, with the advantage that the allocation of the objects from the arena is typically more efficient than general memory allocation, and more importantly, the objects in the arena can all be freed at once with minimal memory management or garbage collection overhead. Arenas are not typically implemented for garbage-collected languages, because their operation for explicitly freeing the memory of the arena is not safe and so does not fit with the garbage collection semantics. However, our proposed implementation uses dynamic checks to ensure that an arena free operation is safe. The implementation guarantees that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens. We have implemented arenas at Google, and have shown savings of up to 15% in CPU and memory usage for a number of large applications, mainly due to reduction in garbage collection CPU time and heap memory usage.
Background
Go is a garbage-collected language. Application code does not ever explicitly free allocated objects. The Go runtime automatically runs a garbage-collection algorithm that frees allocated objects some time after they become unreachable by the application code. The automatic memory management simplifies the writing of Go applications and ensures memory safety.
However, large Go applications spend a significant amount of CPU time doing garbage collection. In addition, the average heap size is often significantly larger than necessary, in order to reduce the frequency at which the garbage collector needs to run.
Non-garbage-collected languages also have significant memory allocation and de-allocation overhead. In order to deal with complex applications where objects have widely varying lifetimes, non-garbage-collected languages must have a general-purpose heap allocator. Because of the differing sizes and lifetimes of the objects being allocated, such an allocator must have fairly complex code for finding memory for a new object and dealing with memory fragmentation.
One approach to reducing the allocation overhead for non-garbage-collected languages is region-based memory management, also known as arenas. The idea is that applications sometimes follow a pattern where a code segment allocates a large number of objects, manipulates those objects for a while, but then is completely done with those objects, and so frees all (or almost all) of the objects at roughly the same time. The code segment may be allocating all the objects to compute a result or provide a service, but has no need for any of the objects (except possibly a few result objects) when the computation is done.
In such cases, region-based memory allocation using an arena is useful. The idea is to allocate a large region of memory called an arena at the beginning of the code segment. The arena is typically a contiguous region, but may be extensible in large chunk sizes. Then all the objects can be allocated very efficiently from the arena. Typically, the objects are just allocated consecutively in the arena. Then at the end of the code segment, all of the allocated objects can be freed with very low overhead by just freeing the arena. Any result object that is intended to be longer-lived and last past the end of the code segment should not be allocated from the arena or should be fully copied before the arena is freed.
Arenas have been found to be useful for a number of common programming patterns, and when applicable, can reduce memory management overhead in non-garbage collected languages. For instance, for a server serving memory-heavy requests, each request is likely independent, so most or all of the objects allocated while serving a particular request can be freed when the request has been fulfilled. Therefore, all the objects allocated during the request can be allocated in an arena, and then freed all at once at the completion of the request.
In a related vein, arenas have been useful for protocol buffer processing, especially when unmarshalling the wire format into the in-memory protocol message object. Unmarshalling a message's wire format to memory can create many large objects, strings, arrays, etc., because of the complexity of messages and the frequent nesting of sub-messages inside other messages. A program may often unmarshal one or more messages, make use of the in-memory objects for a period of time, and then be done with those objects. In this case, all of the objects created while unmarshalling the message(s) can be allocated from an arena and freed all at once. The C++ protocol buffer documentation provides an example of using arenas. Arenas may similarly be useful for other kinds of protocol processing, such as decoding JSON.
We would like to get some of the benefits of arenas in the Go language. In the next section, we propose a design of arenas that fits with the Go language and allows for significant performance benefits, while still ensuring memory safety.
Note that there are many applications where arenas will not be useful, including applications that don't do allocation of large amounts of data, and applications whose allocated objects have widely varying lifetimes that don't fit the arena allocation pattern. Arenas are intended as a targeted optimization for situations where object lifetimes are very clear.
Proposal
We propose the addition of a new
arena
package to the Go standard library. The arena package will allow the allocation of any number of arenas. Objects of arbitrary type can be allocated from the memory of the arena, and an arena automatically grows in size as needed. When all objects in an arena are no longer in use, the arena can be explicitly freed to reclaim its memory efficiently without general garbage collection. We require that the implementation provide safety checks, such that, if an arena free operation is unsafe, the program will be terminated before any incorrect behavior happens.For maximum flexibility, we would like the API to be able to allocate objects and slices of any type, including types that can be generated at run-time via reflection.
We propose the following API:
The application can create an arbitrary number of arenas using
arena.New
, each with a different lifetime. An object with a specified type can be allocated in a particular arena usinga.New
, wherea
is an arena. Similarly, a slice with a specified element type and capacity can be allocated from an arena usinga.NewSlice
. Because the object and slice pointers are passed via an empty interface, any type can be allocated. This includes types that are generated at run-time via thereflect
library, since areflect.Value
can be converted easily to an empty interface.The application explicitly frees an arena and all the objects allocated from the arena using
a.Free
. After this call, the application should not access the arena again or dereference a pointer to any object allocated from this arena. The implementation is required to cause a run-time erro and terminate the Go program if the application accesses any object whose memory has already been freed. The associated error message should indicate that the termination is due to access to an object in a freed arena. In addition, the implementation must cause a panic or terminate the Go program ifa.New
ora.NewSlice
is called aftera.Free
is called.a.New
anda.NewSlice
should also cause a panic if they are called with an argument which is not the correct form (**T
fora.New
and*[]T
fora.NewSlice
).Here is some sample code as an example of arena usage:
There may be an implementation-defined limit, such that if the object or slice requested by calls to
a.New
ora.NewSlice
is too large, the object cannot be allocated from the arena. In this case, the object or slice is allocated from the heap. If there is such an implementation-defined limit, we may want to have a way to expose the limit. We’ve listed it as one of the possible metrics mentioned in the “Open Issues” section. An alternate API would be to not allocate the object or slice if it is too large and instead leave the pointer arguments unchanged. This alternate API seems like it would be more likely to lead to programming mistakes, where the pointer arguments are not properly checked before being accessed or copied elsewhere.For optimization purposes, the implementation is allowed to delay actually freeing an arena or its contents. If this optimization is used, the application is allowed to proceed normally if an object is accessed after the arena containing it is freed, as long as the memory of the object is still available and correct (i.e. there is no chance for incorrect behavior). In this case, the improper usage of
arena.Free
will not be detected, but the application will run correctly, and the improper usage may be detected during a different run.The above four functions are the basic API, and may be sufficient for most cases. There are two other API calls related to strings that are fairly useful. Strings in Go are special, because they are similar to slices, but are read-only and must be initialized with their content as they are created. Therefore, the
NewSlice
call cannot be used for creating strings.NewString
below allocates a string in the arena, initializes it with the contents of a byte slice, and returns the string header.In addition, a common mistake with using arenas in Go is to use a string that was allocated from an arena in some global data structure, such as a cache, which that can lead to a run-time exception when the string is accessed after its arena is freed. This mistake is understandable, because strings are immutable and so often considered separate from memory allocation. To deal with the situation of a string whose allocation method is unknown,
HeapString
makes a copy of a string using heap memory only if the passed-in string (more correctly, its backing array of bytes) is allocated from an arena. If the string is already allocated from the heap, then it is returned unchanged. Therefore, the returned string is always usable for data structures that might outlast the current arenas.Of course, this issue of mistakenly using an object from an arena in a global data structure may happen for other types besides strings, but strings are a very common case for being shared across data structures.
We describe an efficient implementation of this API (with safety checks) in the "Implementation" section. Note that the above arena API may be implemented without actually implementing arenas, but instead just using the standard Go memory allocation primitives. We may implement the API this way for compatibility on some architectures for which a true arena implementation (including safety checks) cannot be implemented efficiently.
Rationale
There are a number of possible alternatives to the above API. We discuss a few alternatives, partly as a way to justify our above choice of API.
Removing Arena Free
One simple adjustment to the above API would be to eliminate the arena
Free
operation. In this case, an arena would be freed automatically only by the garbage collector, once there were no longer any pointers to the arena itself or to any objects contained inside the arena. The big problem with not having aFree
operation is that arenas derive most of their performance benefit from more prompt reuse of memory. Though the allocation of objects in the arena would be slightly faster, memory usage would likely greatly increase, because these large arena objects could not be collected until the next garbage collection after they were no longer in use. This would be especially problematic, since the arenas are large chunks of memory that are often only partially full, hence increasing fragmentation. We did prototype this approach where arenas are not explicitly freed, and were not able to get a noticeable performance benefit for real applications. An explicitFree
operation allows the memory of an arena to be reused almost immediately. In addition, if an application is able to use arenas for almost all of its allocations, then garbage collection may be mostly unneeded and therefore may be delayed for quite a long time.APIs that directly return the allocated objects/slices
An alternate API with similar functionality, but different feel, would replace
(*Arena).New
and(*Arena).NewSlice
with the following:An example of usage would be:
This API potentially seems simpler, since it returns the allocated object or slice directly, rather than requiring that a pointer be passed in to indicate where the result should be stored. This allows convenient use of Go’s idiomatic short variable declaration, but does require type assertions to convert the return value to the correct type. This alternate API specifies the types to be allocated using
reflect.Type
, rather than by passing in an interface value that contains a pointer to the required allocation type. For applications and libraries that already work on many different types and use reflection, specifying the type usingreflect.Type
may be convenient. However, for many applications, it may seem more convenient to just pass in a pointer to the type that is required.There is an efficiency distinction in the
NewSlice
call with the two choices. In theNewSlice
API described in the "Proposal" section, the slice header object is already allocated in the caller, and only the backing element array of the slice needs to be allocated. This may be all that is needed in many cases, and hence more efficient. In the new API in this section, theSlice
call must allocate the slice object as well in order to return it in the interface, which causes extra heap or arena allocation when they are often not needed.Another alternative for
a.New
is to pass in a pointer to type T and return a pointer to type T (both as empty interfaces):An example use of this API call would be:
intPtr := a.New((*int)(nil)).(*int)
. Although this also allows the use of short variable declarations and doesn’t require the use of reflection, the rest of the usage is fairly clunky.Simple API using type parameterization (generics)
We could have an optional addition to the API that uses type parameterization to express the type to be allocated in a concise and direct way. For example, we could have generic
NewOf
andNewSliceOf
functions:Then we could allocate objects from the arena via code such as:
We don’t think these generic variants of the API can completely replace the suggested methods above, for two reasons. First, the
NewOf
function can only allocate objects whose type is specified at compile-time. So, it cannot satisfy our goal to support allocation of objects whose type is computed at run-time (typically via thereflect
library). Second, generics in Go are just arriving in Go 1.18, so we don’t want to force users to make use of generics before they are ready.Compatibility
Since this API is new, there is no issue with Go compatibility.
Implementation
In order to fit with the Go language, we require that the semantics of arenas in Go be fully safe. However, our proposed API has an explicit arena free operation, which could be used incorrectly. The application may free an arena A while pointers to objects allocated from A are still available, and then sometime later attempt to access an object allocated from A.
Therefore, we require that any implementation of arenas must prevent improper accesses without causing any incorrect behavior or data corruption. Our current implementation of the API gives a memory fault (and terminates the Go program) if an object is ever accessed that has already been freed because of an arena free operation.
Our current implementation performs well and provides memory allocation and GC overhead savings on the Linux amd64 64-bit architecture for a number of large applications. It is not clear if a similar approach can work for 32-bit architectures, where the address space is much more limited.
The basic ideas for the implementation are as follows:
A
uses a distinct range in the 64-bit virtual address spaceA.Free
unmaps the virtual address range for arenaA
A
still exists and is dereferenced, it will get a memory access fault, which will cause the Go program to terminate. Because the implementation knows the address ranges of arenas, it can give an arena-specific error message during the termination.So, we are ensuring safety by always using a new range of addresses for each arena, in order that we can always detect an improper access to an object that was allocated in a now-freed arena.
The actual implementation is slightly different from the ideas above, because arenas grow dynamically if needed. In our implementation, each arena starts as a large-size "chunk", and grows incrementally as needed by the addition of another chunk of the same size. The size of all chunks is chosen specifically to be 64 MB (megabytes) for the current Go runtime on 64-bit architectures, in order to make it possible to recycle heap meta-data efficiently with no memory leaks and to avoid fragmentation.
The address range of these chunks do not need to be contiguous. Therefore, when we said above that each arena A uses a distinct range of addresses, we really meant that each chunk uses a distinct range of addresses.
Each chunk and all the objects that it contains fully participate in GC mark/sweep until the chunk is freed. In particular, as long as a chunk is part of an arena that has not been freed, it is reachable, and the garbage collector will follow all pointers for each object contained in the chunk. Pointers that refer to other objects contained in the chunk will be handled very efficiently, while pointers to objects outside the chunk will be followed and marked normally.
The implementation calls
SetFinalizer(A, f)
on each arena A as it is allocated, wheref
callsA.Free
. This ensures that an arena and the objects allocated from it will eventually be freed if there are no remaining references to the arena. The intent though is that every arena should be explicitly freed before its pointer is dropped.Because unmapping memory is relatively expensive, the implementation may continue to use a chunk for consecutively allocated/freed arenas until it is nearly full. When an arena is freed, all of its chunks that are filled up are immediately freed and unmapped. However, the remaining part of the current unfilled chunk may be used for the next arena that is allocated. This batching improves performance significantly.
Because of the large 64-bit address space, our prototype implementation has not required reusing the virtual addresses for any arena chunks, even for quite large and long-running applications. However, the virtual addresses of most chunks can eventually be reused, since there will almost always be no more reachable pointers to anywhere in the chunk. Since the garbage collector sees all reachable pointers, it can determine when an address range can be reused.
The implementation described above demonstrates that it is possible to implement the Arena API for 64-bit architectures with full safety, while still providing performance benefits. Many other implementations are possible, and some may be tuned for other types of usage. In particular, because of the 64 MB chunk size, the above implementation may not be useful for applications that need to create a large number of arenas that are live at the same time (possibly because of many concurrent threads). It is probably most appropriate that there should only be a few to 10's of arenas in use at any one time. Also, it is not intended that arenas be shared across goroutines. Each arena has a lock to protect against simultaneous allocations by multiple goroutines, but it would be very inefficient to actually use the same arena for multiple goroutines. Of course, that would rarely make sense anyway, since the lifetimes of objects allocated in different goroutines are likely to be quite different.
Open issues
Another possibility in the design space is to implement the API described in the "Proposal" section, but without the safety checks, or with an option to disable the safety checks. The idea here is that the performance savings from the use of arenas can be increased by doing an implementation that doesn't have safety guarantees. As compared to the implementation described above, we can avoid the mapping and unmapping overhead, and reuse the memory of an arena much more quickly (and without OS involvement) when it is freed. We have done a prototype of such an implementation, which we call "unsafe arenas". We have seen an additional 5-10% improvement in performance in some cases when using unsafe arenas rather than our safe arena implementation. However, we feel very strongly that arenas in Go need to be safe. We do not want the use of arenas to lead to memory bugs that may be very hard to detect and debug, and may silently lead to data corruption. We think that it is better to continue to optimize the implementation of safe arenas, rather than trying to support unsafe arenas.
It would be useful to have some run-time metrics associated with arenas. The desired metrics will depend somewhat on the final API, so we have not yet tried to decide the exact metrics that will cover the application needs. However, here are some metrics which might be useful:
Another open issue is whether arenas can be used for allocating the elements of a map. This is possible, but it is not clear what a good API would be. Also, there might be unusual cases if the arena used for the main map object is different from the arena used to allocate new elements of the map. With generics arriving in Go 1.18, generic maps (or hash tables) can now be implemented in user libraries. So, there could be a user-defined generic map implementation that allows optionally specifying an arena for use in allocating new elements. This might be the best solution, since that would allow for greater flexibility than adjusting the semantics of the built-in
map
type.Protobuf unmarshalling overheads
As noted above, arenas are often quite useful for reducing the allocation and GC overhead associated with the objects that are created as a protobuf message is being unmarshaled. We have prototyped changes to the protobuf package which allow for providing an arena as the allocation option for objects created during unmarshalling. This arrangement makes it quite easy to use arenas to reduce the allocation and GC overhead in applications that make heavy use of protobufs (especially unmarshalling of large protobufs). If the arena proposal is accepted and implemented in Go, then it would make sense to extend the protobuf package to provide such an arena allocation option.