Closed overlookmotel closed 6 months ago
This could make sense, but I'm not sure your use case actually requires this functionality?
> One circumstance where this would be useful is Rust-Javascript interop, where it'd enable re-using the same `Bump` for each call into Rust, rather than calling into the global allocator to create a fresh one each time.
You should be able to call `bump.reset()` and it will retain the largest chunk that it allocated (freeing the smaller chunks). This means you should reach a steady state pretty quickly.

The other concern is re-entry: JS calls Wasm calls JS calls Wasm again. In this scenario you explicitly don't want to be reusing the same memory, because you could trample on an earlier call frame's data if you did `Bump::from_raw_parts` on a single allocation that is always reused. The former `bump.reset()` solution avoids this via the type system: it avoids unsafe code and forces you to prove uniqueness of borrows in order to call `bump.reset()`.
Sorry I've been so slow to come back to you. Your response on #234 has kicked off a lot of thoughts (and I've had to read up a lot to get to a point of understanding).
To give some context, this is what I'm trying to do: https://github.com/oxc-project/oxc/issues/2409
That's very long, so to boil it down a bit:
Oxc parser is a JavaScript parser written in Rust. We're working on NodeJS bindings for it, using napi-rs. JS ASTs are huge, heavily nested structs, and the serialization/deserialization required to transfer them from Rust to JS (currently via serde JSON) is a massive performance hit - roughly 10x the cost of the parsing itself.
Oxc stores the entire AST in a bumpalo `Bump`. So I have a POC which instead pulls the chunks out of the `Bump` and sends them to JS as buffers via napi-rs. This involves no memory copying, so is very fast. Then there's a deserializer on the JS side which understands the memory layout of the native Rust types, and can convert them to JS objects.
It works, and performance is good, but there's a couple of sticking points:
The solution I have in mind is:

1. On the JS side, create a `WebAssembly.Memory` as the backing memory block.
2. Pass that memory block into Rust.
3. Create a `Bump` using that memory block.
4. Wrap the `Bump` in `ManuallyDrop` so it doesn't free the memory on drop.
5. Parse the source, building the AST in the `Bump`.
6. Deserialize the AST on the JS side, reading directly from the `Memory` object.
7. Reuse the `Memory` for the next call.

NB: Not actually using WASM at this point, but `WebAssembly.Memory` is useful as it happens to have the 4 GiB alignment property that I'm after.

The only missing element for this to work is step (3), which would need a `Bump::from_raw_parts`-type API.
It may well be that in long term Oxc should look at replacing bumpalo with a custom arena allocator tuned for our particular use case. But for purposes of developing a prototype, it'd be preferable to avoid doing that work upfront. I'm aware writing an allocator is no small task!
If the above makes sense, and you'd be willing to receive a PR to add `Bump::from_raw_parts`, I'd be happy to work that up.
If you see problems with what I'm suggesting, or a better way to tackle the problem, would also very much appreciate your thoughts.
Orthogonal to this discussion but:
> NB: Not actually using WASM at this point, but `WebAssembly.Memory` is useful as it happens to have the 4 GiB alignment property that I'm after.
FWIW, unless this is a Node-specific property, there is no guarantee that Wasm memory is 4GiB-aligned. The maximum size of a 32-bit Wasm memory is 4GiB, but that doesn't have anything to do with its alignment.
(Writing a proper, on topic reply now)
> FWIW, unless this is a Node-specific property, there is no guarantee that Wasm memory is 4GiB-aligned. The maximum size of a 32-bit Wasm memory is 4GiB, but that doesn't have anything to do with its alignment.
Yes, point taken. But it appears to be reliably the case in NodeJS. I assume this is probably motivated by being able to efficiently convert 32-bit pointers in WASM into "real" 64-bit pointers.
This property may not always hold in future, and we should probably replace it later on, or provide a less efficient fallback deserializer which doesn't rely on it. But for a prototype it seems like a good starting point.
Do you need to allocate the memory that way? You can't let bumpalo do it via `Bump::with_capacity(...)`?
Because while I could imagine a `Bump::into_raw` and `Bump::from_raw` pair of methods, they would likely have the following constraints:

It is only generally safe to call `Bump::from_raw` with the result of a previous call to `Bump::into_raw`. It would assume that the given thing is a pointer to an initialized `ChunkFooter`, which allows the `Bump` to figure out how much capacity there is in the chunk and things like that. That is, it would not be safe to call it with a pointer to a random allocation, especially not one managed by some other system, like V8's garbage collector.

Relatedly, each chunk in a `Bump` would still always use the global allocator configured in the Rust program to deallocate the chunk. So, if you gave `Bump::from_raw` some chunk of memory that was allocated by V8's allocator, the `Bump` would try and pass that to Rust's global allocator for deallocation.
I guess, even given those constraints, if you were careful never to actually do anything that deallocates the bump's chunk, you could make something that worked with your scheme:

- ✨ somehow ✨ write a `ChunkFooter` into your custom memory chunk from V8
- Use `ManuallyDrop<Bump>` to ensure that the drop never tries to deallocate the chunk (even when unwinding due to panics or whatever)
- Only use the `try_alloc` family of methods to ensure that you only ever use your one chunk from V8
- Audit any other `Bump` method you ever call to make sure it won't try and allocate new chunks or deallocate your V8 chunk

This could all work, but does seem pretty fragile to me. Especially the `ChunkFooter` bit. I wouldn't want to give any guarantees that implementation details this would rely on wouldn't change in new minor versions of `bumpalo`. But if you pinned exact versions and audited dependency updates, it could work.
Long term, I don't want to always use the global allocator for `Bump`. I want it to be parameterized over a generic type parameter `A: Allocator` that defaults to the global allocator. In this future, you would be able to implement a `V8WasmMemory` allocator or whatever and use this with `Bump`. This should work well and would be a whole lot less fragile. But this all depends on a stable allocator trait in `std`.
But now that I say that, we could prototype this scheme with the `allocator_api2` crate, which provides a polyfill for the allocator trait that will eventually go into `std` in some form. If we were able to get the prototype to a place where:

- bumpalo does not take on the `allocator_api2` dependency/feature by default
- `Bump` stays how it is now, and there is not some new type parameter that isn't used at all in this configuration
- the new functionality is gated behind an `allocator_api2` cargo feature today
- it doesn't require a ton of `#[cfg(...)]`s

then I could consider merging it. But I really want to emphasize that this would have to be structured to be as uninvasive as possible and not affect all of bumpalo's existing users before I would consider that.
Is this something you are interested in exploring?
Thanks for giving this consideration and replying at length. Really appreciate it.
> Do you need to allocate the memory that way? You can't let bumpalo do it via `Bump::with_capacity(...)`?
No I don't need to, but...
Main problem is that `Bump::with_capacity` doesn't allow requesting a specific alignment. So I can't get the 4 GiB alignment I would like.
Also:
In my scheme, JS is the ultimate "driver" of the whole process, so in my mind it makes sense for JS to own the memory, and rely on JS's garbage collector to free it.

There's also a technical limitation: Node-API (the NodeJS-native interop layer) has a very annoying problem with objects which are created in native code. If you allocate an object/block of memory in Rust and pass a reference to it to JS, then once all references to it get garbage collected, JS should call the finalizer in Rust to say "I'm done with it, you can free it now". BUT... it doesn't actually happen like that. The finalizer in fact does not get called until the next turn of JS's event loop, which can be much later if the JS code is all synchronous. Therefore user-facing JS APIs which make large allocations in Rust are a footgun - they make it very easy to exhaust the system's memory by calling them in a synchronous loop.
Secondly, some of the JS-side deserializer may later be converted to WASM for speed, so using a `WebAssembly.Memory` from the start would leave the door open to experiment with that.
> Because while I could imagine a `Bump::into_raw` and `Bump::from_raw` pair of methods, they would likely have the following constraints:
I was imagining `Bump::from_raw` a bit differently. I envisaged it as taking the raw block of memory provided and initializing a chunk footer itself. Basically, I imagined `Bump::with_capacity` but instead of calling `alloc` internally, it would use the pointer/size pair provided by the caller.

Caller would be responsible for ensuring the pointer is aligned correctly, and that the `Bump` doesn't get dropped.
`Bump::into_raw` isn't required for my use case, because the memory is borrowed from JS. But in any case I thought `iter_allocated_chunks_raw` would largely fulfil this function.
On the requirements:
> - ✨ somehow ✨ write a `ChunkFooter` into your custom memory chunk from V8.
See above. I'd hope `from_raw` would do this for me.
> - Use `ManuallyDrop<Bump>` to ensure that the drop never tries to deallocate the chunk (even when unwinding due to panics or whatever)
Yes, this is a pain, but necessary.
> - Only use the `try_alloc` family of methods to ensure that you only ever use your one chunk from V8
> - Audit any other `Bump` method you ever call to make sure it won't try and allocate new chunks or deallocate your V8 chunk
I was imagining creating a 4 GiB memory block to start with, and setting a maximum size of 4 GiB on the `Bump` with `set_allocation_limit`. I thought this would protect against any further allocations/re-allocations occurring.

We can accept a hard OOM failure if we exhaust the 4 GiB allocation. In practice, no sane JavaScript file would produce an AST anything approaching this size. And Oxc also runs in WASM, where 4 GiB is the limit anyway.
> This could all work, but does seem pretty fragile to me. Especially the `ChunkFooter` bit. I wouldn't want to give any guarantees that implementation details this would rely on wouldn't change in new minor versions of `bumpalo`. But if you pinned exact versions and audited dependency updates, it could work.
I totally agree that trying to manually construct a `ChunkFooter` is a terrible idea! In any case, going that way wouldn't need any API support from bumpalo - if going that far, might as well take the next step of just creating the required byte pattern and `transmute`-ing it to a `Bump`.
> Is this something you are interested in exploring?
Honest answer: maybe.
At this point, I'm at the proof-of-concept stage with the AST transfer mechanism, so my priority is to knock something together fairly quickly and just see how it works. If it turns out the mechanism I've outlined is the most performant way, then yes, I may well be interested to then "do it properly" and tackle the `allocator_api2` approach you've outlined.
But for now I think I'd be better off either temporarily forking bumpalo or doing the horrid DIY `ChunkFooter`-and-`transmute` thing.
Or, if you'd be willing to accept a PR for `from_raw` along the lines of what I've suggested above, obviously I'd be happy to work that up. But from our conversation, I can see that's probably not the right way to do it, so I'd totally understand if you don't want to include that in bumpalo's API.
> I was imagining creating a 4 GiB memory block to start with, and setting a maximum size of 4 GiB on the `Bump` with `set_allocation_limit`. I thought this would protect against any further allocations/re-allocations occurring.
Ah yes, it would. The allocation limit stuff slipped my mind.
> Or, if you'd be willing to accept a PR for `from_raw` along the lines of what I've suggested above, obviously I'd be happy to work up a PR. But from our conversation, I can see that's probably not the right way to do it, so would totally understand if you don't want to include that in bumpalo's API.
Yeah, I'd rather go the custom allocator route.
OK, fair enough. I'll go ahead with one of the hacky methods for now and see if it works. May return to this and try to implement custom allocators (within the requirements you've given) further down the road.
Thanks again for walking through this with me and exploring possibilities.
Related to #100.
Would you consider addition of an unsafe method (called e.g. `from_raw_parts`) to create a `Bump` from an existing memory allocation?

In comparison to the request in #100, the caller would have to promise that the allocation was made from the global allocator, with the alignment that Bumpalo requires, and that this allocation now becomes owned by the `Bump`. So then no changes required to Bumpalo's internals.

One circumstance where this would be useful is Rust-Javascript interop, where it'd enable re-using the same `Bump` for each call into Rust, rather than calling into the global allocator to create a fresh one each time.