fitzgen / bumpalo

A fast bump allocation arena for Rust
https://docs.rs/bumpalo
Apache License 2.0
1.42k stars 112 forks source link

Collections compatibility with Serde #63

Open rw opened 4 years ago

rw commented 4 years ago

Would it be feasible to have the bumpalo Vec type work with serde? That way we can handle deserializing variable length sequences of references to borrowed data. That would be a big improvement over using the global allocator.

Here are three example types. The first is not Deserialize with serde, because it requires borrowing a slice of references to strings, which is a variable-length data structure. The second is Deserialize with serde, but it causes expensive heap allocations. The third could be possible using bumpalo.

#[derive(Serialize, Deserialize)]
struct NotSerde<'a> {
    #[serde(borrow)]
    items: &'a [&'a str],
}

struct YesSerdeButRequiresHeapAllocation<'a> {
    #[serde(borrow)]
    items: Vec<&'a str>,
}

struct YesSerdeWithBumpalo<'a> {
    #[serde(borrow)]
    items: bumpalo::collections::Vec<'a, &'a str>,
}
fitzgen commented 4 years ago

A new "serde" feature seems reasonable to me. When enabled, it would turn on serialize and deserialize impls for everything in bumpalo::collections. I think we would need to implement them by hand (or at least some of them) so I think it makes sense to avoid enabling the derive feature of serde in bumpalo.

zakarumych commented 4 years ago

Note that you'd have to implement only serde::de::DeserializeSeed to provide reference to Bump instance.

zopsicle commented 9 months ago

It should be noted that the Deserialize impl for Vec cannot always use with_capacity_in, because SeqAccess::size_hint may return None. This can cause excessive memory allocation when deserializing large sequences of elements that themselves allocate, depending on the Deserializer impl.

jaredh159 commented 6 months ago

@fitzgen hey, i took a whack at implementing this today, but i think the deserialize side is inherently problematic due to the recursive nature of deriving deserialize. you can use DeserializeSeed, but i think you're sort of stuck at one level, if you have some struct that has a Vec in it, or some deeper nesting, i don't think there is a way to materialize a Bump out of thin air while recursing and handle the lifetime. the serde crate seems to acknowledge this limitation, see https://github.com/serde-rs/serde/issues/1325 and dtolnay's code and comments here: https://github.com/serde-rs/serde/issues/1325.

that said, implementing the just Serialize is straightforward, and my hunch is most people using a bump allocator are interested in serialiazing out of an arena, not deserializing into one, so would you be open to a PR that lands Serialize support only? it's a little odd, but considering the inherent issues with recursive arena deserialization, this might be pretty useful on it's own and worth doing. if so, i've basically got a PR ready.

keithamus commented 6 months ago

I came to the same conclusion, I have implemented Serialize in https://github.com/fitzgen/bumpalo/pull/210 which should be good.