TimelyDataflow / abomonation

A mortifying serialization library for Rust
MIT License
317 stars 30 forks source link

Is it okay to implement Abomonation for both T and &T? #27

Open HadrienG2 opened 4 years ago

HadrienG2 commented 4 years ago

From the point of view of abomonation's core semantics, there is nothing wrong with providing implementations of the Abomonation trait for both a type T and a reference to it &T. Basically, the implementation for &T works exactly like that of Box<T> in abomonation's current master branch.

Such implementations would be useful for high-level users of Abomonation, who stick with derives, encode, decode and measure, because they would allow safely auto-deriving Abomonation for more types. Something which, as a matter of fact, I ended up wanting for my project.

However, and that's the reason why I'm opening this issue before submitting my changes as a PR, there is also a significant ergonomic cost to doing so for any low-level user of Abomonation who calls into the trait directly.

If Abomonation is implemented for both T and &T, then anyone who uses the Abomonation trait directly instead of going through encode, decode and measure must be super-careful, because method call syntax of the x.extent() kind becomes a deadly trap that can very easily trigger the wrong Abomonation impl through auto-Deref magic.

Instead, one must get into the habit of only calling Abomonation trait method via U::extent(&x) syntax, or if type U is unknown go for the somewhat less safe compromise of Abomonation::extent(&x).

Is this a trade-off that we can tolerate for the sake of having more Abomonation impls?

HadrienG2 commented 4 years ago

(As an aside, because Rust references are marked dereferenceable at the LLVM level, implementing Abomonation for them will be dangerous until #17 is resolved)

frankmcsherry commented 4 years ago

I'm a bit scrambled at the moment, but want to pop in with a quick comment.

It is generally unsafe to implement Abomonation for any reference type, because the lifetime of the reference needs to be bound to the lifetime of the borrowed &mut [u8]. Otherwise, you could mint a reference whose lifetime outlives that of the binary data, and badness would ensue.

Serde has the same problem, and the result seems to be to have Deserialize contain a lifetime argument, and only implement Deserialize<'a> for &'a T types. The decode method would have a lifetime in the &mut 'a self argument to ensure that self lives at least as long.

I'm sorry for not popping in earlier with this. Very behind your rate of production.

Serde has the same problem, and the answer

frankmcsherry commented 4 years ago

I have an AbomonationRef<'a> trait locally, though I haven't figured out who would use it yet. I'm happy to share that out if the reference stuff is a specific goal.

HadrienG2 commented 4 years ago

Yup, I could use a look at that!

(Technically, I only need &str, but I figured out that I could just go the extra mile and try to solve the more general reference problem)

frankmcsherry commented 4 years ago

On the road atm, but will get it soon!

HadrienG2 commented 4 years ago

EDIT: Fixed after cross-checking how serde does it.

Just to see if I understand, the idea would be to...

  1. Turn Abomonation into Abomonation<'bytes> where 'bytes will be used to represent the set of &'bytes mut [u8] slices from which it is valid to deserialize the abomonated type.
    • Turn most existing impl Abomonation for ConcreteT impls into impl<'bytes> Abomonation<'bytes> for ConcreteT: if a type contains no references, it can be deserialized from any slice of bytes.
    • Use something like impl<'target, 'bytes: 'target> Abomonation<'bytes> for &'target ConcreteT for reference-ish types: if a type contains references, it can only be deserialized from bytes which outlive the lifetime of those references. The reason is that abomonation will actually produce &'bytes ConcreteT, not &'target ConcreteT, and for this substitution to be valid we need &'bytes ConcreteT: &'target ConcreteT.
    • Be careful with generic Abomonation impls. I think something like impl<'a, 'bytes: 'a, T: Abomonation<'bytes>> Abomonation<'bytes> for Generic<'a, T> (generalized to any amount of T-s and 'a-s) should be enough, but that needs cross-checking.
  2. Bound Abomonation<'bytes>::exhume() to require an &'a mut [u8] where 'a: 'bytes. This asks the borrow checker to enforce the constraint that we set ourselves in step 1.
  3. Propagate this constraint to other APIs like decode() (the compiler should ask for it anyway).

As a nice side effect, this should also resolve a problem which I had when trying to express the lifetimes of my exhume_xyz helper functions in #28.

frankmcsherry commented 4 years ago

This is basically correct yes! I appear to have deleted the file, unfortunately, but you have the spirit of it. I can reconstruct it this weekend (it was very minor, just around the &str and &T implementations). I'm pretty sure I had what you have above, but it might also be that you could just do

impl<'bytes> Abomonation<'bytes> for &'bytes ConcreteT 

and skip the other lifetime (allow Rust to coerce the lifetime as it sees appropriate).

HadrienG2 commented 4 years ago

Serde's author seems unconvinced:

The 'de lifetime should not appear in the type to which the Deserialize impl applies.

- // Do not do this. Sooner or later you will be sad.
- impl<'de> Deserialize<'de> for Q<'de> {

+ // Do this instead.
+ impl<'de: 'a, 'a> Deserialize<'de> for Q<'a> {

OTOH, since the Abomonation impl is generic, it's probably okay to kill the lifetime on Abomonation::exhume<'a> and just use &'bytes [u8] directly.