TimelyDataflow / abomonation

A mortifying serialization library for Rust
MIT License
322 stars 30 forks source link

Implement Abomonation for PhantomData #4

Closed milibopp closed 7 years ago

milibopp commented 7 years ago

PhantomData is only a zero-sized type-level marker. Therefore, it should never have an effect on serialization.

This change provides a blanket impl for the type, so that (de-)serializing it is a no-op.

frankmcsherry commented 7 years ago

This seems cool. I have to admit I'm not sure if I fully understand the consequences, though. I agree that there shouldn't be anything written or read back, but couldn't it be the case that one has a PhantomData<&'a ()> to express a lifetime requirement, and we are now going to be able to mint these up for arbitrary types "too easily"?

I'm not sure I understand what PhantomData guarantees, I guess. If it is up to the user to pick the "type", and if they pick PhantomData<&'static ()> when we serialized something else then "too bad for them" I'm not too worried as long as we don't break any memory safety (any more memory safety ;)).

Do you know more about whether this is a potential footgun? I totally agree that any type that needs a PhantomData for co-/contra-variance is currently boned wrt Abomonation, and we should do this PR immediately if there are no naughty consequences.

milibopp commented 7 years ago

Is there any way for lifetimes to survive serialization + deserialization at all?

The way I understand it, PhantomData does nothing more but allow you add type information, when you don't have other fields for that. A canonical use case would be this:

struct Foo<'a, T> {
    pointer: *const T,
    phantom: PhantomData<&'a T>
}

Even if we can construct arbitrary lifetimes that way, the PhantomData field is not the issue here. After all, it does not contain an actual pointer to invalid memory. If you have a type like that, it has an unsafe implementation anyway and that needs to be taken into consideration when implementing Abomonation. That case is probably similar to implementations for &'a str and &'a [T].

But, even if a user encodes some PhantomData<T> and decodes it as PhantomData<U> for T != U, it does not matter because there is no data contained in it that could cause the program to dereference a dangling pointer or create other kinds of invalid state. I guess it is very much like [T; 0] in that regard.

fyi, I mostly want this for cases that do not involve lifetimes to make it possibly to derive implementations automatically using abomonation_derive. Alternatively, one could annotate fields to be skipped. But this seems to be the more principled approach to me.

frankmcsherry commented 7 years ago

Is there any way for lifetimes to survive serialization + deserialization at all?

Only in the sense that we can screw them up. :)

For example, the current pile of code has commented out implementations for &'a str and &'a [T] because we can't actually ensure that the resulting types will be memory safe, because there is no relation between the 'a lifetime of the type and the &'b mut [u8] lifetime of the data backing it.

We can constrain things to work out with an Abomonation<'a> trait, but it doesn't exist yet.

So the question is less about "can we make sure we get the right lifetime" and more about "are we risking memory unsafety or other invariant violations if we provide a &'b reference to something with a longer lifetime?" I think this boils down to not knowing what PhantomData really "means", and whether a (String, PhantomData<&'a ()>) ensures anything at all about the string, or perhaps that the memory backing two such tuples with non-overlapping lifetimes cannot be aliased, or .. I have no clue, really.

Is your use case satisfied if the impl for PhantomData has a T: 'static constraint?

BTW: Totally happy to take the PR, just trying to get a log of candidate issues that might be unclear.

milibopp commented 7 years ago

Interesting… that may be a really good idea being explicit about lifetimes. It's good to talk this through. After all, we are using Rust to prevent unsafety from spreading out of our abstraction boundaries. :)

Regarding your question towards the meaning of PhantomData, I don't think there's much more to it. For those cases, where you decorate a raw pointer with a lifetime, it really is just a vehicle to end up with a safe API, that the raw pointer does not provide on its own.

But I don't think it says anything about the String in your (String, PhantomData<&'a ()>) example. The String is still an owned string that is internally consistent, that is not affected by the phantom data next to it. If anything, adding in PhantomData strictly reduces what you can do with the String. It introduces lifetime bounds, where there were none before. With the owned String, you could have done everything, but now that there is this phantom reference next to it, you are restricted to 'a. But it does not require anything about the String to be true.

I think the situation is very different from the slice situation, because a slice actually does contain data, whose lifetime is changed the way you describe. But that is just not the case for PhantomData. Does that make sense?

btw: my use case is satisfied, even when adding T: 'static. I don't think it is necessary, but I may be wrong :P So to play it safe, I'd be happy to restrict the implementation to static types.

frankmcsherry commented 7 years ago

Actually, I apologize but the 'static example constraint was wrong. What I meant (reflecting later) was "owned or shorter than 'a", which is what Abomonation<'a> intends to guarantee.

But, I think we are good to go in that it is an unsafe interface, and all of this (the whole crate) will need review at some point anyhow. :)

milibopp commented 7 years ago

Okay, that's cool. Yeah, I kind of took : 'static to mean "lifetimes not involved", but that is probably not true here. You can't express "shorter" with a lifetime bound, only "as least as long as". Looking forward to the API revision.

frankmcsherry commented 7 years ago

Do you have a use in mind? It is the sort of thing that isn't too hard to do, but I'm also not blocked on doing it (so, more of a novelty than useful wrt code I'm using at the moment). If you end up blocked on anything like this, do shout and it isn't so hard to revise.

milibopp commented 7 years ago

No, I was just thinking that it may be handy to just share slices at some point, but it is not a necessity for me.