Lokathor / bytemuck

A crate for mucking around with piles of bytes
https://docs.rs/bytemuck
Apache License 2.0
738 stars 80 forks source link

Derive macros are unsound #281

Open mahkoh opened 1 month ago

mahkoh commented 1 month ago

I came across https://github.com/google/zerocopy/issues/388#issuecomment-1737817682 by chance. The same applies to this crate:

type S = &'static str;

#[derive(Copy, Clone, Debug, AnyBitPattern)]
#[insert_field(x = "S")]
struct X {
}

fn main() {
    let x: &X = bytemuck::cast_ref(&[0usize, 1usize]);
    println!("{}", x.x);
}

Segfaults. Solutions might involve generating a function that asserts (by compiling) that the types seen by the proc macro are the same as the types in the final type and that the final type contains no additional fields.

Lokathor commented 1 month ago

I'm not remembering what Derive proc macros are allowed to emit. Can they create code other code besides the trait impl they're for?

Otherwise, we might have to do the other option mentioned in the zerocopy thread and bail out on all unknown attributes. BUT that would be extremely uncool.

zachs18 commented 1 month ago

Yes, derive macros can emit arbitrary items, essentially. (IIRC this is already used in bytemuck_derive, where some derive macros emit some const _: () = { some static assertion stuff }; in addition to the trait impl.)

That said, Is it documented anywhere that it is intentional for derive macros to see the "original" tokens when an item has been modified by an attribute macro? My first instinct would be to call it a Rust bug that the derive macro does not get the correct TokenStream that defines the item it is intended to be applied to.

Lokathor commented 1 month ago

i believe the exact ordering of the attributes matters, unfortunately

zachs18 commented 1 month ago
use the_macros::*;

fn main() {
    {
        #[derive(PrintTokens)]
        #[add_fields({y: u32})]
        struct Foo {
            x: u32,
        }
        print_tokens()
    }

    {
        #[add_fields({y: u32})]
        #[derive(PrintTokens)]
        struct Foo {
            x: u32,
        }
        print_tokens()
    }
}
#[add_fields({ y: u32 })] struct Foo { x: u32, }
struct Foo { x : u32, y : u32 }

Okay, the original bug report made it sound like Rust never applied attribute macros before giving the item to derive macros, but yeah it does if it is before the derive (since it applies macros outside-in) and otherwise the attribute macro is still in the input. So erroring out if there are any unknown attributes in the input (the first case here) would be one way to make this sound, perhaps with an error message telling the user to move the attribute macro above the derive. (see reply)

the_macros source ```toml # Cargo.toml [package] name = "the-macros" version = "0.1.0" edition = "2021" [lib] proc-macro = true [dependencies] proc-macro2 = "1.0.88" quote = "1.0.37" syn = { version = "2.0.82", features = ["full"] } ``` ```rs // lib.rs use proc_macro::{TokenStream}; use proc_macro2::{Literal, Ident, Group, Punct, Span, Spacing, Delimiter}; #[proc_macro_derive(PrintTokens)] pub fn derive_print_tokens(item: TokenStream) -> TokenStream { let literal = Literal::string(&item.to_string()); quote::quote!{ fn print_tokens() { println!("{}", #literal); } }.into() } #[proc_macro_attribute] pub fn add_fields(attr: TokenStream, item: TokenStream) -> TokenStream { let syn::Item::Struct(mut item) = syn::parse::(item).unwrap() else { panic!() }; let new_fields = syn::parse::(attr).unwrap(); let syn::Fields::Named(ref mut fields) = item.fields else { panic!() }; fields.named.extend(new_fields.named); quote::quote! { #item }.into() } ```
zachs18 commented 1 month ago

So erroring out if there are any unknown attributes in the input (the first case here) would be one way to make this sound, perhaps with an error message telling the user to move the attribute macro above the derive.

Actually after some testing, this won't work in general. Even ignoring all the false positives it would have, there are also false negatives: there's no way in general to differentiate between our own helper attributes for other traits (like #[zeroable(bound = "")]) and actual unknown attribute proc-macros.

false negative example ```rs use adversarial_macros::transparent; // inserts a `y: Infallible` field, e.g., or any 1-ZST with a nontrivial safety invariant use bytemuck::{Zeroable}; #[derive(Clone, Copy, Zeroable, TransparentWrapper)] #[transparent(u32)] // fine, made inert by `derive(TransparentWrapper)` #[repr(transparent)] struct Foo { x: u32 } #[derive(Clone, Copy, Zeroable)] #[transparent(u32)] // unsound false negative, adds a field after `Zeroable` derive macro runs #[repr(transparent)] struct Foo { x: u32 } ``` From inside the `Zeroable` derive-macro, we cannot know if `derive(TransparentWrapper)` is also being applied, so we don't know if `#[transparent(u32)]` is a helper attribute for `derive(TransparentWrapper)` or a call to the procmacro `adversarial_macros::transparent`.

Edit: I suppose we could get around this by only allowing the helper attributes for the specific trait being derived, which would require writing the derives separately with the helpers between them, actually no that won't work either, because the helper attributes are "inert", but are not actually stripped from the input for later derives, e.g.

example ```rs use adversarial_macros::transparent; #[derive(Debug, Clone, Copy)] #[derive(TransparentWrapper)] // comment this out for a different error #[transparent(u32)] #[derive(Zeroable)] #[repr(transparent)] struct Foo { x: u32, } ``` `derive(Zeroable)` still sees `transparent(u32)` in the input if it was used as an inert helper attribute in `derive(TransparentWrapper)` instead of being an "active" procmacro.
zachs18 commented 1 month ago

Another wrinkle: For fields, we can check the types using some const _: () = { /* do some stuff */ };checks to make sure nothing changed the field types, but for enum discriminants, I don't know if there is any way to be sure that a procmacro did not change any variant discriminants, e.g.

#[derive(Zeroable)]
#[repr(u32)]
#[adversarial_macro] // changes discriminant so none are 0
enum Foo {
  X(i32) = 0,
}

Is there even any way to detect this at compile time?

Edit: For a fieldless enum you can const _: () = assert!(Foo::X as int_ty == 0);, but for fieldful enums I don't know if there is a way to do it at compile time.

workingjubilee commented 1 month ago

That said, Is it documented anywhere that it is intentional for derive macros to see the "original" tokens when an item has been modified by an attribute macro? My first instinct would be to call it a Rust bug that the derive macro does not get the correct TokenStream that defines the item it is intended to be applied to.

It's sort of the inverse: I don't believe we have fixed the order of macro evaluation for proc macros in a way that is particularly useful or provides any real guarantees to macro authors, in any direction.