PoignardAzur / venial

"A very small syn"
MIT License
192 stars 9 forks source link

Listing the differences between venial and syn using a fuzzer #3

Open loiclec opened 2 years ago

loiclec commented 2 years ago

Hi!

About a year ago I wrote a crate called decent-synquote-alternative, which has basically the same goal as venial. It is not as well written though: I didn't have a clear scope for it and I wrote it in a very naive way. It is also not well tested at all. But it offers an alternative to quote! that I thought was more ergonomic and could maybe make compile times better. I used that crate to write the fuzzcheck proc macros.

Anyway, now I'm considering replacing decent-synquote-alternative with venial. It has almost every feature I want, and it seems to be better written. So I thought I'd contribute :)

I am in the process of releasing a crate called fuzzcheck_proc_macro which makes it possible to fuzz-test functions that take a TokenStream as input. I have tested the first prototype on venial. The first fuzz-test I tried is, basically:

fn test_parse_declaration(ts: proc_macro2::TokenStream) {
    let result = std::panic::catch_unwind(|| {
        crate::parse_declaration(token_stream.clone()),
    });
    let syn_result: Result<syn::DeriveInput, _> = syn::parse2(token_stream);
    match syn_result {
        Ok(derive_input) => {
            if let Ok(_) = &result {
                // OK! syn and venial can both parse the input
            } else {
                 panic!("syn can parse but venial can't");
            }
        }
        Err(_) => { }
    }
    // + same but with syn parsing `ItemFn` instead
}

So it tries to parse an arbitrary token stream and ensures that venial accepts everything that syn accepts.

I tested it on commit b09a976d44849e0c8573f5d8da4f5a8e50b35aab, and within a few minutes, I found a few examples that syn is happy to parse but venial can't:

// an inner attribute within a struct/enum
enum A {
    #![some_inner_attribute]
}
// closure expressions can be the discriminant of an enum variant, and they can contain a comma inside matching `|`. 
// venial advances to the next comma that isn't inside matching angle brackets only. 
// So it tries to parse the next variant starting at `c |`, which fails.
enum A {
     A = |b, c| { b + c }  
}
// macro in the where clause: venial thinks the content of the macro is the content of the enum
// so it tries to parse `:` as a variant
enum A where b !{ : } : { }
// (almost?) any pattern as function argument will make venial fail but not syn
fn x([x, ..]: a)
// c variadics not supported by venial
fn x(...) {}
// function parameter with no pattern, see: https://github.com/dtolnay/syn/issues/159 
fn x(u8) {}
// function with a self parameter
fn x(mut self) { }

There might be more, but I think that's all I found so far. I visualised the code coverage of the fuzz-test (using fuzzcheck-view) within venial and it is 100%. The code coverage within syn is also good, but it takes much longer to reach all relevant code. There, running the fuzzer for about an hour (at least for the first run) might be necessary.

Most of those differences are completely acceptable I think. In that case, we can write a function fn accept_input(x: &syn::DeriveInput) -> bool that returns true whenever we're okay with venial not parsing it. Then we can call this function before panicking within the fuzz-test, which ensures it keeps making progress.

A more interesting fuzz-test would try to compare the results between syn and venial and ensure that they match. That shouldn't be too complicated to write but will take a bit more time.

Let me know if you're interested in using a fuzzer for venial, then I can work a bit more on it and publish what I have on GitHub. Otherwise, I hope the list I have provided is already useful :)

Have a good day!

PoignardAzur commented 2 years ago

Oh, wow, that is exactly what I needed, thanks!

Having someone fuzz your library and pull failing examples out of nowhere feels magical. Definitely appreciated.

Some of these I was aware of (argument parsing is a non-priority for now), but I didn't expect the inner attribute, closure and macro cases. Definitely on my todo-list now. The closure case in particular seems gnarly.

A more interesting fuzz-test would try to compare the results between syn and venial and ensure that they match. That shouldn't be too complicated to write but will take a bit more time.

I'm not sure how that would work, especially since I'm not trying to match syn's features.

One fuzzing test that could be useful would be to check that quote!(parse_declaration(my_token_stream)) == my_token_stream, for arbitrary valid declarations. (Well, arbitrary struct/enums/unions, since functions are WIP and the rest isn't really covered right now).

Let me know if you're interested in using a fuzzer for venial, then I can work a bit more on it and publish what I have on GitHub.

What would that involve? It sounds like you've already done most of the work?

Anyway, if you want to make a PR adding fuzzing to venial, I'd definitely be interested in merging it.

loiclec commented 2 years ago

Oh, wow, that is exactly what I needed, thanks!

Having someone fuzz your library and pull failing examples out of nowhere feels magical. Definitely appreciated.

That's great to hear :)

I'm not sure how that would work, especially since I'm not trying to match syn's features.

I'll think a bit about it. For example, both syn and venial should agree on the number of fields that a struct has, as well as what the identifiers are. I don't know how far these checks can go.

One fuzzing test that could be useful would be to check that quote!(parse_declaration(my_token_stream)) == my_token_stream, for arbitrary valid declarations. (Well, arbitrary struct/enums/unions, since functions are WIP and the rest isn't really covered right now).

That sounds good. Although I assume that parsing the token stream to a declaration and then printing it again is lossy. So maybe it needs to be done twice? Something like:

let decl = parse_declaration(ts);
let ts2 = quote!(#decl);
let decl2 = parse_declaration(ts2);
// now check that `decl and `decl2` are consistent?
// maybe with:
let ts3 = quote!(#decl2);
assert!(ts2 == ts3); //  ?

Also, I have noticed that venial is very permissive and will accept declarations that really don't make sense. So maybe this test should only be performed when syn can parse the token stream as well.

What would that involve? It sounds like you've already done most of the work?

Mostly the work involves improving the fuzzer itself and writing a few more assertions in the fuzz test. I would also like to improve fuzzcheck’s APIs a bit.

Anyway, if you want to make a PR adding fuzzing to venial, I'd definitely be interested in merging it.

Great! I'll try to make a PR soon, even if everything isn't ready yet, so that you have a chance to see what it looks like and can offer some feedback.

loiclec commented 2 years ago

For example, one of the improvements I have had to make is to give the fuzzer the ability to detect infinite loops. There seems to be one as of commit 79587666482c2932663285e9c45be8e44b734220 for this token stream:

struct a (>)
PoignardAzur commented 2 years ago

That sounds good. Although I assume that parsing the token stream to a declaration and then printing it again is lossy. So maybe it needs to be done twice? Something like:

It shouldn't be, assuming the parsing succeeds.

The only case where there may be loss is when parsing doc comments (eg /// this is an inline doc comment), which are converted to attributes (though I'm not sure when that takes place).

But aside from that (and whitespace and comments, but those don't appear in a TokenStream), parsing-then-quoting should be a no-op. If it isn't, that's a bug.

Also, I have noticed that venial is very permissive and will accept declarations that really don't make sense.

The parser assumes that it's working for a derive macro; that means the TokenStream it receives has already been parsed and validated (if there's a parse error, the compilation is halted before calling macros).

Also, the parser allows code that's syntactically valid, even when it's semantically invalid. The rule of thumb is "if it compiles with a #[cfg(FALSE)], it's allowed". That includes some future syntax, and some syntax that's just nonsensical, eg:

#[cfg(FALSE)]
enum Foobar {
    pub A,
    pub B,
    pub C,
}

In any case, that doesn't change the base invariant: quote!(parse(...)) must be a no-op.

For example, one of the improvements I have had to make is to give the fuzzer the ability to detect infinite loops. There seems to be one as of commit https://github.com/PoignardAzur/venial/commit/79587666482c2932663285e9c45be8e44b734220

Ouch.

PoignardAzur commented 2 years ago
// an inner attribute within a struct/enum
enum A {
    #![some_inner_attribute]
}

That code refuses to parse, actually. I'm tempted to open a syn issue.

All the other snippets parse.

PoignardAzur commented 2 years ago

Alright, I've fixed the infinite loop; it now panics in those cases, for the reasons explained above (venial assumes input is valid).

I've added panicking tests for all the inputs you've given me. I'll fix them over time.

PoignardAzur commented 8 months ago

Since I'm planning to release a 1.0 soon and then put venial in maintenance mode, I might pick this up again. I need to look into cargo-fuzzcheck and see how I can run it on my computer.