dtolnay / syn

Parser for Rust source code
Apache License 2.0
2.84k stars 309 forks source link

lookahead1 is_empty, or Peek for Nothing #1550

Open ijackson opened 9 months ago

ijackson commented 9 months ago

Ideally I would be able to say

    let la = input.lookahead1();
    if la.peek(...) {
        ...
    } else if la.peek(syn::parse::Nothing) {
        OR
    } else if la.is_empty() {
        handle the empty input case
    } else {
        return Err(la.error())
    }

But Lookahead doesn't have an is_empty() method. And .peek doesn't work, I think just because Nothing isn't Copy.

I can use input.is_empty() but of course that doesn't note in the error message that empty input was permissible.

vidhanio commented 9 months ago

This would probably be achievable without breaking changes by adding a NothingToken(TokenMarker) type which just checks for EOF? maybe it could even be added to the Token! macro as Token![].

ModProg commented 6 months ago

This would probably be achievable without breaking changes by adding a NothingToken(TokenMarker) type which just checks for EOF? maybe it could even be added to the Token! macro as Token![].

I'd very much like something like that but, @dtolnay repeatedly argued against adding a way to peek for EOF, e.g.,

1161 and https://github.com/dtolnay/syn/issues/889.

AFAICT it would be trivial for syn to add an EOF token that can be peeked, I had proposed a way here, https://github.com/dtolnay/syn/issues/1161#issuecomment-1792548261, but doing that outside of syn is obviously out of any stability guarantees, as it uses private traits (as a matter of fact the implementation I proposed no longer compiles).

I guess another solution would be to allow people to implement Peek themselves, but AFAIK that is not a desired thing right now.

dtolnay commented 6 months ago

Yeah peeking is simply not appropriate for EOF. peek2 and especially peek3 are already a problematic API, and shoehorning EOF into that exacerbates their issues significantly. I am open to exposing peek in a better way but just adding a trait impl is not it.

Peek is normally used for ascertaining whether a particular pattern of tokens would successfully parse if parsed from the input in a given order: if input.peek(Token![async]) && (input.peek2(token::Brace) || input.peek2(Token![move]) && input.peek3(token::Brace)). If this expression evaluates to true, then either parsing async followed by brace, or async followed by move followed by brace, must succeed. The input token tree including None-delimited groups can be async {} or «async {}» or «async» {} or async «{}» or «async» «{}» or ««async» {}» or ….

EOF is a different question than this because asking whether Cursor::eof can be true in a particular location is not a meaningful thing in the presence of None-delimited groups (consider input.peek(Token![async]) && input.peek2($eof) with something like «async» {}), and asking whether Nothing can be parsed at a particular location is especially not meaningful because it always can be parsed anywhere.

Ultimately, forking and parsing the tokens in the desired order is the foolproof way to ascertain whether they will parse successfully if parsed in that order.

ijackson commented 6 months ago

I'm puzzled. You seem to be explaining that if there was a peek($eof), the forms peek2($eof) and peek3($eof) would go wrong.

But that doesn't seem to apply to input.is_empty()? Is there some reason why lookahead1.is_empty() would be a bad idea?

Right now I do this:

let la1 = input.lookahead1() {
if la1.peek(Ident) { /* stuff */ }
else if la1.peek(Lifetime) { /* stuff */ }
else if input.is_empty() { /* use default data or something */ }
else { Err(la1.error()) }

But this produces inaccurate error messages because the Lookahead1 "didn't see" me test for emptiness.

Lookahead1::is_empty would solve my problem, I think.

dtolnay commented 6 months ago

My last comment was in response to ModProg's ping and "I'd very much like something like that" (NothingToken) and "AFAICT it would be trivial for syn to add".

lookahead1.is_empty() is fine as long as it correctly suggests the right closing delimiter from the surrounding context, instead of just putting "or end of input" into error messages. Similar to these rustc errors:

error: expected one of `.`, `;`, `?`, `}`, or an operator, found `@`
 --> src/lib.rs:1:19
  |
1 | fn f() -> i32 { 1 @ }
  |                   ^ expected one of `.`, `;`, `?`, `}`, or an operator

error: expected one of `,`, `.`, `;`, `?`, `]`, or an operator, found `@`
 --> src/lib.rs:2:25
  |
2 | fn g() -> [i32; 1] { [1 @] }
  |                         ^ expected one of `,`, `.`, `;`, `?`, `]`, or an operator
ModProg commented 6 months ago

My last comment was in response to ModProg's ping and "I'd very much like something like that" (NothingToken) and "AFAICT it would be trivial for syn to add".

Sorry, my comment was probably snarkier than it had to be, I did not consider non-delimited groups. If I understand that correctly, this would always be a problem while parsing, right? If I have «|async» fn (» being the none delimited group, and | the current parse position), peek2(Token![fn]) would return false, right? I wonder if a better behavior would be to ignore the non-delimited group, though there are situations where correct handling of non-delimited groups is relevant (IIRC that is mainly for precedence in expressions).

I think having an Eof token that behaves in peek like Token![fn] should not be any more problematic. Though as Eof would not be a useful in an error message, an Eof token is probably not part of a solution for the lookahead issue.

ModProg commented 6 months ago

lookahead1.is_empty() is fine as long as it correctly suggests the right closing delimiter from the surrounding context, instead of just putting "or end of input" into error messages. Similar to these rustc errors:

Maybe is_empty(delimiter) could be an option, where delimiter needs to be able to convey })] and something like end of macro input as a proc-macro doesn't know which delimiter started it.

dtolnay commented 6 months ago

If I have «|async» fn (» being the none delimited group, and | the current parse position), peek2(Token![fn]) would return false, right?

I think it's impossible to have a ParseStream with the tokens you showed and with the cursor at the position you showed.

The issue is rather |«async» fn in which the naïve eof token that has been thrown around (with fn peek(cursor) -> bool { cursor.eof() }) would report peek2(eof) is true.

ModProg commented 6 months ago

The issue is rather |«async» fn in which the naïve eof token that has been thrown around (with fn peek(cursor) -> bool { cursor.eof() }) would report peek2(eof) is true.

After looking at both the peek2 and Cursor impls, I understand what you mean, peek2 will enter a non-delimited group transparently, and Cursor::ident etc. will exit non-delimited groups automatically, while Cursor::eof does not.

So to support a peek like that, one would need to handle non-delimited groups explicitly (I think one would need to call ignore_none() first).

dtolnay commented 3 months ago

1625 solved the problematic case described in https://github.com/dtolnay/syn/issues/1550#issuecomment-2017018788.

Separately, #1680 + #1681 added a way that Lookahead1 will be able to report the correct closing delimiter at the end of a group per https://github.com/dtolnay/syn/issues/1550#issuecomment-2016891573.

I think this is ready to move forward if someone can provide realistic example code for the docs. Preferably 2 examples: one with peek2, and one with lookahead1 where we can demonstrate the resulting error message.