How to deal with ambiguous parses?

wchristian commented 10 years ago

Right now PPI seems to be decidedly undecided on how to deal with code that cannot be decided confidently as to its meaning. An example as follows:

sub d { 1 };
my @c = 3 .. 6;
say 1 if d ~~ @c;

use v5.10.1;             # !
sub d () { 1 };          # !
my @c = 3 .. 6;
say 1 if d ~~ @c;

The ~~ in the last line interpreted as a single operator in both cases. However in the first case it should be two operators and only in the second case should it be parsed as the smart-match operator.

In my opinion the current handling of that code is unacceptable.

I am however unsure on how it should be handled instead and am as such fishing for general opinions on how PPI should handle ambiguous parses.

kentfredric commented 10 years ago

As I said in IRC, I think the very best you can do, is assume that there is no way to use that use v5.10.1 stanza to determine what ~~ means.

Instead, I'd suggest having some sort of grammar table that maps various syntax rules for different versions of Perl, and then placing the burden on the person calling PPI to specify what version they want to assume parsing for.

Yes, this will mean lexically scoped pragma effects will be ignored, and you'll just have to make some fair assumptions based on the value of the perl_version parameter, and then die as appropriate.

Its not perfect, but its achievable without needing pluggable dynamically self-modifying parsers and triggering lexical parse rule, and you can hopefully add that feature in later when you can work out how.

What this would mean in your example case, is both those code snippets would parse identically, but what ~~ parsed as would depend entirely on the value to PPI for its ambient perl version parameter.

Again, this is not perfect, but its a step up from the current situation, where determination of what it means is useless, and possibly mimicking some version of Perl that is hard-coded.

thaljef commented 10 years ago

For what it is worth, I'll tell you what Perl::Critic does...

P::C has a few policies that behave differently based on which version of perl it thinks you are targeting. If the code says use vX.XX.XX anywhere in the file, then we apply whatever logic is appropriate for version X.XX.XX. If there is no such declaration, we assume you are targeting all versions of perl. Usually this means the policy becomes a no-op because it is not possible to comply in some older perl (for example, lexical iterators didn't exist until 5.004).

I admit our strategy is on the conservative side. Given the nature of P::C, we usually lean towards making everything explicit. But this isn't always the most user-friendly approach. For general PPI users, it might be better to assume they are always targeting the latest version of perl unless they declare otherwise (however, it would be preferable to us if P::C and PPI both followed the same strategy).

But PPI should never die just because it can't decide what to do. It is not a compiler. PPI is designed to be lenient with syntacticly broken, incomplete, and unrecognizable input. This is what makes PPI useful for things like Padre. So at most, it could warn about these situations.

Overall, I agree that PPI should try to parse in a version-appropriate way, if it can. Smart match isn't the only example of this. I think the package VERSION BLOCK syntax may be another case that PPI doesn't fully recognize.

Lexical effects are probably a big bag of pain. I know Adam has been concerned about introducing "dynamic" bits of grammar to the language precisely because they make tools like PPI and Perl::Critic impossible to build. But I'm not smart enough to have any insights on all that.

kentfredric commented 10 years ago

In addition to warning, instead of guessing what a series of tokens means, you might be fortunate enough to be able to keep looking past the bad token(s) and make the syntax parse again.

If you can do that, then you could warn, and emit an "indeterminate" token, which may help an interactive IDE like Padre show syntax errors via highlights on-demand.

Even if your look-ahead-and-continue is wrong, its a best-effort parse, and it may lead to other this-stuff-is-wrong things, but you'd have adequate information to give the user to clarify the problem so that it may be resolved.

thaljef commented 10 years ago

In addition to warning, instead of guessing what a series of tokens means, you might be fortunate enough to be able to keep looking past the bad token(s) and make the syntax parse again.

And that's exactly what PPI does (I think).

If you can do that, then you could warn, and emit an "indeterminate" token, which may help an interactive IDE like Padre show syntax errors via highlights on-demand.

There is PPI::Statement::Unknown, PPI::Structure::Unknown, and PPI::Token::Unknown.

wchristian commented 10 years ago

Will need to reread everything carefully and think about possible options. However one came to mind which i'd like to hear your opinions on.

How about having ambiguous parses triggering a method that by default emits Unknown elements of appropiate type, but can be configured to run user-provided code instead? Then users could implement either warnings, exceptions or reparses on their own.

thaljef commented 10 years ago

That would be possible, but it seems far afield from where this started. I thought the (initial) problem was getting PPI to understand that some constructs just don't exist in certain versions of Perl.

I guess it would help me if I understood the context of your problem a bit more. I'm not the active maintainer for PPI, but I'm happy contribute whatever thoughts I can.

thaljef commented 10 years ago

You could always search the DOM for the *::Unknown classes and handle them however you like.

Unless you really, really need to handle them as soon as they are parsed & lexed.

But again, it seems the first step is to get them classified as Unknown under the right circumstances.

wchristian commented 10 years ago

Mostly asked you to opine because you're maintaining the most known user of PPI. :)

That said, i'm looking for a solution to the entire problem scope, including the little issue of "having to check manually whether something weird happened means a lot of fragile code will be created".

Perl-Critic / PPI

How to deal with ambiguous parses? #28