Perl-Critic / PPI


The Keyword Question #273

Open wchristian opened 2 years ago

wchristian commented 2 years ago

Edit: I have implemented a proof of concept; please review and comment: https://github.com/Perl-Critic/PPI/pull/280




This spawned from discussion in #194 and https://github.com/Perl-Critic/Perl-Critic/issues/591

There are a lot of hard problems in Perl parsing that tempt people into proposing simple answers which leave out real problems. Here is a demonstration, not of how PPI parses things, but of how, just by looking at the code, it is impossible to predict what the Perl parser itself will do.

strect.pm

package strect;
use strictures;
use Import::Into;
use Syntax::Keyword::Try ();
use Try::Tiny ();

sub import {
    strictures->import::into(1);
    ( @ARGV ? "Try::Tiny" : "Syntax::Keyword::Try" )->import::into(1);
}

1;

legacy.pl

use lib '.';
use strect;

try { die "marp" } catch { print "caught $_" };

print 4;

$ perl legacy.pl 1
caught marp at legacy.pl line 4.
4
$ perl legacy.pl
Use of uninitialized value $_ in concatenation (.) or string at legacy.pl line 4.

modern.pl

use lib '.';
use strect;

try { die "meep" } catch { print "caught $@" }

print 5;

$ perl modern.pl 1
try() encountered an unexpected argument (1) - perhaps a missing semi-colon before or at modern.pl line 6.
5
$ perl modern.pl
caught meep at modern.pl line 5.
5

This demonstrates how, in a plausible use case, the parsing of keywords can be affected, even on the same interpreter, by something entirely outside the code itself, such as command-line arguments.

This is problematic because, depending on the imported keywords, this block:

try { die "meep" } catch { print "caught $@" }
print 5;

is seen by the Perl parser as either one statement (with try consuming the return value of the print) or two statements (try, followed by print).
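
To spell out the one-statement case: with Try::Tiny's plain prototyped subs, the block above is roughly equivalent to the following nested call. This is illustrative, not a literal desugaring; print is evaluated first and its return value (1) is passed on as an extra argument, which is what the "unexpected argument (1)" error from modern.pl above is complaining about.

use Try::Tiny;

# Roughly what the one-statement (Try::Tiny) parse of the block above amounts to.
try( sub { die "meep" }, catch( sub { print "caught $@" }, print(5) ) );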

NOTE ALSO: Some people might like to mentally short-circuit here with "just assume try is the modern form". That would be useless. I used try here for brevity and reproducibility. Other modules might import other keywords.


And to top this off, here is an example within a single file:

stroct.pm

package stroct;
use strictures;
use Import::Into;
use Syntax::Keyword::Try ();
use Try::Tiny ();

sub import {
    strictures->import::into(1);
    ( $_[1] ? "Try::Tiny" : "Syntax::Keyword::Try" )->import::into(1);
}

1;

mixed.pl

use lib '.';

{
    use stroct "a";
    try { die "marp" } catch { print "caught $_" };
    print 4;
}

{
    use stroct;
    try { die "meep" } catch { print "caught $@" }
    print 5;
}

$ perl mixed.pl
caught marp at mixed.pl line 5.
4caught meep at mixed.pl line 11.
5

This demonstrates how, even within the same file, based on lexical imports that PPI has no sensible way of predicting, try {} catch {} print 5; might be one or two statements.

wchristian commented 2 years ago

One possible solution that I can think of here would be for PPI to be import-aware in regards to keywords, and allow calling code to configure the parser to presume that based on [ module name string, module use regex, module use callback ] matches, certain keywords would be present in the attached block. That way we could define default sets [ core, cpan popular ], but still allow people with custom import modules to configure for their environment.
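
To make the shape of that concrete, here is a purely hypothetical sketch; neither keyword_providers nor keyword_callback exists in PPI today, the module-to-keyword table is only an example, and My::Company::Setup is a made-up module name:

use PPI;

my $doc = PPI::Document->new(
    'legacy.pl',

    # module name string / module name regex => keywords presumed active
    # for the rest of the enclosing scope once that module is use'd
    keyword_providers => [
        'Syntax::Keyword::Try'    => [qw( try catch finally )],
        qr/\A(?:strect|stroct)\z/ => [qw( try catch )],
    ],

    # module use callback: handed each include statement as it is parsed,
    # returns the keywords it is known to enable (or nothing)
    keyword_callback => sub {
        my ($include) = @_;    # a PPI::Statement::Include
        return [qw( try catch )] if ( $include->module // '' ) eq 'My::Company::Setup';
        return;
    },
);

A default table (core features plus a handful of popular CPAN keyword modules) could then ship with PPI and be extended per project.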

Thoughts @adamkennedy @oalders @Grinnz @EvanCarroll ?

wchristian commented 2 years ago
<Mithaldu> keywords cannot be enabled by a require statement ... right?
<LeoNerd> Correct
<LeoNerd> But any code inside a BEGIN block could
<Mithaldu> .... fuck
<LeoNerd> BEGIN { require Syntax::Keyword::Try;  Syntax::Keyword::Try->import; }   for one thing.
<LeoNerd> BEGIN { require Syntax::Keyword::Try;  $^H{"Syntax::Keyword::Try/try"}++ }    for more subtle gut-wrenching

Grinnz commented 2 years ago

Unfortunately my only conclusion on this is that PPI on its own cannot solve this. Chasing imports is a losing battle and would be a hack at best; literally any use statement or BEGIN block can arbitrarily change the parser state. I haven't thought of an alternative.

oalders commented 2 years ago

I agree that there are problems which PPI can't solve on its own and, having some experience with chasing imports, that's not a road I'd want to go down.

Having said that, I think there's a use case that says

this is code which I own, and I'm confident that it doesn't do unorthodox or clever things and if it does, I'm ok with PPI not always getting it right.

I think in cases like these, if PPI gets it right most of the time, that would probably be OK for most people. It's already not really getting it right in some of these cases, so I do think that having a switch to turn on some more permissive parsing would probably suit a lot of use cases.

If PPI could experimentally support a config file, like Perl::Critic and perltidy do, then it would be easier to switch this behaviour on and off on a per-project basis. Turn it off by default, but maybe allow people to shoot themselves in the foot and see what happens.
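
Purely as an illustration of that idea (the file name and every key below are invented; nothing reads such a file today), a per-project file in the .perlcriticrc style might look like:

# .ppirc -- hypothetical, applies to this directory and below
presume_features = signatures try

[keyword_providers]
strect               = try catch
Function::Parameters = fun method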

One of the places where I'd like to see improvements is in better Perl tooling, especially in my editor. Having PPI take some hints about what is going on in the code could be a real improvement in some cases.

wchristian commented 2 years ago

I think solving for 90% of cases, if it can be done in a useful manner without breaking the remaining 10% of cases, should be a goal.

Personally, my worry with a per-file config without any import tracking whatsoever is that it makes it very hard to configure how and when to change parse modes if people have something as basic as files with different Perl feature imports in the same project, or different imports for different blocks in a file.

That's why I think chasing imports is more helpful than doing file-level parse mode declarations: it's easy to allow people to configure which imports do what, while also allowing the option of per-file configuration.

oalders commented 2 years ago

I think solving for 90% of cases, if it can be done in a useful manner without breaking the remaining 10% of cases, should be a goal.

👍ðŸ»

Personally, my worry with a per-file config without any import tracking whatsoever is that it makes it very hard to configure how and when to change parse modes if people have something as basic as files with different Perl feature imports in the same project, or different imports for different blocks in a file.

To be clear, in my case, just a per-project config would cover most of my cases. At $work 99% of the code will contain some pragmas which apply the same imports everywhere. So, in this case it's not complicated and just saying "assume signatures everywhere in this directory or below" is fine.

That's why I think chasing imports is more helpful

Any thoughts on how you would want to go about this?

zmughal commented 2 years ago

I can give an example of how that would work with Function::Parameters (which can add user-defined keywords).

I believe a common pattern is to use Import::Into in a shared "setup" package (like the strect.pm example), which normally should only have the side effect of configuring the caller. If one is allowed to load just that package, you can have a set of handlers that read the hints hash and the stash to find out what each import is doing. This is what I do here:

https://github.com/zmughal-CPAN/p5-Dist-Zilla-PluginBundle-Author-ZMUGHAL/blob/86c8ac100a3679258bbe6f45a0ae4724a7fbf79c/lib/Dist/Zilla/PluginBundle/Author/ZMUGHAL/Babble/FunctionParameters.pm#L8-L28

which lets me know which keywords and parameters were set up by Function::Parameters. If configured to be cacheable, with the import list as a key, this only needs to be done once.
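
For a rough picture of that approach (a sketch, not the linked code): compile a use of the setup package inside a string eval, snapshot %^H from inside that scope, and look for the hint keys that keyword modules are known to set, such as the "Syntax::Keyword::Try/try" key LeoNerd mentioned above. Note that this necessarily runs the setup package's import code; the function and its prefix filter here are just an example.

use strict;
use warnings;

sub hints_set_by {
    my ($setup_module, @import_args) = @_;
    my %hints;
    eval qq{
        use $setup_module \@import_args;
        BEGIN { \%hints = \%^H }   # snapshot the compile-time hints inside that scope
        1;
    } or die $@;
    # e.g. Syntax::Keyword::Try sets keys like "Syntax::Keyword::Try/try"
    return grep { /\ASyntax::Keyword::Try\// } keys %hints;
}

# hints_set_by('strect') would report such a key whenever strect ends up
# loading Syntax::Keyword::Try into the compiling scope.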

Grinnz commented 2 years ago

The problem is that executing arbitrary code of any sort is not acceptable in a PPI implementation.

zmughal commented 2 years ago

That's true. If there is a plugin framework to support various keyword/keyword-like modules from CPAN, then at the very least there would need to be a way to configure which ones are loaded in a given file and with what settings. That should perhaps be a callback. PPI itself cannot load the arbitrary code, but the user of PPI can decide whether they want to return a static configuration or not. The parser would have to do multiple passes to get all the use/BEGIN callbacks before parsing the rest of the file.

wchristian commented 2 years ago

To be clear, in my case, just a per-project config would cover most of my cases. At $work 99% of the code will contain some pragmas which apply the same imports everywhere. So, in this case it's not complicated and just saying "assume signatures everywhere in this directory or below" is fine.

Yeah, I recognize the value, and maybe we should just do that to start off with and mark it highly experimental. My hesitation stems from having a large $work project with code that applies pragmas in like 40% of the files. :)

Also, I think it's better if PPI fails to parse something as a more complex form of what it seems to be, rather than succeeding in parsing a complex structure that is actually a mistake by the code's author and only looks like something it isn't. But I may be wrong on this.

That's why I think chasing imports is more helpful

Any thoughts on how you would want to go about this?

As a general thought:

  1. implement an attribute for PPI statements that tracks activated features and is passed on during parsing to each subsequent statement or structure so they can modify it and pass the modified version to the next one
  2. allow parsing code to look at said attribute and e.g. parse try {} catch {} as a complete block without a semicolon if the relevant feature is activated, or disable parses such as indirect object notation (if applicable)
  3. at this point, we can already configure features per file by modifying the value of the attribute at the start of a parse
  4. implement code at the end of parsing an import statement or BEGIN block that hands it off to a callback, which is expected to return a list of features to be enabled and a list of features to be disabled; these then modify the attribute before it is handed to the next statement
  5. implement a default callback there that, as per @leonerd, knows about core features https://metacpan.org/dist/perl/source/lib/feature.pm#L33
  6. extend the callback to receive user-defined matchings, such as having use strect enable try/catch parsing
  7. extend the callback with a predefined list of popular CPAN matchings that can be enabled by default

The specific contents of the callback can be left up in the air for now, but there are plenty of options there, up to and including executing an arbitrary callback passed in by the user, making it a parse hook, which, if absolutely necessary, might also allow things such as what @zmughal suggested, I think.
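
As a sketch of what the callback from step 4 might look like (hypothetical code, not an existing PPI API; the module table is only an example, and the module method used on the include statement is the real PPI::Statement::Include one):

use strict;
use warnings;

# Example mapping of "module seen in a use statement" to feature changes.
my %feature_changes = (
    'Syntax::Keyword::Try' => { enable  => ['try'] },
    'Try::Tiny'            => { disable => ['try'] },   # try/catch are plain subs here
    'strect'               => { enable  => ['try'] },   # user-defined matching (step 6)
);

# Called after an include statement (or BEGIN block) has been parsed;
# returns the feature set to hand to the next statement.
my $feature_callback = sub {
    my ($include, $current_features) = @_;
    my $module = $include->module          or return $current_features;
    my $change = $feature_changes{$module} or return $current_features;

    my %features = %$current_features;
    $features{$_} = 1    for @{ $change->{enable}  // [] };
    delete $features{$_} for @{ $change->{disable} // [] };
    return \%features;
};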

While I agree with @Grinnz that PPI itself will not execute the code it parses, I think an exception could be made for allowing users to define hooks that might execute parsed code.

Hm.

wchristian commented 2 years ago

Edit: This also enables some interesting "parsing features", such as using PPI to determine which features are likely to be enabled or disabled, and maybe even, based on that, asking PPI what the minimum Perl version for a block or a parsed document is.

EvanCarroll commented 2 years ago

in regards to keywords, and allow calling code to configure the parser to presume that based on [ module name string, module use regex, module use callback ] matches, certain keywords would be present in the attached block. That way we could define default sets [ core, cpan popular ], but still allow people with custom import modules to configure for their environment.

This is the solution that I would propose. This seems like a requirement. You're being more comprehensive by allowing other modules on CPAN to enable these options, but that's a step more than what I think is required for a minimal solution. You're given two options here,

I suggested Keywords over in https://github.com/Perl-Critic/PPI/issues/194 as just one method there, but I don't see these at all as discrete questions.

They are all related to the same core problem. It's just a parsing problem. The parser should be viewed as a customizable thing which is subject to versioning and modification by pragmas, and not something we can make any sense of divorced from those pragmas.

Grinnz commented 2 years ago

in regards to keywords, and allow calling code to configure the parser to presume that based on [ module name string, module use regex, module use callback ] matches, certain keywords would be present in the attached block. That way we could define default sets [ core, cpan popular ], but still allow people with custom import modules to configure for their environment.

This is the solution that I would propose. This seems like a requirement. You're being more comprehensive by allowing other modules on CPAN to enable these options, but that's a step more than what I think is required for a minimal solution. You're given two options here,

To repeat, literally any use statement or BEGIN statement can enable these parser changes without mentioning them. It is in fact extremely common due to modules like Moose and Mojolicious.

wchristian commented 2 years ago
  1. implement an attribute for PPI statements that tracks activated features and is passed on during parsing to each subsequent statement or structure so they can modify it and pass the modified version to the next one

This might need some consideration and thought in regards to what happens when the user changes the tree 😑

EvanCarroll commented 2 years ago

To repeat, literally any use statement or BEGIN statement can enable these parser changes without mentioning them. It is in fact extremely common due to modules like Moose and Mojolicious.

This isn't something I don't know; you're not telling me something new there. I addressed it explicitly. I just don't particularly care because,

Moreover,

It would seem preferable to me to support the methods that Perl itself supports as best as humanly possible, including all pragmas that change parsing semantics, and to treat modules which push an unsupported method of changing parsing semantics as second-class citizens. Even if we realistically can support them on top of that, it should never come at the expense of not supporting Perl functionality. You may never be able to support all second-class use cases (like CPAN), but if you can't support things documented and supported by Perl itself, everything will suffer.

Grinnz commented 2 years ago

It's used by thousands of CPAN modules and almost every DarkPAN project over a certain size; "support" isn't particularly relevant.

wchristian commented 2 years ago

I must admit I'm not entirely following the disagreement between the two of you.

I think it's true that massive amounts of code use a lot of almost untrackable ways to enable features and keywords.

I also think it's true that we can make the parser both aware of features and able to automatically track (particularly so with configuration) enough uses to make it worthwhile.

wchristian commented 2 years ago

Implementation-wise, I remembered that we are indeed operating with a tree, so:

wchristian commented 2 years ago

To keep this documented:

18:54 we basically only need two things:
18:54 1. parse and decide whether an import can activate features
18:54 2. a method on an element to ask whether a feature is active for it, by walking the tree
18:54 and while parsing things that might behave differently if feature active, use the previous two
18:54 and for 1. we can have configurations like perltidy and perlcritic have at different levels to accommodate customizations and shit
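
A minimal sketch of point 2 from that log, assuming a per-element feature_mods attribute from point 1 (both feature_active_for and feature_mods are invented names here, not existing PPI API; parent and can are real):

use strict;
use warnings;

# Walk from an element up through its parents and report whether any
# enclosing scope recorded the feature as switched on (or off).
sub feature_active_for {
    my ($element, $feature) = @_;
    for (my $e = $element; $e; $e = $e->parent) {
        next unless $e->can('feature_mods');
        my $mods = $e->feature_mods or next;   # invented per-element attribute, see point 1
        return $mods->{$feature} if exists $mods->{$feature};
    }
    return 0;   # nothing above us claims to enable it
}

# e.g. parsing code could ask: feature_active_for($try_statement, 'try')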

wchristian commented 2 years ago

I have implemented a proof of concept; please review and comment: https://github.com/Perl-Critic/PPI/pull/280