Perl-Critic / PPI


parsing slowdown with new features in 1.281 #309

Open mtths opened 1 month ago

mtths commented 1 month ago

We saw quite a slowdown in parsing times with the latest PPI, especially in our fatter modules (up to a factor of 20).

The slowdown occurs while the document is parsed during object creation; method calls on the already-parsed document take about as long as before.

  $ perl -MPPI -E 'warn PPI->VERSION'
  1.281 at -e line 1.
  $ /tmp/ppi_test.pl "/opt/perl-current-live/local-default/lib/perl5/CGI.pm"
  loaded in 4.34172 at /tmp/ppi_test.pl line 10.
  complete 1 in 0.05063 at /tmp/ppi_test.pl line 14.
  incl 19 in 0.04981 at /tmp/ppi_test.pl line 18.
  $
  $ ## after PPI downgrade
  $ perl -MPPI -E 'warn PPI->VERSION'
  1.279 at -e line 1.
  $ /tmp/ppi_test.pl "/opt/perl-current-live/local-default/lib/perl5/CGI.pm"
  loaded in 0.82833 at /tmp/ppi_test.pl line 10.
  complete 1 in 0.04924 at /tmp/ppi_test.pl line 14.
  incl 19 in 0.05106 at /tmp/ppi_test.pl line 18.
  $
  $ cat /tmp/ppi_test.pl
  #!/usr/bin/env perl

  use v5.40;
  use PPI;
  use Time::HiRes qw/time/;

  my $t0 = time;
  my $p = PPI::Document->new(shift)
      or die "couldn't load";
  warn sprintf("loaded in %.5f", time - $t0);

  $t0 = time;
  my $ok = $p->complete;
  warn sprintf("complete %s in %.5f", $ok, time - $t0);

  $t0 = time;
  my $incl = $p->find("PPI::Statement::Include");
  warn sprintf("incl %s in %.5f", scalar @$incl, time - $t0);

  $

We use PPI for syntax checking, and also indirectly via perlcritic.

jixam commented 1 month ago

We are affected by this as well. According to NYTProf, most of the time is spent here: https://github.com/Perl-Critic/PPI/blob/d9554383fc9f28adccc5dfd4284da53174cb14d7/lib/PPI/Token/Whitespace.pm#L215 (and then I got lost in recursion)

Eckankar commented 3 weeks ago

We are affected by this as well. According to NYTProf, most of the time is spent here (PPI/lib/PPI/Token/Whitespace.pm, line 215 in d955438):

  my ( $has_sig, @tokens ) = $t->_current_token_has_signatures_active;

(and then I got lost in recursion)

I've tried doing a NYTProf flame chart on a file for 700 seconds, and it looks like the big problem is PPI::Element::presumed_features.

Whenever it is called, it does a full traversal back up through every previous sibling, and every previous sibling of every parent, from the point where a potential function signature is spotted.

The traversal itself takes ~90s of the 700s; the remaining 500s is spent calling feature_mods.

It seems like a lot of repeated work is being done here.

The function call shown by @jixam is, as far as I can tell, invoked every time the tokenizer encounters a `(`, traversing the entire syntax tree each time. No wonder it's super slow.

I would imagine this could be sped up immensely by caching the results. Perhaps each node in the syntax tree could keep track of the feature status in effect at that statement, so it wouldn't have to traverse the entire tree, but could just ask the previous node and update that based on the current statement.
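A minimal sketch of that idea. Everything here is invented for illustration (Node, features_slow, features_cached are not PPI's actual internals): each node caches the feature set in effect at its position, derived from the previous node's cached set plus its own changes, so a query touches each node at most once instead of re-walking the whole chain.

```perl
#!/usr/bin/env perl
# Hypothetical sketch -- not PPI's actual internals. Each node caches the
# feature set in effect at its position, derived from the previous node's
# cached set plus its own changes.
use strict;
use warnings;

package Node;

sub new {
    my ( $class, %args ) = @_;
    return bless { prev => $args{prev}, enables => $args{enables} || {} }, $class;
}

# Uncached: walk back through every previous node on every call
# (roughly what the repeated backward traversal amounts to).
sub features_slow {
    my ($self) = @_;
    my @chain;
    for ( my $n = $self ; $n ; $n = $n->{prev} ) { unshift @chain, $n }
    my %feat;
    %feat = ( %feat, %{ $_->{enables} } ) for @chain;
    return \%feat;
}

# Cached: ask the previous node (itself memoized), then apply only
# this node's own feature changes.
sub features_cached {
    my ($self) = @_;
    return $self->{_feat_cache} //= do {
        my %feat = $self->{prev} ? %{ $self->{prev}->features_cached } : ();
        +{ %feat, %{ $self->{enables} } };
    };
}

package main;

my ( $prev, @nodes );
for my $i ( 1 .. 5 ) {
    # pretend node 3 is a "use feature 'signatures';" statement
    $prev = Node->new(
        prev    => $prev,
        enables => ( $i == 3 ? { signatures => 1 } : {} ),
    );
    push @nodes, $prev;
}

print "signatures active at node 5: ",
    ( $nodes[-1]->features_cached->{signatures} ? "yes" : "no" ), "\n";
```

With the cache in place, asking node N for its active features costs O(1) amortized after the first query, instead of O(N) on every `(` the tokenizer meets.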

oalders commented 3 weeks ago

If someone would like to provide a pull request to fix this, I can look at getting it released.

wchristian commented 3 weeks ago

FWIW, I have talked with @mtths about this, and it is currently lower on my priority list than other PPI todos.

That said: yes, caching is probably the solution. However, in that case we also need to consider how to invalidate the cache when the structure gets edited. Possibly the cache could be temporary, existing only while a fragment is being parsed.
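The invalidation concern can be illustrated with a toy sketch (all names invented, not PPI's API): each node caches a value derived from everything before it, and an edit must clear the caches of the edited node and every later node, while earlier nodes may keep theirs.

```perl
#!/usr/bin/env perl
# Hypothetical sketch (invented names, not PPI's API): cached derived state
# plus invalidation when the structure is edited.
use strict;
use warnings;

package CachedNode;

sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

# Derived state, computed from the previous node's derived state.
# (Stands in for "which feature pragmas are active here".)
sub derived {
    my ($self) = @_;
    return $self->{_cache} //= do {
        my $base = $self->{prev} ? $self->{prev}->derived : 0;
        $base + ( $self->{value} || 0 );
    };
}

# An edit: update this node and drop every downstream cache.
# Earlier nodes keep their caches -- nothing before the edit changed.
sub set_value {
    my ( $self, $value ) = @_;
    $self->{value} = $value;
    for ( my $n = $self ; $n ; $n = $n->{next} ) {
        delete $n->{_cache};
    }
}

package main;

my $n1 = CachedNode->new( value => 1 );
my $n2 = CachedNode->new( value => 2, prev => $n1 );
my $n3 = CachedNode->new( value => 3, prev => $n2 );
$n1->{next} = $n2;
$n2->{next} = $n3;

print "before edit: ", $n3->derived, "\n";    # 6
$n2->set_value(10);
print "after edit:  ", $n3->derived, "\n";    # 14
```

A purely parse-time cache, dropped wholesale at end of parse, avoids this bookkeeping entirely, which is what the "temporary while a fragment is being parsed" option amounts to.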

haarg commented 3 weeks ago

This makes PPI pretty much unusable for a lot of cases. At my work we've downgraded to the previous version.

wchristian commented 3 weeks ago

I'm baffled as to what you could be doing that would render it unusable, but alright. I'll see if I can get a solution with caching and end-of-parse invalidation done tomorrow. If anybody else wants to give it a shot and maybe beat me to it, be my guest. :)