Skip trying to substitute macros into lines that do not contain them

albertziegenhagel commented 1 year ago

While investigating #295 I used cProfile on the code and noticed that most of the parsing time is spend in regex.subn in fortls/parse_fortran.py#L2241.

In that code part, we try to substitute all macros into all lines by calling subn on the compiled regex, which results in an unmodified line if the line did not contain the macro at all. This call to subn still seems to be quite expensive (runtime-wise).

To spare the cost, I added a check whether the line actually contains the macro text and skip the substitution check otherwise. Please note that a line containing the macro text does not necessarily mean that a substitution will take place, since the simple containment test is less strict than the regex (which checks that the macro is an actual token separated from the surrounding context, as in FOO being in FOO_BAR but it should not be substituted).

With these changes I was able to reduce the parsing time of the example described in #295 from ~9.8 seconds to ~1.4 seconds, which comes far closer to the time it takes without specifying any pre-processor definitions (~1 second).

In a real world example I could reduce the time to parse a single file from ~107 seconds to ~7seconds.

The parsing time of our whole code-base was reduced from more than 1.5 hours to less than 7 minutes.

NOTE: If I exchange the simple containment test via

if not def_tmp in line: continue

by checking the actual regex via

if not def_regex.match(line): continue

a few lines further down, this still reduces the time to parse the real world file from ~107 seconds to ~26 seconds.

codecov[bot] commented 1 year ago

Codecov Report

Merging #296 (325b821) into master (1f54125) will increase coverage by 0.00%. The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #296   +/-   ##
=======================================
  Coverage   86.95%   86.96%           
=======================================
  Files          12       12           
  Lines        4569     4571    +2     
=======================================
+ Hits         3973     3975    +2     
  Misses        596      596

Impacted Files	Coverage Δ
fortls/parse_fortran.py	`89.12% <100.00%> (+0.01%)`	:arrow_up:

gnikit commented 1 year ago

This is a good catch. The preprocessor is a complete mess IMO. Some of the future plans were/are to completely remove the preprocessor and instead replace it with an off the shelf C/C++ preprocessor. Unfortunately, I haven't found a preprocessor that fits our needs

fortran-lang / fortls

Skip trying to substitute macros into lines that do not contain them #296

Codecov Report