jarshwah opened 2 months ago
Somewhat answering my own question: fixit can generate debug logs showing how long each execution of each plugin hook takes, but the format of those logs isn't very useful with a large number of files (and, tbh, I didn't try to figure out how to configure the logging framework to output them).
`LintRunner.collect_violations` provides a `timings_hook` that can be used to collect timings, but it isn't exposed in the main CLI. Installing fixit as an editable package locally, patching it, and running it can provide a lot more detail:
```diff
diff --git a/src/fixit/engine.py b/src/fixit/engine.py
index 2e56dda..7a5f5ef 100644
--- a/src/fixit/engine.py
+++ b/src/fixit/engine.py
@@ -48,6 +48,20 @@ def diff_violation(path: Path, module: Module, violation: LintViolation) -> str:
     )
 
 
+def timing_hook(timings: Timings) -> None:
+    """
+    Log timings for each rule as CSV rows.
+    """
+    split_timings = {}
+    for name, duration in timings.items():
+        rule, hook = name.split(".")
+        split_timings.setdefault(rule, {})[hook] = duration
+    for rule, durations in split_timings.items():
+        for hook, duration in durations.items():
+            print(f'"{rule}","{hook}",{duration}')
+
+
+
 class LintRunner:
     def __init__(self, path: Path, source: FileContent) -> None:
         self.path = path
@@ -67,6 +81,8 @@ class LintRunner:
         lint rules have finished running, passing in a dictionary of
         ``RuleName.visit_function_name`` -> ``duration in microseconds``.
         """
+        if timings_hook is None:
+            timings_hook = timing_hook
         @contextmanager
         def visit_hook(name: str) -> Iterator[None]:
```
Then run fixit, redirecting stdout to capture the CSV:

```
$ fixit lint src > ~/Documents/fixit-rule-timing.csv
```
Then analyse with pandas:
```python
import pandas as pd

data = pd.read_csv(
    "~/Documents/fixit-rule-timing.csv", names=["plugin", "hook", "duration"]
)
# Convert duration from microseconds to seconds
data["duration"] = data["duration"].div(1_000_000).round(2)
# Aggregate by plugin+hook and show the 20 slowest
agg = data.groupby(["plugin", "hook"])["duration"].sum()
top_n = agg.nlargest(20)
print(top_n)
```
There are a few common paths that I know are slow, but you might be hitting some others depending on what your rules are using. I appreciate #458 with the good repro, I'll give some general thoughts here and then try to look deeper on that one.
I hacked up a quick draft of tracing that works for single-process mode only so far. Here's a screenshot of a sample run (`pip install keke; fixit --trace=... --stats fix src/fixit`) so we have something to discuss.
Most of the per-file time with this configuration is not spent in rules (the tiny unnamed bars below `visit_batched`), which was surprising to me. It's spent in libcst internals (the unattributed time below `visit_batched`), and a fairly large amount of time is also spent in GC (the yellow).
I suspect this is in metadata calculation (e.g. scope or qualified name) but that is probably more of a design inefficiency in libcst than something that's immediately fixable. Giving @zsol a heads-up that a keke PR for libcst might be on the way.
Good news (if you want to look at it that way) is that this is cpu-bound (the green at the top is actually a graph, pegged at 100%) and that there doesn't immediately appear to be any external bottleneck like I/O.
For a larger number of files, you probably also start to notice the `trailrunner.walk` call time and perhaps gitignore filtering, which run serially before the rules start, as well as the time to diff or transfer the results from multiprocessing. For small numbers of files (more than 1, fewer than 10) you will probably also notice the time to create processes (it uses spawn, which has to redo imports).

For `fixit fix` (vs `fixit lint`) you will also only be running single-threaded. This could be improved for `fixit fix -a` mode, but I'd want more tests before landing a trivial one-line PR to do this.
Happy to merge a keke PR for libcst, I was about to release a new version tomorrow but can wait a bit
Thanks for the response @thatch 👍 `keke` is new to me; I should have just reached for a profiler, it would have quickened some of my digging and patching.
> Good news (if you want to look at it that way) is that this is cpu-bound
Yes, I agree, and that aligns with most of my testing too.
> For a larger number of files, you probably also start to notice the trailrunner.walk call time and perhaps gitignore filtering
Yah, but 6 seconds for 40,000 files with gitignore filtering, as a one-off cost, is not horrible. Though I did notice that trailrunner will still traverse into the top level of ignored directories like `__pycache__`: the gitignore filtering only appears to work when matching against files. I'm not sure yet whether it keeps descending into subdirectories (think `node_modules`), but I think it does, since we ended up adding an exclusion rule for our `node_modules` directories.
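For comparison, pruning ignored directories so the walk never descends into them at all looks like this as a stdlib sketch (this is not trailrunner's actual implementation, and the exclusion set is invented for illustration):

```python
import os

IGNORED_DIRS = {"__pycache__", "node_modules"}  # illustrative exclusions

def walk_python_files(root):
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into these
        # trees, rather than filtering out their files after the fact.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in filenames:
            if name.endswith(".py"):
                yield os.path.join(dirpath, name)
```

The difference matters most for huge ignored trees like `node_modules`, where matching per-file still pays the cost of listing every entry.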
I've updated https://github.com/jarshwah/experiments-fixit today with some more analysis, which essentially agrees that the bulk of the work is occurring within `fixit_bytes` (half the time, at least). There is still some medium-hanging fruit outside of that, though:
I have further trace info available if you install:

- keke-tracing/keke (branch tmp-multiprocessing)
- thatch/libcst (branch tmp-multiprocessing)
- thatch/fixit (branch tmp-multiprocessing)

I'm not sure any of these are ready to merge, but they provide a starting point to understand where the time comes from.
My current thoughts on where I think the time is spent:

- `parse_module` is about 400ms in this view for one file
- `_gen_batchable` -- they are a chain and can't actually be batched, about 75% of the per-file time
- `_get_visitor_methods` being recalculated for every file, about 70ms in this view, the magenta under cyan just to the right of the last `_gen_batchable`

Note I don't currently have gc time spans working for multiprocessing workers, and would guess it accounts for ~10% of the total time (and likely the "NoR" oddity near the right side is a gc pause). In addition, `collect_rules` in each process is about the first 550ms (using spawn, this work needs to be reimported in each child, but is only paid once).
Responding to your theories:
> configs for every path -- only 4ms on my machine per file, not the slow part
4ms per file, for 40,000 files, is still 160 seconds.
> rules for every path -- only 2ms on my machine per file, not the slow part
As above, this is per-file, so accounts for 80 seconds total.
> chunk with multiprocessing -- I don't follow, likely not the slow part
As per https://github.com/jarshwah/experiments-fixit?tab=readme-ov-file#multiprocessing-pool, chunking the tasks across multiprocessing rather than serializing/deserializing per task can be more performant in some cases. I didn't see much of an improvement here, though, likely because so much time is spent elsewhere. Might be worth re-examining once other avenues are sorted.
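Chunking can be sketched minimally like this (a hypothetical helper, not fixit's actual dispatch code): by submitting one task per chunk of paths, per-task pickling overhead is paid once per chunk rather than once per file.

```python
def chunked(items, size):
    """Split a list of work items into chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# With a multiprocessing pool you would then submit one task per chunk, e.g.
#   pool.map(lint_many, chunked(paths, 64))
# where lint_many processes a whole list of paths in one worker call.
tasks = chunked(list(range(10)), 4)
```

The trade-off is coarser load balancing: one slow file can stall its whole chunk, so chunk size tuning matters.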
While I agree these are not the primary bottlenecks and investigation is best spent on the functions you've identified, the first two are still doing lots more work than they need to, and could be tackled separately (which I'm happy to help with where time permits).
> 4ms per file, for 40,000 files, is still 160 seconds.
User time, sure. But not wall time -- this should parallelize well. If you have ways to improve this, go for it.
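As a back-of-envelope check (per-file cost from the thread; 4 workers assumed from the CircleCI Large instance):

```python
per_file_ms = 4   # config-loading cost per file, measured above
files = 40_000
workers = 4       # CircleCI Large: 4 CPUs

user_s = per_file_ms * files / 1000   # total CPU time across all workers
wall_s = user_s / workers             # wall time under perfect parallel scaling
print(user_s, wall_s)                 # 160.0 40.0
```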
Sorry, coming back to this:

> - parse_module is about 400ms in this view for one file
I can't replicate this, even with a much larger module or with the pure python parser.
```python
import os
import pathlib

from libcst import parse_module

os.environ["LIBCST_PARSER_TYPE"] = "native"  # no-op, but it isn't "pure"
path = pathlib.Path("libcst/_parser/entrypoints.py")

def open_and_parse(path):
    content = path.read_bytes()
    module = parse_module(content)
    return module

%timeit mod = open_and_parse(path)
3.68 ms ± 42.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

os.environ["LIBCST_PARSER_TYPE"] = "pure"  # pure python parser
%timeit mod = open_and_parse(path)
10.6 ms ± 353 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
I suspect that 400ms parse time is for all 40,000 files? For a relatively simple but non-empty module, I see approximately 3-4ms per-file with the rust-based parser.
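Scaling those per-file timings to the 40,000-file repo (assuming the `%timeit` numbers above are representative of an average module):

```python
native_ms, pure_ms, files = 3.68, 10.6, 40_000

native_total_s = native_ms * files / 1000   # rust-based parser
pure_total_s = pure_ms * files / 1000       # pure-python parser
print(f"native: ~{native_total_s:.0f}s, pure: ~{pure_total_s:.0f}s cumulative parse time")
```

Even with the native parser, cumulative parse time is a couple of minutes of CPU, though it parallelizes across workers.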
As per the title, we're seeing incredibly long run times when running fixit==2.1.0.
Our code base is approx 4M lines of Python code. On a CircleCI Large instance (4 CPU / 8GB), it takes up to an hour to run across the entire code base. We've worked around this by only running on changed files, which takes approximately 1 minute, except when we edit the fixit config or any of our custom rules, in which case we run the entire suite.
I'm keen to try and understand why the runs may be slow by profiling in some way. Is there any tooling or logging available within fixit itself that might help to track down where some of the problems may be? To be clear, I'm coming at this from the assumption that the way we've written our rules is mostly to blame, and would like to identify the worst offenders.
While I'm not really able to share our implementations, I can provide some details of our current usage (in case that prompts some thoughts of "oh, of course that would be slow!"):
We make heavy use of QualifiedNameProvider and FilePathProvider. We have lots of helper functions that presumably construct lots of instances, like so:
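(The snippet that followed wasn't captured in this thread.) As a purely hypothetical sketch of the kind of helper described, with all names invented: one easy win is hoisting the target set to module level so it isn't rebuilt on every call from a hot visitor method.

```python
# Hypothetical helper (invented names). In a real rule, qualified_names would
# come from libcst's QualifiedNameProvider metadata for the visited node.
TARGETS = frozenset({"django.utils.timezone.now", "datetime.datetime.now"})

def matches_any(qualified_names, targets=TARGETS):
    """True if any resolved qualified name is in the target set."""
    return any(name in targets for name in qualified_names)

print(matches_any({"datetime.datetime.now"}))
```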
Still, I'm more interested in being able to analyse performance and profiling data myself rather than guessing, so anything in this space would be incredibly helpful.