aspect-build / rules_lint

Run static analysis tools with Bazel
Apache License 2.0
91 stars 50 forks source link

[FR]: Support running formatters in parallel #355

Open calebzulawski opened 4 months ago

calebzulawski commented 4 months ago

What is the current behavior?

Files are passed via xargs to the formatter tool and run serially

Describe the feature

I have multiple C++ projects with large numbers of files (~1000) that take a very long time to format serially. We can use xargs -P to invoke multiple instances of the formatter. I'm not sure how the maximum number of processes should be determined.

alexeagle commented 4 months ago

It's unusual to want to format all ~1000 files - perhaps you should call the formatter from a tool like pre-commit that knows which files have been changed? You probably saw that we use rules_multirun which already runs different formatter tools in parallel. So I imagine there's a way to extend this to run a single formatter tool over parallel shards of the files for that language. However I suspect the added complexity won't be worth maintaining, given that the intended "typical" usage is to only format changed files.

calebzulawski commented 4 months ago

Does format_test only check changed files since the last run of the test? If that's the case, then I agree that it's probably not a common use case. The situation I ran into was the test exceeding the 300 second timeout on the first run.

Regarding sharding, I think using xargs -n max_files -P processes (the format script already uses xargs) would handle that, but I'm not sure what a reasonable default would be.

alexeagle commented 4 months ago

format_test can't tell which files have changed. It would need some connection to version control, which is out of scope here. Also if you list the files to check, then it's hermetic and should be a cache hit, so vcs logic would make it non-hermetic (you could get a cache hit on a run that only checked a subset of files)

alexeagle commented 4 months ago

More thoughts:

SRCS = glob("**/*.c")

[
  format_test(
    name = "check_format_" + i,
    srcs = SRCS.subset(...)
  )
  for i in range(0, 10)
]