github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

UX: interrupt codeql database analyze should write the already computed results to the output file #17930

Open disconnect3d opened 4 weeks ago

disconnect3d commented 4 weeks ago

When executing codeql database analyze <db> --format=sarif-latest --output=out.sarif -- <querysuite> on a large project, the evaluation of certain queries may take a very long time.

In such a case it would be very useful to be able to interrupt the analyze process (e.g., via CTRL+C) and save the output file with results from the queries that have already been evaluated.

This along with #17929 and #17928 would substantially improve the user experience of CodeQL.

PS: Sorry if these issues belong in the CodeQL CLI binaries repo. I started creating them here and only realized it later. Let me know if I should move them there.

jketema commented 4 weeks ago

Hi @disconnect3d. Thanks for your suggestion, we'll take this into consideration.

hmakholm commented 4 weeks ago

This is not as straightforward as it would appear at first. The database analyze command works in two phases: first, all of the queries are "evaluated" into tabular results (stored in an internal binary format as .bqrs files within the results subdirectory of the database), and then those tables are "interpreted" into alerts in SARIF format.

The reason for splitting into two phases is this: the evaluator really wants to evaluate all the queries together so it can share intermediate results between them. Could we temporarily pause it to do "interpretation" after each .bqrs has been produced? Not really. For alerts that show taint paths through the code, the interpretation phase is responsible for selecting some representative paths through a larger graph of dataflow that was produced during the evaluation phase. This is a potentially RAM-intensive computation, and if the evaluator is still active (even if it were paused), it is still using all of the configured --ram for its own intermediate results.

If you stop database analyze (or database run-queries) partway through the evaluation phase, you can still get alerts from the .bqrs files that have been completed by then, by using codeql database interpret-results manually afterwards. It will ignore the still-missing .bqrs files with a modicum of grace, save for printing whiny complaints to the console.
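
For illustration, the manual recovery step might look roughly like the sketch below. The database path, query suite, and output file are placeholder values, and the interpret_partial helper and DRY_RUN switch are hypothetical conveniences rather than part of the CodeQL CLI. It assumes that interpret-results is given the same suite that was passed to the interrupted analyze run, so that it can locate the matching .bqrs files. By default the sketch only prints the command; set DRY_RUN=0 to actually run it.

```shell
#!/bin/sh
# Sketch: recover alerts after interrupting `codeql database analyze`.
# DB, SUITE and OUT are placeholders, not values taken from this thread.
DB="${DB:-my-db}"
SUITE="${SUITE:-my-suite.qls}"
OUT="${OUT:-out.sarif}"

# Hypothetical helper: builds the interpret-results command line.
# With DRY_RUN=1 (the default here) it only prints the command;
# with DRY_RUN=0 it executes it.
interpret_partial() {
    set -- codeql database interpret-results \
        --format=sarif-latest --output="$OUT" -- "$DB" "$SUITE"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

interpret_partial
```

Queries whose .bqrs files were never written should simply be absent from the resulting SARIF file, with the console complaints described above.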