Incremental storage - Githubissues

Ericsson / codechecker

CodeChecker is an analyzer tooling, defect database and viewer extension for static and dynamic analyzer tools.

Apache License 2.0

2.27k stars 381 forks

The problem is the following: the user would like to always analyze their project incrementally into a new report directory. We assume that the user is not able to keep and store a full report directory from a previous full analysis, as we recommend in the "Incremental analysis" section of https://github.com/Ericsson/codechecker/blob/master/docs/usage.md#incremental-analysis

Use case: I have a simple project, I analyze it with CodeChecker and I store the results to a running CodeChecker server. Let's suppose that it contains 10 bugs (the detection status will be NEW).

Now I modify a source file (lib.cpp) in which there were 3 bugs and I reanalyze it into an empty report directory. I solved 2 bugs and I introduced 1 new bug, so there are 2 bugs in this report directory (1 old bug that is still there and 1 new bug).

New behaviour: the CodeChecker cmd diff command should work in the following way:

CodeChecker cmd diff -b remote_run -n report_dir --new: 1 bug (the one I introduced).
CodeChecker cmd diff -b remote_run -n report_dir --resolved: 2 bugs (the two I solved in lib.cpp).
CodeChecker cmd diff -b remote_run -n report_dir --unresolved: 8 bugs (10 initial bugs - 2 bugs that I solved).
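The expected numbers above can be reproduced with a small sketch. This is not CodeChecker code: the `Report` class, the `partial_diff` function, the `h0`..`h10` hashes and the `main.cpp` file name are all hypothetical and exist only to illustrate the idea that a baseline report may count as resolved only if its source file was actually re-analyzed. Reports in headers are ignored here for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_hash: str  # hypothetical stable identifier of a bug
    source_file: str  # source file the report was found in


def partial_diff(baseline, partial, reanalyzed_files):
    """Compare a full baseline run against a partial report directory."""
    baseline_hashes = {r.report_hash for r in baseline}
    partial_hashes = {r.report_hash for r in partial}

    # Reports in the partial directory that the baseline does not know.
    new = [r for r in partial if r.report_hash not in baseline_hashes]

    # A baseline report counts as resolved only if its file was re-analyzed
    # and the report did not show up again.
    resolved = [
        r for r in baseline
        if r.source_file in reanalyzed_files
        and r.report_hash not in partial_hashes
    ]

    # Everything else in the baseline is assumed to be still present.
    resolved_hashes = {r.report_hash for r in resolved}
    unresolved = [r for r in baseline if r.report_hash not in resolved_hashes]
    return new, resolved, unresolved


# The example from the use case: 10 stored bugs, 3 of them in lib.cpp.
baseline = [Report(f"h{i}", "lib.cpp") for i in range(3)]
baseline += [Report(f"h{i}", "main.cpp") for i in range(3, 10)]
# After the change, lib.cpp has one old bug left and one new bug.
partial = [Report("h2", "lib.cpp"), Report("h10", "lib.cpp")]

new, resolved, unresolved = partial_diff(baseline, partial, {"lib.cpp"})
print(len(new), len(resolved), len(unresolved))
```

With the data above, the three lists contain 1, 2 and 8 reports, matching the --new, --resolved and --unresolved expectations.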

The CodeChecker store command should work in the following way: if a bug has NEW detection status on the server but I did not modify the source file related to this bug, we assume that the bug is still in the project, so we change its detection status from NEW to UNRESOLVED. For the above example, storage would therefore give the following results: 1 NEW, 2 RESOLVED, 8 UNRESOLVED.

Storage of partial results: this should probably be invoked with a separate switch, e.g. CodeChecker store -u ./partial_reports (see the notes at the end).

I think the results should be connected to the source files we analyze. When the analysis is re-run in update mode, we should know which source files were re-analyzed, and this information should be sent up to the server. The source file name is included in the plist file name too, e.g. lib.cpp_d1ea32fe3064c529a059ea72637cd445.plist.
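As a rough illustration of how the re-analyzed file names could be collected from a partial report directory, here is a minimal sketch. It assumes the naming scheme suggested by the example above, that is `<source file name>_<32 hex digits>.plist`; the function names are hypothetical, and only the base name of the source file can be recovered this way, not its full path.

```python
import re
from pathlib import Path

# Assumed naming scheme: <source file name>_<32 hex digit hash>.plist
_PLIST_NAME = re.compile(r"^(?P<source>.+)_(?P<digest>[0-9a-f]{32})\.plist$")


def source_name_from_plist(plist_path):
    """Return the source file name encoded in a plist file name, or None."""
    match = _PLIST_NAME.match(Path(plist_path).name)
    return match.group("source") if match else None


def reanalyzed_file_names(report_dir):
    """Collect the source file names of all plist files in a directory."""
    names = set()
    for plist in Path(report_dir).glob("*.plist"):
        name = source_name_from_plist(plist)
        if name is not None:
            names.add(name)
    return names
```

For the example file name, `source_name_from_plist` yields `lib.cpp`; files that do not follow the assumed pattern are skipped.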

We should assume that all relevant analysis commands needed for the re-analysis of the specific file were re-run.

So, following the example, if lib.cpp is re-analyzed and the results are stored on the server, the 2 bugs should be set to fixed, the new bug should be set to new, and all other reports that were in new status should be set to unresolved.

How to handle reports in headers

For each report, we should store the list of source files that brought in that report. In case of headers, this will typically be multiple source files. For reports in source files, it will typically be a single source file. Note: if the same report in a header originates from multiple source files, only a single report is stored.
We can set the detection status of a report to fixed only if all source files from which the report originated were re-analyzed. For a report originating from a source file, typically only that single source file needs to be re-analyzed; for headers, all source files that reported the error need to be re-analyzed.
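The rule above can be sketched as follows. This is only an illustration under stated assumptions: the `stored` dictionary layout, the `origins` field, the lowercase status strings and the `update_detection_status` function are hypothetical and do not reflect the server's actual data model. "Fixed" is represented as `resolved`, and other detection statuses are left out.

```python
def update_detection_status(stored, partial, reanalyzed_files):
    """Update stored reports from a partial report directory.

    stored:  report hash -> {"status": str, "origins": set of source files}
    partial: report hash -> set of source files that produced it in this run
    reanalyzed_files: set of source files that were re-analyzed in this run
    """
    for report_hash, report in stored.items():
        if report_hash in partial:
            # Seen again: remember any additional origin, keep it open.
            report["origins"] |= partial[report_hash]
            report["status"] = "unresolved"
        elif report["origins"] <= reanalyzed_files:
            # Every source file that brought in the report was re-analyzed
            # and none of them reported it again.
            report["status"] = "resolved"
        elif report["status"] == "new":
            # Not all origins were re-analyzed: assume it is still present.
            report["status"] = "unresolved"

    # Reports the server has not seen before.
    for report_hash, origins in partial.items():
        if report_hash not in stored:
            stored[report_hash] = {"status": "new", "origins": set(origins)}
    return stored
```

For example, a header report that originated from both `a.cpp` and `b.cpp` (hypothetical names) stays unresolved when only `a.cpp` is re-analyzed, even if the report is missing from the partial results.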

This way, the storage of "partial report directory" will work in the same way as if one updated the report directory after an incremental analysis.

Baseline behaviour: without -u we keep the baseline behaviour, so that it is possible to clean up if there were many file name changes.

File names may change, and in that case the reports will be stuck in unreviewed status. Alternatively, we should always detect whether all files for which we have stored analysis results are still present. In that case we would not need the -u switch.
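One possible way to detect renamed or deleted files is to check the stored file paths against the current source tree. The sketch below assumes that the stored paths are relative to a project root; the function name and that layout are hypothetical. What to do with reports in missing files (for instance, treating them as if the file had been re-analyzed with no findings) is a design decision this sketch leaves open.

```python
from pathlib import Path


def split_by_presence(stored_source_files, project_root):
    """Split stored source file paths into present and missing ones."""
    root = Path(project_root)
    present, missing = set(), set()
    for rel_path in stored_source_files:
        if (root / rel_path).is_file():
            present.add(rel_path)
        else:
            missing.add(rel_path)
    return present, missing
```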

Incremental storage #2658