Option to automatically skip/filter out files based on status of git

Ericsson / codechecker

CodeChecker is an analyzer tooling, defect database and viewer extension for static and dynamic analyzer tools.

https://codechecker.readthedocs.io

Apache License 2.0


Option to automatically skip/filter out files based on status of git #306

Open AlexTelon opened 8 years ago

AlexTelon commented 8 years ago

With the use of

git diff --name-only --diff-filter=AM HEAD

One can get a list of all files that have either been added (A) or modified (M) since the last commit. There are more flags if one would like to have another default behaviour.

With the help of this, it's easy to generate a skipfile that ignores all but those files.

With this we could introduce a new feature, let's say "--git-filter", that would automatically ignore all files not changed by the user since the last commit.

CodeChecker check -n "name" --git-filter -l log

This could cut down the analysis time by a large factor when dealing with small/medium commits in a large codebase.
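A minimal sketch of the skipfile generation described above, in Python. The function names are made up for illustration, and the skipfile syntax is an assumption: one glob per line, where `+` keeps a match, `-` skips it, and the first matching line wins. Check the skipfile documentation of your CodeChecker version before relying on it.

```python
import subprocess


def changed_files(repo_dir="."):
    """Return paths added (A) or modified (M) since the last commit.

    The paths are relative to the repository root, as printed by git.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=AM", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def build_skipfile(paths):
    """Build skipfile text that keeps only `paths` and skips everything else.

    Assumed format: '+' keeps a match, '-' skips it, first match wins.
    """
    lines = ["+*/" + p for p in paths]  # keep every changed file
    lines.append("-*")                  # skip all remaining files
    return "\n".join(lines) + "\n"
```

Calling `build_skipfile(changed_files())` would give the text to write into a skipfile. Note that this only covers the changed files themselves, not files that depend on them.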

Next level would be if it was possible to ignore all but the lines changed. I will do a similar thing for the setup for my thesis job, but figured I should post it here so we can have a discussion.

Xazax-hun commented 8 years ago

Unfortunately it is not sufficient to only check modified / added files. For example, in case a header is modified, all the dependent C++ files should be reanalyzed. Tracking which files should be rechecked or recompiled is the task of the make system. So in case you did modify a file, you can issue a make command and check which files need to be reanalyzed.

Unfortunately the workflow is the following:

1. Modify the source files
2. Try to compile
3. Go to 1. if something failed (compilation or tests)
4. Run the analyzer. At this point we have lost the information about what should be recompiled since the last check.

Possible workarounds for this problem:

Track what files were recompiled during the last few make calls and analyze only those files.
Have a separate target in the make system for static analysis, and only analyze files that changed or depend on files that changed since the last time this target was built (i.e. analysis was carried out).
Ask the compiler about the dependencies of the modified/added files.
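The last workaround could look roughly like the sketch below. It assumes the dependencies are available as make-style rules, such as the text printed by `gcc -MM`, where the first prerequisite of each rule is the source file. Both function names are hypothetical, and paths containing colons or spaces are not handled.

```python
def parse_make_deps(text):
    """Parse make-style rules ('target: dep dep ...') into {target: [deps]}."""
    deps = {}
    # Join continuation lines that end with a backslash.
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        if ":" not in line:
            continue
        target, _, rest = line.partition(":")
        deps[target.strip()] = rest.split()
    return deps


def sources_to_reanalyze(deps, changed):
    """Return the source file of every rule that mentions a changed file.

    Assumes the first prerequisite of a rule is the source file.
    """
    changed = set(changed)
    result = []
    for files in deps.values():
        if files and changed.intersection(files):
            result.append(files[0])
    return sorted(result)
```

With this, a modified header pulls in every source file whose rule lists it, which is the case a plain list of changed files misses.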

AlexTelon commented 8 years ago

That's true. I have been blinded by the large number of bugs I'm getting here that do not need a larger context, and forgot that there are other cases.

The workarounds sound interesting, and in case the running times get too long over here I will look into your suggestions.

Would there be any gain (in time) in doing two runs if the codebase is really large? One run for all checkers that are context independent, or whatever you should call it, meaning that if a .h file has changed these checkers only need to check that very file. And then a separate run on all files with the other checkers?

In my mind it seems like most of the checkers end up in the first group. Things like security.insecureAPI.gets simply warn on local usages.

core.DivideByZero on the other hand would need a larger context.

But I might be wrong here?

gyorb commented 8 years ago

I think we are speaking here about two things:

how a build system detects which files should be rebuilt
how a clang checker analyzes the source code

The analysis with CodeChecker depends on the build system to know which files were modified and should be analyzed, and the analyzers are called with this information.

I recommend, if it works for your workflow, that during development you do not build/analyze the whole project, only smaller parts where a clean build/analysis does not take too much time. For the whole project, which takes more time, a separate build/analysis should be started.

If you have a compile_commands.json file for the whole project you can modify/split it for the analysis. As long as no new files are added it can work; if you add a new c/cpp file which should be built you have to regenerate the compile_commands.json file.
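Splitting the compilation database can be sketched as below, assuming the standard JSON compilation database layout where every entry has a "directory" and a "file" key. The function names and the output path are hypothetical, and the wanted files are expected as absolute paths.

```python
import json
import os


def filter_compile_commands(entries, wanted):
    """Keep only the entries whose source file is in `wanted` (absolute paths)."""
    wanted = {os.path.normpath(p) for p in wanted}
    kept = []
    for entry in entries:
        # "file" may be relative to "directory"; join() keeps absolute paths as-is.
        path = os.path.normpath(os.path.join(entry["directory"], entry["file"]))
        if path in wanted:
            kept.append(entry)
    return kept


def split_database(src_path, dst_path, wanted):
    """Write a reduced copy of the database at `src_path` to `dst_path`."""
    with open(src_path) as src:
        entries = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(filter_compile_commands(entries, wanted), dst, indent=2)
```

The reduced file can then be given to the analysis in place of the full one. It goes stale as soon as a new source file is added to the build.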

I see your problem and you can try the workarounds mentioned by @Xazax-hun, but I do not think that CodeChecker should handle/manage what the build systems currently do (which files are modified and should be rebuilt).

Xazax-hun commented 8 years ago

I think I have the ultimate solution!

What about a script that "touches" all of the files that were added or modified? After the modification date is changed, the next make command will only rebuild the files that we need to analyze. This way we can achieve incremental analysis.
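A rough sketch of such a script in Python. The helper name is made up, and the list of paths is assumed to come from something like the git diff command earlier in the thread.

```python
import os


def touch_files(paths):
    """Set the modification time of each existing file to now.

    Returns the paths that were touched; missing paths are skipped.
    """
    touched = []
    for path in paths:
        if os.path.isfile(path):
            os.utime(path, None)  # None means "use the current time"
            touched.append(path)
    return touched
```

After this runs, make sees the touched files (and whatever depends on them) as out of date, so a logged build only contains the compile commands that need to be analyzed.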

AlexTelon commented 8 years ago

@Xazax-hun That is indeed a great solution! Even though a file that is added or modified is already touched, a script like this is still useful in case a user does several incremental analyses before committing. With a script one could make sure that all files that have changed since the last commit are always analyzed, not only the ones changed since the last build.

But even simply using this: CodeChecker -n "name" -b "make platform"

Should give you a poor man's incremental analysis.

Then the question is at what point this is faster than using a logfile and analyzing everything.