Ericsson / codechecker

CodeChecker is an analyzer tooling, defect database and viewer extension for the Clang Static Analyzer and Clang Tidy
https://codechecker.readthedocs.io
Apache License 2.0
2.19k stars 368 forks source link

[upstream] Using CTU mode in clang-tidy? #2507

Closed LebedevRI closed 3 years ago

LebedevRI commented 4 years ago

Right now, it appears that clang-tidy is CTU-unaware. It is true that unlike clang static analyzer which is it's entirety benefits from CTU, most clang-tidy checks are very local to the code, and won't benefit from CTU.

But that isn't true for all the checks. There are at least 2 existing ones that would benefit:

And i suspect there may be more if/once it's possible

I'm wondering if there are any thoughts about it, or a previous disscussion i missed. Thanks.

martong commented 4 years ago

Hi, this is a good idea and a great initiative, thanks!

The implementation does not seem immediate, however. In case of CSA we lazily import the AST of a needed function and its dependencies from the other TU. In case of tidy I think we should provide a fully merged AST for the tidy checkers, i.e. we should merge all top level declarations from all TUs. I don't see any points where we could do a lazy import (tidy devs could confirm or refute this).

This could be prototyped based on the -ast-merge feature of Clang in a few man weeks I reckon. Measurement and evaluation could take another few weeks. Unfortunately, we already have so many things on our plate for this year (like summary based analysis, improving C++ static analysis, etc). So, I don't think we will have the resources for this feature. But, contributions are very welcome! We would have time for reviews and suggestions.

JonasToth commented 4 years ago

That would kinda be a unity-build for each TU of the project? I think that is a problem performance-wise. In the end we want clang-tidy to be fast. But I think @martong idea is correct if the CTU should be available for every check.

But if that would be an opt-in for checks that benefit, the manual importing could be done in the checking code. Maybe that would require some matching to happen in check, but if that remains a special case it should be acceptable. At that point the lazy-import should be similar to how CSA does it, shouldn't it?

martong commented 4 years ago

Yes, first we should identify which checkers could benefit from the CTU mode, and users should run the tidy with CTU enabled only for those checkers.

That would kinda be a unity-build for each TU of the project? I think that is a problem performance-wise.

To mitigate the performance bottleneck, perhaps we could do a two-phase analysis. In the first phase we could discover for each TU if there is any missing definition in other TUs. We should check call-expressions, accesses of global variables and perhaps forward declarations of tag types. If any of them refer to a definition that is not available in the current TU but in another TU then we should mark them for import.

Another (probably unwanted?) solution could be to modify each CTU-profitable tidy checkers to be able to lazily import definitions from other TUs. So far, it seems this would affect only max 5 checkers.

JonasToth commented 4 years ago

Yes, we should analyze the gain first. I think the mentioned exception analysis profits. That is implemented in a separate utility-class anyway, that means we can hide higher complexitiy easily and it would be used by multiple checks.

I currently work on a check to mark variables const if not modified, that could potentially analyze called functions that take arguments by non-const reference for actual modification - that would benefit as well. This is again in a separate analysis class, that lives in clang/Analysis/ExprMutAnalyzer.

A big gain could be for global refactoring, e.g. interface changes. Those are not possible right now, because each TU is analyzed one by one. That would at least happen more efficient and one can keep state between TUs, that is not possible right now. That prohibits analyzing class hierarchies, that are usually split in multiple TUs. That would include analysis of virtual methods, to e.g. add noexcept or so.

At the end I think the CTU analysis is probably not that beneficial for current checks. But this is due to the current limitation not allowing going in this direction at all, so noone implements useful stuff that would benefit.

For current functionality: I expect clang-tidy to be run over all of the project. If for example noexcept is automatically added if it is locally possible, the analysis would at some point converge to the same result as being able to analyze everything on one run.

It has the nice effect of helping programmers, too. And shell-scripting a loop around run-clang-tidy.py or so is way easier then CTU support :)

For new functionality: CTU would definitely open new possibilites that are worth exploring.

Xazax-hun commented 4 years ago

I think, for the kind of analyses clang tidy does, we better of with having some custom summary of each TU and rely on that.

For example, in case of the exception analysis the summary of a TU could contain a map from definitions to a list of exception types that can escape the function. Using this information and a global call graph might be enough to do the analysis globally without loading all the ASTs into the memory at once.

Xazax-hun commented 4 years ago

A big gain could be for global refactoring, e.g. interface changes. Those are not possible right now, because each TU is analyzed one by one.

Could you elaborate on this one? I think interface migrations are pretty much possible. The key is to do it in two phases. In the first phase you never rewrite any of the TUs, that would cause problems because of the header files. So basically, instead of rewriting any files, you export all the fixes. And in a second phase you can do the rewriting (which is applying textual replacements) without having to do any parsing.

JonasToth commented 4 years ago

A big gain could be for global refactoring, e.g. interface changes. Those are not possible right now, because each TU is analyzed one by one.

Could you elaborate on this one? I think interface migrations are pretty much possible. The key is to do it in two phases. In the first phase you never rewrite any of the TUs, that would cause problems because of the header files. So basically, instead of rewriting any files, you export all the fixes. And in a second phase you can do the rewriting (which is applying textual replacements) without having to do any parsing.

Yes, this is how run-clang-tidy currently does it. But from the analysis point of view you still see only on TU. If e.g. you have BaseClass.cpp, SubClass.cpp, YetAnotherSubClass.cpp you are always bound to the definitions you get from the header files. But if for example some virtual method misses some information, e.g. if it could be const, noexcept, contracts or anything else, you can not find out if all - known - overriders do something, or don't.

E.g. this method could be const if the method itself or none of its overriders modify the object. Right now you need some persistent storage between TUs, which is not implemented. Something like this should probably be done after modules are introduced to the codebase, because then you can separate public and private code, but I hope the idea is clear. Something like this could help modernizing legacy code-bases.