Closed dbukki closed 2 months ago
For reference, here is a list of McCabe and bumpy road metrics exported from the Xerces-C project using the current state: xerces-bumpyroad.log xerces-mccabe.log
I found an interesting phenomenon in the parser:
In C++, we know that records and functions can be either just a declaration or an actual definition. However, our policy towards storing these in the parsed database is very different:
For record types, VisitRecordDecl returns early if the visited record declaration is not a definition. This causes the corresponding CppRecord entity to remain initialized with astNodeId==0, which eventually causes the record traversal functions to omit it from _types
(and therefore the database):
if (_type->astNodeId)
_visitor->_types.push_back(_type);
Note: In this PR, this call happens in TypeScope::~TypeScope
, but the behavior itself is not new to this PR. In the current master, the same behavior can be found in the TraverseRecordDecl (and similar functions), so this is not a regression.
For functions, VisitFunctionDecl will initialize the CppFunction entity regardless of whether the function is a declaration or definition, which causes every occurrence of it to be added to _functions
(and therefore the database):
if (_curFun->astNodeId)
_visitor->_functions.push_back(_curFun);
Note: In this PR, this call happens in FunctionScope::~FunctionScope
, but the behavior itself is not new to this PR. In the current master, the same behavior can be found in the TraverseFunctionDecl (and similar functions), so this is not a regression.
Essentially, this means that:
class A;
class A {};
class A;
class NoDef0;
class NoDef1;
class NoDef2;
void f();
void f() {}
void f();
Note: The 3 different CppFunction entities are nearly identical in content, with the exception of fields computed from the function's body (e.g. mccabe and bumpiness metrics). The metrics fields of the declaration entities will retain their default values assigned in VisitFunctionDecl; only the definition (which has the body) will contain the actual metrics.
As far as I can see:
The "multiple function entities" problem is also the root cause of why the McCabe and BumpyRoad tests fail for functions that are declared more than once: For such functions, _db->query_value<model::CppFunction>(...)
runs into an assert as the query yields more than one results for the given name, out of which only one is truly meaningful.
I tried resolving this ambiguity by applying the same logic to functions as what we already utilize with records. In my local branch, I added a guard condition to only store the CppFunction entity for the definition.
However, this actually causes an existing test (namely: CppParserTest::FunctionDeclarationOnly
) to fail. So this got me confused, and I now see two possibilities:
FunctionDeclarationOnly
test would have to change, but metrics (and relevant function info) would only be stored for every function once.The second option seems like the more rational alternative to me, both from a database design perspective, and from a metrics-query perspective. It would also cut back on the size of the CppFunction table, which apparently contains a lot of "duplicates" right now. But I don't know if this problem deserves an issue ticket of its own or not. I am also uncertain about the regressions this second option would introduce, so I definitely don't want to rush ahead with the development until the situation is clear.
@mcserep Could you advise me on which of the above two options is the correct approach?
@dbukki As we discussed yesterday on the weekly meeting, please continue as follows:
CppFunction
record; orCppFunction
issue in a separate PR, see #720.If I execute incremental parsing on a project, parsing fails immediately with a segmentation fault. @dbukki can you please check if this is coming from your modifications?
If I execute incremental parsing on a project, parsing fails immediately with a segmentation fault. @dbukki can you please check if this is coming from your modifications?
I checked the current master (8e84d84e29a0cec6cb0af9f6dcc587ea9ff34480) and the segfault is also present there. I put some logging into various places, including the constructor of ClangASTVisitor, and no logs are printed before the segfault happens. This leads me to believe that the crash happens at a much earlier step, long before any actual C++ parsing can take place.
In any case, this is a separate problem. I have opened an issue for it: https://github.com/Ericsson/CodeCompass/issues/735
I checked the current master (8e84d84) and the segfault is also present there. I put some logging into various places, including the constructor of ClangASTVisitor, and no logs are printed before the segfault happens. This leads me to believe that the crash happens at a much earlier step, long before any actual C++ parsing can take place.
In any case, this is a separate problem. I have opened an issue for it: #735
@dbukki I have checked and branch release/gershwin
is not affected by this bug, so it is highly likely to be related to the cppmetrics
plugin, as that is the only new plugin in the upcoming release with a parser component. Nevertheless, it should be investigated in a separate issue (#735), as it is not directly related to the bumpy road metric.
@dbukki The bumpy road metrics was not added to the service.
Thrift interface: https://github.com/Ericsson/CodeCompass/blob/7ba19f9834fddd3473bcc9e77a3a6e96bf6759c2/plugins/cpp_metrics/service/cxxmetrics.thrift#L7-L13
Service implementation:
https://github.com/Ericsson/CodeCompass/blob/7ba19f9834fddd3473bcc9e77a3a6e96bf6759c2/plugins/cpp_metrics/service/src/cppmetricsservice.cpp#L26-L46
Please add this to the codebase in a new fixing PR, so the bumpy road metric become queryable through the web API.
Fixes #684
Formula
The bumpy road metric of a function is computed as the function's total bumpiness divided by the number of statements considered. The total bumpiness of a function is the sum of the depth of each considered statement, where depth is the level of the statement's indentation. (How many parent scopes does it have?)
bumpy_road(F) = ( SUM {s in S} (depth(s)) ) / count(S)
where S is the set of considered statements in function F.
Note: Functions with
count(S)=0
are considered empty. In this case the result of the formula is1
.Domain
The bumpy road metric only considers
if
,for
, ...),{ ... }
, function body), andint a;
,x += y + 8;
, ...) (basically only the "proper" statements terminated by a semicolon)when traversing the AST. Anything else (e.g. labels, sub-expressions, ...) is not counted towards this metric.
Range
For every function
F
, letmax(D)
be the depth of its deepest statement. The bumpy road metric ofF
falls within the range:[1, max(D)]
where
1
means completely flat (=good), andmax(D)
means completely nested/bumpy (=bad).Changes
In order to integrate this metric into CodeCompass, the total bumpiness and statement count had to be computed during parsing (since we do not store the necessary statement info in the database that the metrics plugin could utilize).
For these values, the
CppFunction
table now has two extra fields:bumpiness
andstatementCount
. The metrics parser is then responsible for computing the quotient that makes up the final metric.In the parser, most of our existing
Traverse*
functions had to be restructured. As this also impacted existing features (notably: McCabe and destructor usage via statement stacks), some refactoring also had to be done to ensure they still work like before.Instead of the old template decorator approach, we now use scope objects (
StatementScope
,TypeScope
,EnumScope
,FunctionScope
,CtxStmtScope
andScopedValue
) to perform pre- and post- actions aroundBase::Traverse*
calls with their ctors and dtors. Combined with some further generalizations at the rootTraverseDecl
andTraverseStmt
functions, this not only reduces the number of specializedTraverse*
functions (which usually contained duplicate bodies), but also shortens/condenses the code of the parser.In order to be able to track nested statements, scopes, and therefore the depth of the current statement during traversal, a new
_stmtStack
member has been introduced to the parser. This member is of a specialStatementStack
type that is built up fromStatementScope
objects (one per each statement during traversal) to form the "parent chain" of the currently inspected statement. Individual statement scopes can then be further configured by the specializedTraverse*
functions to describe how that particular statement affects the depth of further statements on the stack. (Note: I did my best to add documentation/comments to the non-trivial parts of each scope type.) With this new mechanism, the old_mcCabeStack
and_statements
stacks could be successfully folded into this logic, thus further reducing complexity (and unnecessary duplication of the same statement stack pattern).Testing
Unit tests for bumpy road have been added into to the test directory. Test cases have been written in a similar style as with the McCabe metric. Further McCabe test cases have also been added to check previously untested cases that I have discovered during my attempts at manual regression testing.