Ericsson / CodeCompass

CodeCompass is a software comprehension tool for large scale software written in C/C++ and Java
https://codecompass.net
GNU General Public License v3.0
496 stars 96 forks source link

Lack of Cohesion of Methods at Type Level for C++ #681

Closed mcserep closed 4 months ago

mcserep commented 7 months ago

The single responsibility principle states that a class should have only one reason to change and one responsibility. Such a class is said to be cohesive.
A class has strong cohesion if its individual operations use as many of the class's fields as possible. Otherwise, the class is likely to perform multiple roles.

Several metrics exist for calculating a Lack of Cohesion of Methods value (hereinafter LCOM). The algorithms used by CppDepend to compute LCOM metrics and LCOM HS (Henderson-Sellers variant of LCOM) are the following:
LCOM = 1 – (sum(MF)/(M*F)) LCOM HS = (M – sum(MF)/F)/(M-1)

Where:

The LCOM takes its values in the range [0-1]. The LCOM HS takes its values in the range [0-2].
A class is utterly cohesive if all its methods use all its instance fields, which means that sum(MF)=M*F and then LCOM = 0 and LCOMHS = 0.

Recommendations:
Consider only types with more than 10 methods and 10 fields. In such cases an LCOM value larger than 0.8 or an LCOMHS value larger than 1.0 might be problematic. The constraint related to LCOMHS is stronger (and thus easier to satisfy) than the constraint related to LCOM.

dbukki commented 7 months ago

In LCOM = 1 – (sum(MF)/M*F), the M*F part should actually be in parentheses, right? Because if I interpret this formula as-is, then for the edge-case of sum(MF)=M*F we get LCOM = 1 – (sum(MF)/M*F) = 1 – (M*F/M*F) = 1 – F*F, which would only be 0 for records with exactly 1 field...

Also, according to this article, the second formula (for the Henderson-Sellers variant) is also wrong, there should be a division between (M – sum(MF)/F) and (M-1), not a multiplication. But then it's also interesting what happens at M=1; is the value simply undefined?

Finally, if these metrics produce fractional numbers, how should we store them in our metrics table where the value column of CppAstNodeMetrics currently expects unsigned int? Should we consider changing the type of the column to double or should we instead premultiply the value by some constant and then discard the remaining fractional part?

mcserep commented 7 months ago

@dbukki I have updated the formula above.

mcserep commented 7 months ago

Finally, if these metrics produce fractional numbers, how should we store them in our metrics table where the value column of CppAstNodeMetrics currently expects unsigned int? Should we consider changing the type of the column to double or should we instead premultiply the value by some constant and then discard the remaining fractional part?

@dbukki Use scaling for now.

UPDATE: as discussed at https://github.com/Ericsson/CodeCompass/pull/688#discussion_r1453256267 , refactor it to a nullable float field instead.