cqfn / aibolit

Static Analyzer for Java Code with Machine Learning in Mind
https://pypi.org/project/aibolit/

Suggest new Quality Metrics of Java classes. #166

acheshkov opened this issue 4 years ago (status: Open)

angusev commented 4 years ago

I propose implementing metrics of the associated AST. I believe these metrics correspond well to human perception:
  1. maximum depth of the AST
  2. number of leaves (number of all possible paths)
  3. maximum node degree in the AST, scaled by NCSS
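A minimal Python sketch of these three metrics. It assumes a hypothetical stand-in `Node` class (a kind label plus a list of children) rather than real javalang nodes, and assumes NCSS has already been computed and is passed in. Depth here counts nodes on the longest root-to-leaf path; other tools may count edges instead.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal AST node; a real analyzer would adapt javalang nodes."""
    kind: str
    children: list[Node] = field(default_factory=list)


def max_depth(node: Node) -> int:
    """Nodes on the longest root-to-leaf path (a lone root has depth 1)."""
    if not node.children:
        return 1
    return 1 + max(max_depth(child) for child in node.children)


def leaf_count(node: Node) -> int:
    """Number of leaves, i.e. nodes without children."""
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children)


def max_degree(node: Node) -> int:
    """Largest number of direct children found at any node."""
    return max([len(node.children)] + [max_degree(c) for c in node.children])


def ast_metrics(root: Node, ncss: int) -> dict[str, float]:
    """Bundle the three proposed metrics; `ncss` must be positive."""
    return {
        "max_depth": max_depth(root),
        "leaves": leaf_count(root),
        "max_degree_per_ncss": max_degree(root) / ncss,
    }


# Tiny hand-built tree standing in for one class.
example = Node("ClassDeclaration", [
    Node("FieldDeclaration"),
    Node("MethodDeclaration", [
        Node("ReturnStatement"),
        Node("IfStatement", [Node("StatementExpression")]),
    ]),
])
print(ast_metrics(example, ncss=4))
# {'max_depth': 4, 'leaves': 3, 'max_degree_per_ncss': 0.5}
```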

cringoleg commented 4 years ago

Must have:
  1. CWORD – a certain percentage of comments in the source code is good practice.
  2. CBO – number of mutual dependencies between classes. Should be relatively low.
  3. COH – overall coherence in the source code file. Should be relatively low.
May be pretty good as well:
  1. DIT – Depth of Inheritance Tree: the deeper the inheritance, the more complex the code becomes.
  2. NCO – number of methods returning a null value.
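Of these, DIT is the easiest to sketch. The Python snippet below assumes a hypothetical `superclass_of` mapping (class name to the name of the class it `extends`) collected from every class in the project. Whether `java.lang.Object` counts as a level is a convention; here a class with no known superclass has depth 0.

```python
from __future__ import annotations


def depth_of_inheritance(cls: str, superclass_of: dict[str, str]) -> int:
    """Count `extends` edges from `cls` up to the top of the known hierarchy."""
    depth = 0
    seen = {cls}
    while cls in superclass_of:
        cls = superclass_of[cls]
        if cls in seen:  # guard against malformed input
            raise ValueError(f"inheritance cycle involving {cls}")
        seen.add(cls)
        depth += 1
    return depth


hierarchy = {"Dog": "Animal", "Puppy": "Dog"}
print(depth_of_inheritance("Puppy", hierarchy))  # 2
```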

lukyanoffpashok commented 4 years ago

NOS - total number of Java statements in a class. Simple to calculate using javalang.
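A sketch of the counting step. It assumes a hypothetical stand-in `Node` tree rather than javalang's own classes, and an assumed, deliberately incomplete set of node kinds treated as statements. Which constructs count (for example, whether a `{ ... }` block or a local variable declaration is a statement) is a definitional choice the metric would need to pin down.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal AST node: a kind label plus children."""
    kind: str
    children: list[Node] = field(default_factory=list)


# Assumed statement kinds, loosely named after Java statement forms.
STATEMENT_KINDS = frozenset({
    "IfStatement", "ForStatement", "WhileStatement", "DoStatement",
    "ReturnStatement", "ThrowStatement", "TryStatement", "SwitchStatement",
    "BreakStatement", "ContinueStatement", "StatementExpression",
    "LocalVariableDeclaration",
})


def count_statements(node: Node, kinds: frozenset[str] = STATEMENT_KINDS) -> int:
    """NOS: number of nodes in the subtree whose kind is a statement kind."""
    own = 1 if node.kind in kinds else 0
    return own + sum(count_statements(child, kinds) for child in node.children)


cls = Node("ClassDeclaration", [
    Node("MethodDeclaration", [
        Node("LocalVariableDeclaration"),
        Node("IfStatement", [Node("StatementExpression")]),
        Node("ReturnStatement"),
    ]),
])
print(count_statements(cls))  # 4
```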

lukyanoffpashok commented 4 years ago

NSUB - number of subclasses of this class. Simple to calculate using the AST representation.
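One caveat worth making concrete: a class's own AST does not list its subclasses, so NSUB needs the `extends` relations of every class in the project. A sketch assuming a hypothetical `superclass_of` mapping built from all parsed files; `transitive=True` also counts indirect descendants.

```python
from __future__ import annotations


def count_subclasses(cls: str, superclass_of: dict[str, str],
                     transitive: bool = False) -> int:
    """NSUB: classes that extend `cls` directly (or at any depth if transitive)."""
    direct = [child for child, parent in superclass_of.items() if parent == cls]
    if not transitive:
        return len(direct)
    # Assumes the hierarchy is acyclic, as it is in valid Java.
    return len(direct) + sum(count_subclasses(c, superclass_of, True) for c in direct)


hierarchy = {"Dog": "Animal", "Cat": "Animal", "Puppy": "Dog"}
print(count_subclasses("Animal", hierarchy))                   # 2
print(count_subclasses("Animal", hierarchy, transitive=True))  # 3
```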

lukyanoffpashok commented 4 years ago

HIER - number of methods called that are defined in the hierarchy of the class.
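A possible reading of HIER, sketched under several assumptions: methods are matched by simple name only (overloads and receivers are ignored), "the hierarchy" means the class's ancestors, and each distinct method is counted once. The inputs `calls`, `methods_of` and `superclass_of` are hypothetical structures an analyzer would have to build first.

```python
from __future__ import annotations

from collections.abc import Iterable


def hier(cls: str, calls: Iterable[str], methods_of: dict[str, set[str]],
         superclass_of: dict[str, str]) -> int:
    """Distinct methods called from `cls` that are declared in one of its ancestors."""
    inherited: set[str] = set()
    parent = superclass_of.get(cls)
    while parent is not None:  # assumes an acyclic hierarchy
        inherited |= methods_of.get(parent, set())
        parent = superclass_of.get(parent)
    return len(set(calls) & inherited)


superclass_of = {"Puppy": "Dog", "Dog": "Animal"}
methods_of = {"Animal": {"eat", "sleep"}, "Dog": {"bark"}, "Puppy": {"play"}}
calls_in_puppy = ["eat", "bark", "play", "eat", "println"]
print(hier("Puppy", calls_in_puppy, methods_of, superclass_of))  # 2
```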

lukyanoffpashok commented 4 years ago

AST-based metrics:
  1. maximum length of a path between two leaves of the AST
  2. minimum length of a path between two leaves of the AST
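Since an AST is a tree, the path between two leaves passes through their lowest common ancestor, so its length in edges is depth(a) + depth(b) - 2 * depth(lca). A sketch assuming a hypothetical stand-in `Node` class: each leaf is recorded as its sequence of child indices from the root, and the shared prefix of two such sequences gives the depth of their common ancestor. It compares all leaf pairs, which is quadratic and only reasonable for class-sized trees.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations
from typing import Iterator, Optional


@dataclass
class Node:
    """Hypothetical minimal AST node: a kind label plus children."""
    kind: str
    children: list[Node] = field(default_factory=list)


def leaf_index_paths(node: Node, prefix: tuple[int, ...] = ()) -> Iterator[tuple[int, ...]]:
    """Yield, for every leaf, the child indices leading to it from the root."""
    if not node.children:
        yield prefix
        return
    for i, child in enumerate(node.children):
        yield from leaf_index_paths(child, prefix + (i,))


def shared_prefix_len(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Depth (in edges) of the lowest common ancestor of two leaves."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def leaf_path_extremes(root: Node) -> Optional[tuple[int, int]]:
    """(max, min) leaf-to-leaf path length in edges, or None with < 2 leaves."""
    paths = list(leaf_index_paths(root))
    lengths = [len(a) + len(b) - 2 * shared_prefix_len(a, b)
               for a, b in combinations(paths, 2)]
    if not lengths:
        return None
    return max(lengths), min(lengths)


example = Node("ClassDeclaration", [
    Node("FieldDeclaration"),
    Node("MethodDeclaration", [
        Node("ReturnStatement"),
        Node("IfStatement", [Node("StatementExpression")]),
    ]),
])
print(leaf_path_extremes(example))  # (4, 3)
```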

lukyanoffpashok commented 4 years ago

MPC - the number of external methods called by all the methods in the class

lukyanoffpashok commented 4 years ago

LMC - number of local method calls (calls to methods that are defined in this class)
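MPC and LMC split the same set of call sites: a call either targets a method declared in the class (local) or not (external). A sketch assuming a hypothetical `calls_by_method` mapping from every method declared in the class to the simple names it calls. Matching by simple name is a simplification, since a real analyzer would have to resolve receivers (for example `list.add(...)` versus a local `add(...)`). Here each call site is counted; counting distinct targets instead is an equally plausible reading of the definitions above.

```python
from __future__ import annotations


def call_counts(calls_by_method: dict[str, list[str]]) -> tuple[int, int]:
    """Return (MPC, LMC): external and local call sites across the class."""
    # Every declared method must appear as a key, even if it calls nothing.
    local_methods = set(calls_by_method)
    mpc = lmc = 0
    for calls in calls_by_method.values():
        for name in calls:
            if name in local_methods:
                lmc += 1
            else:
                mpc += 1
    return mpc, lmc


calls = {
    "load": ["read", "parse", "validate"],
    "save": ["validate", "write", "write"],
    "validate": [],
}
print(call_counts(calls))  # (4, 2)
```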

lyriccoder commented 4 years ago

In my opinion, MPC has the most impact on readability. The more external methods you call, the less readable the code is, since the reader has to switch from context to context. Also, if we have a complicated domain (lots of classes in the package, all used in one function), this will be taken into account as well. I would also add the number of used class fields to this metric.
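One way to fold in the suggested field usage: count the distinct fields the code touches and add that count to MPC. This assumes a hypothetical `accessed_names` list whose identifiers have already been resolved, so that a local variable shadowing a field is not reported under the field's name. The plain sum is an assumption; a weighted combination would be just as defensible.

```python
from __future__ import annotations

from collections.abc import Iterable


def used_field_count(accessed_names: Iterable[str], declared_fields: Iterable[str]) -> int:
    """Distinct fields of the class that the code reads or writes."""
    return len(set(accessed_names) & set(declared_fields))


def mpc_with_fields(mpc: int, accessed_names: Iterable[str],
                    declared_fields: Iterable[str]) -> int:
    """Hypothetical combined score: external call sites plus distinct fields used."""
    return mpc + used_field_count(accessed_names, declared_fields)


fields = ["cache", "repository", "logger"]
accessed = ["cache", "key", "cache", "logger", "value"]
print(mpc_with_fields(4, accessed, fields))  # 6
```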

acheshkov commented 4 years ago

The number of unique, non-primitive data types used in the code, normalized by NCSS.
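A sketch of this ratio, assuming a hypothetical `type_names` list already reduced to base type names (array brackets and generic arguments stripped) and NCSS computed elsewhere. Boxed types such as `Integer` count as non-primitive here, and `void` is excluded because it is not a data type.

```python
from __future__ import annotations

from collections.abc import Iterable

# Java's eight primitive types, plus void, which is not a data type at all.
NON_COUNTED = frozenset({
    "boolean", "byte", "char", "short", "int", "long", "float", "double", "void",
})


def non_primitive_type_ratio(type_names: Iterable[str], ncss: int) -> float:
    """Unique non-primitive type names divided by NCSS (ncss must be positive)."""
    unique = {name for name in type_names if name not in NON_COUNTED}
    return len(unique) / ncss


types = ["int", "String", "List", "String", "double", "Map"]
print(non_primitive_type_ratio(types, ncss=6))  # 0.5
```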

acheshkov commented 4 years ago

https://arxiv.org/pdf/1909.09682.pdf

acheshkov commented 4 years ago

> A number of unique, non-primitive data types used in the code normalized by NCSS.

This looks similar to PMD's Class Fan-Out Complexity metric: https://pmd.github.io/latest/pmd_java_metrics_index.html#class-fan-out-complexity-class_fan_out

acheshkov commented 4 years ago

More metrics are listed here: https://hal.inria.fr/hal-00646878/file/Duca11b-Cutter-deliverable11-SoftwareMetrics.pdf

acheshkov commented 4 years ago
  1. Access to Foreign Data (ATFD)
  2. Access to Local Data (ALD)
  3. Average Method Weight (AMW)
  4. Base Class Overriding Ratio (BOVR)
  5. Base Class Usage Ratio (BUR)
  6. Capsules Providing Foreign Data (CPFD)
  7. Class Weight (CW)
  8. Coupling Between Objects (CBO)
  9. Cyclomatic Number (CYCLO)
  10. Depth of Inheritance Tree (DIT)
  11. Dispersion Ratio (DR)
  12. Externally Called Global Functions (ECGF)
  13. FANIN (FANIN)
  14. FANOUT (FANOUT)
  15. Foreign Data Providers (FDP)
  16. Incoming Coupling Dispersion for an Operation (ICDO)
  17. Incoming Coupling Intensity for an Operation (ICIO)
  18. Instability Factor (IF)
  19. Lack of Cohesion of Methods (LCOM)
  20. Line Bias (LB)
  21. Lines of Code (LOC)
  22. Lines of Comments (LOCOMM)
  23. Locality of Data Accesses (LDA)
  24. Loose Capsule Cohesion (LCC)
  25. Maximum Nesting Level (MAXNESTING)
  26. Number of Abstract Classes (NOAC)
  27. Number of Abstract Methods (NOAM)
  28. Number of Accessed Variables (NOAV)
  29. Number of Accessor Methods (NOACCM)
  30. Number of Added Services (NAS)
  31. Number of Attributes (NOA)
  32. Number of Children (NOCHLD)
  33. Number of Classes (NOC)
  34. Number of Functions (NOF)
  35. Number of Global Functions (NOGF)
  36. Number of Global Variables (NOGV)
  37. Number of Incoming Calls (NOIC)
  38. Number of Local Variables (NOLV)
  39. Number of Methods (NOM)
  40. Number of Module Variables (NOMV)
  41. Number of Modules (NOMOD)
  42. Number of Outgoing Calls (NOOC)
  43. Number of Overriding Methods (NOVRM)
  44. Number of Parameters (NOPAR)
  45. Number of Protected Attributes (NOPRTA)
  46. Number of Protected Methods (NOPRTM)
  47. Number of Public Attributes (NOPUBA)
  48. Number of Public Methods (NOPUBM)
  49. Outgoing Coupling Dispersion for an Operation (OCDO)
  50. Outgoing Coupling Intensity for an Operation (OCIO)
  51. Outgoing Dependency on Delegators (ODD)
  52. Percentage of Newly Added Services (PNAS)
  53. Response For a Class (RFC)
  54. Size of Duplication Chain (SDC)
  55. Size of Exact Clone (SEC)
  56. Specialization Index (SPIDX)
  57. Tight Capsule Cohesion (TCC)
  58. Weighted Operation Count (WOC)
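To make two of the listed cohesion metrics concrete, here is a simplified sketch of LCOM in Chidamber and Kemerer's original pairwise form and of TCC, assuming a hypothetical mapping from each method to the set of fields it accesses directly. The linked deliverable may define either metric differently; in particular, full TCC also counts fields reached through called methods and considers only visible methods, which this sketch ignores.

```python
from __future__ import annotations

from itertools import combinations


def lcom_and_tcc(fields_by_method: dict[str, set[str]]) -> tuple[int, float]:
    """Return (LCOM, TCC) from the direct field accesses of each method."""
    disjoint = sharing = 0
    for a, b in combinations(fields_by_method.values(), 2):
        if a & b:
            sharing += 1   # pair is directly connected via a common field
        else:
            disjoint += 1  # pair has no field in common
    pairs = disjoint + sharing
    lcom = max(disjoint - sharing, 0)
    # Convention for classes with fewer than two methods: TCC = 0.0.
    tcc = sharing / pairs if pairs else 0.0
    return lcom, tcc


point = {"getX": {"x"}, "setX": {"x"}, "getY": {"y"}}
print(lcom_and_tcc(point))  # (1, 0.3333333333333333)
```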