fastruby / skunk

A SkunkScore Calculator for Ruby Code -- Find the most complicated code without test coverage!
https://www.fastruby.io/blog/code-quality/intruducing-skunk-stink-score-calculator.html
MIT License

RFC: Formula needs attention (churn * cost * penalty produces unexpected results) #41

Open etagwerker opened 4 years ago

etagwerker commented 4 years ago

Context

Initially the SkunkScore was calculated as churn * cost * penalty. This made sense based on the churn vs. complexity idea -> https://www.agileconnection.com/article/getting-empirical-about-refactoring

However, I quickly realized that this formula would not work when running skunk -b master -- more here: https://www.fastruby.io/blog/code-quality/escaping-the-tar-pit-at-rubyconf.html

So I decided to change the formula to be cost * penalty.
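A minimal sketch of that current snapshot formula, assuming a penalty factor derived from test coverage (the method name and penalty shape here are illustrative, not skunk's actual implementation):

```ruby
# Illustrative snapshot formula: cost * penalty_factor, with no churn term.
# A fully covered file pays no penalty; an uncovered file's cost is doubled.
# (The exact penalty shape is an assumption for this sketch.)
def skunk_score(cost:, coverage:)
  penalty_factor = 1 + (100 - coverage) / 100.0
  cost * penalty_factor
end

skunk_score(cost: 50, coverage: 100) # => 50.0 (no penalty)
skunk_score(cost: 50, coverage: 0)   # => 100.0 (maximum penalty)
```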

Alternatives

I think a potential solution is to apply a modified weight to churn, so that the formula could look like this:

skunk_score = (magical_weight * churn) * cost * penalty_factor

That way, the formula could work both as a snapshot and as a comparison between two branches.
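The proposed formula could be sketched like this; magical_weight is the placeholder whose value is the open question in this RFC, and the default below is purely hypothetical:

```ruby
# Sketch of the proposed weighted formula. All parameter values here are
# hypothetical; magical_weight is the undetermined weight from the RFC.
def weighted_skunk_score(churn:, cost:, penalty_factor:, magical_weight: 0.1)
  (magical_weight * churn) * cost * penalty_factor
end

weighted_skunk_score(churn: 10, cost: 50, penalty_factor: 2) # => 100.0
```

One property worth noting: with churn as a pure multiplier, a file with zero churn scores zero regardless of cost, which may or may not be the desired snapshot behavior.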

Test

Testing this should show that removing complexity in a module, git committing, and then running skunk -b master produces a lower skunk score.

mateusdeap commented 1 year ago

Interesting. I have a question, though: any idea what this weight number could be and how it would change? As I see it, maybe it can be as simple as:

Because I think we're dealing with a lack of information: if we want a single number to capture all the complexities of a file in a snapshot, we naturally can't take churn into account, in which case our skunk score is less precise, or maybe less significant, due to that missing information.

It's more like we have one possible formula if we take the code's history into account and another formula if we don't...

One other way to think of this is by analogy to integrating a function, I believe. In math, when you integrate a function over a variable, it's as if you were summing the function's values at many points, ending up with the area under its curve, and we could consider that area our skunk score.

I could try to elaborate more, but this would imply a change in how we see the skunk score: it would become the sum of the cost * penalty function over the churn.
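That summation idea could be sketched like this, assuming a hypothetical per-commit history where each entry records the file's cost and penalty factor at that point in time (the data shape is an assumption for illustration):

```ruby
# Hypothetical sketch of the "integration" view: instead of multiplying by
# total churn, sum cost * penalty_factor evaluated at each commit that
# touched the file. Each history entry is an assumed per-commit snapshot.
def cumulative_skunk_score(history)
  history.sum { |snapshot| snapshot[:cost] * snapshot[:penalty_factor] }
end

history = [
  { cost: 10, penalty_factor: 2 }, # commit 1: uncovered, costly
  { cost: 8,  penalty_factor: 1 }  # commit 2: complexity reduced, covered
]
cumulative_skunk_score(history) # => 28
```

A file that is touched often while complex and uncovered accumulates a high score, while a file that is cleaned up early contributes little afterwards, which matches the churn-vs-complexity intuition.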

What we'd need to define is what churn means for a given snapshot, in order to know whether this makes any sense.