datactive / bigbang

Scientific analysis of collaborative communities
http://datactive.github.io/bigbang/
MIT License

Author Attributes and Graphs #184

Open falahat opened 9 years ago

falahat commented 9 years ago

When we first created the git-collection capability, we gathered some basic information about contributors (number of commits, times of commits).

We can now dig deeper and look at other metrics, such as:

  1. Number of lines added
  2. The "complexity" of their code (after we add code analysis metrics)
  3. Frequency of their commits
  4. Average size of their commits
  5. Seasonality of their commits (for example, my commits drop to zero during breaks)
  6. ... and many, many more!
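
As a rough illustration of the metrics listed above, here is a minimal sketch that computes a few of them per author. The `Commit` record and its field names are hypothetical and not BigBang's actual schema; "seasonality" is approximated crudely as commit counts per calendar month.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Commit:
    # Hypothetical record; field names are assumptions, not BigBang's schema.
    author: str
    when: datetime
    lines_added: int
    lines_removed: int


def author_metrics(commits):
    """Return {author: metrics dict} for a list of Commit records."""
    by_author = defaultdict(list)
    for c in commits:
        by_author[c.author].append(c)

    result = {}
    for author, cs in by_author.items():
        cs.sort(key=lambda c: c.when)
        # Active span in days, counted inclusively so it is never zero.
        span_days = (cs[-1].when - cs[0].when).days + 1
        per_month = defaultdict(int)
        for c in cs:
            per_month[c.when.month] += 1  # crude seasonality proxy
        result[author] = {
            "commits": len(cs),
            "lines_added": sum(c.lines_added for c in cs),
            "commits_per_week": len(cs) / (span_days / 7),
            "mean_commit_size": mean(c.lines_added + c.lines_removed for c in cs),
            "commits_by_month": dict(per_month),
        }
    return result


# Made-up sample data, for illustration only.
commits = [
    Commit("alice", datetime(2015, 1, 5), 120, 10),
    Commit("alice", datetime(2015, 1, 12), 30, 5),
    Commit("alice", datetime(2015, 3, 2), 60, 15),
    Commit("bob", datetime(2015, 2, 1), 200, 0),
]
metrics = author_metrics(commits)
```

Here commit size is taken as lines added plus lines removed; other definitions (for example, lines added only) would be equally defensible.
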
sbenthall commented 9 years ago

According to @dwins, the received wisdom is that no code complexity metric is more predictive of bugs/errors than the number of lines of code. So (as I think you brought up in a meeting the other day) that would be a good place to start.

nborwankar commented 9 years ago

Pardon me for jumping in unexpectedly, but # lines of code is correlated with # of bugs purely through proportionality: bigger code bases will of course have more bugs in general. This may be "predictive" in some sense, but it provides little additional insight.

FWIW, "#bugs per line of code" as a metric will remove the confounding effect of codebase size, and is IMHO a candidate "good" metric.
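
To make the normalization concrete, here is a minimal sketch of a bugs-per-line metric. The function name and the module figures are hypothetical, and the result is scaled to bugs per 1,000 lines only for readability.

```python
def bugs_per_kloc(bug_count, lines_of_code):
    """Defect density: bugs per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return 1000 * bug_count / lines_of_code


# Made-up figures: (bug count, lines of code) per module.
modules = {"parser": (12, 4000), "mailer": (3, 500)}
density = {name: bugs_per_kloc(b, loc) for name, (b, loc) in modules.items()}
```

With these numbers, the larger module has more bugs in absolute terms (12 vs. 3) but half the density of the smaller one, which is the distinction a raw bug count hides.
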

Cheers,

Nitin


Nitin Borwankar nborwankar@gmail.com

sbenthall commented 9 years ago

That's a good point, Nitin. To get that metric, we would need an independent measure of bugs, for example from issue tracker data. I think that's out of scope for this ticket, which is about deriving metrics about code from repository data.
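
One possible way to get per-commit line counts from repository data alone is to parse the output of `git log --numstat`. The sketch below is an assumption about how this could be done, not BigBang's actual collection code; the `@` prefix on header lines is an arbitrary marker chosen here, and the sample text is made up.

```python
def parse_numstat(log_text):
    """Parse text produced by

        git log --numstat --pretty=format:'@%H%x09%an%x09%aI'

    into a list of dicts with per-commit added/removed line totals.
    """
    commits = []
    for line in log_text.splitlines():
        if not line.strip():
            continue  # blank separator lines between commits
        if line.startswith("@"):
            # Header line: hash, author name, ISO 8601 author date.
            sha, author, date = line[1:].split("\t", 2)
            commits.append(
                {"sha": sha, "author": author, "date": date, "added": 0, "removed": 0}
            )
        elif commits:
            # numstat line: added<TAB>removed<TAB>path
            added, removed, _path = line.split("\t", 2)
            if added != "-":  # binary files are reported as "-"
                commits[-1]["added"] += int(added)
                commits[-1]["removed"] += int(removed)
    return commits


# Made-up sample of the expected input format.
sample = (
    "@abc123\tAlice\t2015-05-08T10:20:00-07:00\n"
    "\n"
    "10\t2\tsrc/example.py\n"
    "-\t-\tdocs/logo.png\n"
    "\n"
    "@def456\tBob\t2015-05-09T09:00:00-07:00\n"
    "3\t0\tREADME.md\n"
)
parsed = parse_numstat(sample)
```

The resulting records carry only what the repository itself knows (author, date, lines changed), which matches the scope described above; bug counts would have to come from a separate source.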