BonnyCI / mateys-ahoy

Individual contributor analysis (see shuffleboard)
Apache License 2.0

Define metrics to be collected from Git Commit History #30

Open missaugustina opened 7 years ago

missaugustina commented 7 years ago

Theories

If a company has influence in a project, some proportion of authors associated with that company will identify the association in their commit activity.

It's not clear how accurate the company affiliation in the Github profile is.

If an author's contributions are condoned by their company, they should be using company resources to make those contributions. Committers who use their work email address likely have a strong company affiliation or else they wouldn't be using company resources in such a publicly visible space. Committers who do not use their work email may want to obscure their employer affiliation.

The domain name of the commit email is sufficient to identify an author's company affiliation, both historically and presently.
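As a rough illustration of this theory, here is a minimal sketch of pulling a candidate company domain out of a commit email. The function names and the `PUBLIC_DOMAINS` set are hypothetical, not part of this repo's analysis, and a real list of personal mail providers would be much longer.

```python
# Hypothetical, incomplete set of domains that say nothing about an employer.
PUBLIC_DOMAINS = {
    "gmail.com",
    "yahoo.com",
    "outlook.com",
    "users.noreply.github.com",
}


def email_domain(email):
    """Return the lower-cased domain of an email address, or None if malformed."""
    local, sep, domain = email.strip().rpartition("@")
    if not sep or not local or not domain:
        return None
    return domain.lower()


def company_domain(email):
    """Return the domain if it plausibly identifies a company, else None."""
    domain = email_domain(email)
    if domain is None or domain in PUBLIC_DOMAINS:
        return None
    return domain
```

Authors for whom `company_domain` returns None are exactly the ones this theory cannot classify, so any metric built on top of it should tolerate missing affiliations.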

Commit months is a better indicator of company influence than number of commits.

Because it is not possible to accurately identify company affiliation for every single committer, rankings depending on the number of commits made by affiliated authors are insufficient for determining a company's involvement with a project.

Commit months per company

On the other hand, measuring whether a company had at least one author committing to the repository using their company email address will indicate whether or not the company had any involvement for a given time period. This also provides a clear historical pattern of activity, particularly how long the company has been represented in the project.
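One way this metric could be computed, sketched under the assumption that the commit history has already been reduced to (author email, commit date) pairs. The function name and input shape are illustrative only.

```python
from collections import defaultdict
from datetime import date


def commit_months_per_company(commits):
    """Map each email domain to the set of (year, month) pairs in which
    at least one author using that domain committed.

    `commits` is an iterable of (email, date) pairs (an assumed input shape).
    """
    months = defaultdict(set)
    for email, day in commits:
        if "@" not in email:
            continue  # skip addresses with no usable domain
        domain = email.rpartition("@")[2].lower()
        months[domain].add((day.year, day.month))
    return dict(months)


# Made-up sample data for illustration.
sample = [
    ("a@example.com", date(2017, 1, 3)),
    ("b@example.com", date(2017, 1, 20)),
    ("a@example.com", date(2017, 3, 5)),
    ("c@example.org", date(2017, 2, 1)),
]
per_company = commit_months_per_company(sample)
month_counts = {domain: len(m) for domain, m in per_company.items()}
```

Because each month counts once no matter how many commits or authors it contains, a single prolific committer cannot inflate a company's total.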

Commit months per author per company

Evaluating whether an affiliated author had at least one commit using their company email address indicates how much time a company is allowing its affiliated authors to work upstream. If the company allows a significant proportion of their time to be spent upstream, the authors will have more consistent commit months. This metric is also not susceptible to skewing in the way that numbers of commits are.

If the majority of authors have only one commit month, it suggests that a company, despite having a large number of commits and a large number of authors, may not be engaging consistently with the community.
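A minimal sketch of the per-author version, again assuming (author email, commit date) pairs as input. `single_month_share` is a hypothetical helper that reports what fraction of a company's authors appear in only one month.

```python
from collections import defaultdict
from datetime import date


def commit_months_per_author(commits):
    """Map each author email to the set of (year, month) pairs they committed in."""
    months = defaultdict(set)
    for email, day in commits:
        months[email.lower()].add((day.year, day.month))
    return months


def single_month_share(commits, domain):
    """Fraction of authors at `domain` with exactly one commit month.

    Returns None when no author uses that domain.
    """
    per_author = commit_months_per_author(commits)
    counts = [
        len(months)
        for email, months in per_author.items()
        if email.endswith("@" + domain.lower())
    ]
    if not counts:
        return None
    return sum(1 for c in counts if c == 1) / len(counts)


# Made-up sample data for illustration.
sample = [
    ("a@example.com", date(2017, 1, 3)),
    ("a@example.com", date(2017, 3, 5)),
    ("b@example.com", date(2017, 1, 20)),
    ("c@example.org", date(2017, 2, 1)),
]
```

A share close to 1.0 would correspond to the case described above: many authors, but few who return in a second month.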

Project Participation

Commits percent change: how the proportion of commits changes over time (including what the minimum and maximum proportions are).
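The issue does not pin down whose proportion is meant, so the following is only one possible reading: the share of each month's commits that come from a given email domain, and the month-over-month percent change in that share. The function names and the (email, (year, month)) input shape are assumptions.

```python
from collections import Counter


def monthly_share(commits, domain):
    """Return {(year, month): share of that month's commits from `domain`}.

    `commits` is an iterable of (email, (year, month)) pairs (assumed shape).
    """
    total = Counter()
    mine = Counter()
    suffix = "@" + domain.lower()
    for email, month in commits:
        total[month] += 1
        if email.lower().endswith(suffix):
            mine[month] += 1
    # Sorted so the dict iterates in chronological order.
    return {m: mine[m] / total[m] for m in sorted(total)}


def percent_changes(shares):
    """Percent change of the share relative to the previous month.

    The change is None when the previous share was zero (undefined ratio).
    """
    months = list(shares)
    changes = {}
    for prev, cur in zip(months, months[1:]):
        if shares[prev] == 0:
            changes[cur] = None
        else:
            changes[cur] = (shares[cur] - shares[prev]) / shares[prev] * 100
    return changes


# Made-up sample data for illustration.
sample = [
    ("a@example.com", (2017, 1)),
    ("x@example.org", (2017, 1)),
    ("a@example.com", (2017, 2)),
    ("b@example.com", (2017, 2)),
    ("b@example.com", (2017, 2)),
    ("x@example.org", (2017, 2)),
]
shares = monthly_share(sample, "example.com")
lowest, highest = min(shares.values()), max(shares.values())
```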

Commit months per author is an interesting metric for estimating project popularity. A high proportion of drive-by authors (authors with a low number of commit months or 100% shared commit months) indicates developers are actively using the project.

Additionally, non-human users in projects with a high proportion of drive-by authors can be identified by a significantly higher proportion of commits than the other authors. The activity in this case is masked by an internal Google process, so it's not clear what analysis would be beneficial without something else to compare to. However, projects with open governance that have clear "bot users" could be analyzed and used as a foundation for comparison.
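As a starting point for that comparison, here is a crude sketch that flags authors whose commit count is far above the typical author's. The median-based threshold and the default `factor` of 10 are arbitrary assumptions, not something derived from this analysis.

```python
from collections import Counter
from statistics import median


def likely_non_human(author_emails, factor=10):
    """Return authors whose commit count is at least `factor` times the median.

    `author_emails` has one entry per commit (assumed input shape).
    The threshold is a heuristic guess and would need tuning per project.
    """
    counts = Counter(email.lower() for email in author_emails)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(a for a, n in counts.items() if n >= factor * typical)


# Made-up sample data: one heavy committer among occasional authors.
sample = ["bot@example.com"] * 50 + [
    "a@example.com",
    "b@example.com",
    "c@example.com",
    "c@example.com",
]
```

A flagged author is only a candidate; a very active human maintainer would trip the same threshold, so results need manual review.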

missaugustina commented 7 years ago

Next steps: fix up this analysis and run it on the other repos in the list.