andymeneely / chromium-history

Scripts and data related to Chromium's history

Replicate Major & Minor Contributors #201

Closed SidIcarus closed 9 years ago

SidIcarus commented 9 years ago

Chris Bird defines an owner of a file as someone who has committed code to a file. He further divides developers into major and minor contributors.

This percentage is calculated based on the aggregated churn from the number of non-trivial commits a person has made to a file.

Example: A file has had a total of 4 commits from 4 people.

Here's the running todo list:

Note: The number 5 is arbitrary, so don't worry about it.

SidIcarus commented 9 years ago

How's it look, @andymeneely?

andymeneely commented 9 years ago

I think we decided in the meeting that his metric is based on number of commits, not lines of code. Here's the quote from the paper:

Minor Contributor – A developer who has made changes to a component, but whose ownership is below 5% is considered a minor contributor to that component. This threshold was chosen based on examination of distributions of ownership. We refer to a commit from a minor contributor as a minor contribution.

Major Contributor – A developer who has made changes to a component and whose ownership is at or above 5% is a major contributor to the component and a commit from such a developer is a major contribution.

Note that we examine the number of changes to a component made by a developer rather than the actual number of lines modified.

So he doesn't use churn at all (and later on says that churn is often strongly correlated with number of commits).

Thus, in your example, all of them are major contributors because each of them made 25% of the commits.

In this example, Person 2 is a minor contributor:

Person 1: 10 commits
Person 2: 1 commit
Person 3: 5 commits
Person 4: 4 commits

Thus, this file had 3 major contributors and 1 minor.
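
The rule quoted from the paper can be sketched in plain Ruby to check examples like this one. This is only an illustration, not code from the repository: `classify_contributors` and `MAJOR_THRESHOLD_PERCENT` are hypothetical names, and the comparison uses integer arithmetic so that the case of exactly 5% is unambiguous.

```ruby
# Hypothetical helper (not part of the repository): split a file's authors
# into major and minor contributors by their share of commits, not lines.
MAJOR_THRESHOLD_PERCENT = 5 # "at or above 5%" is major, per the quoted definition

def classify_contributors(commit_counts, threshold_percent = MAJOR_THRESHOLD_PERCENT)
  total = commit_counts.values.sum
  # count / total >= threshold / 100, rearranged to avoid floating point
  major, minor = commit_counts.partition do |_author, count|
    count * 100 >= threshold_percent * total
  end
  { major: major.map(&:first), minor: minor.map(&:first) }
end

counts = { 'Person 1' => 10, 'Person 2' => 1, 'Person 3' => 5, 'Person 4' => 4 }
result = classify_contributors(counts)
puts "major: #{result[:major].inspect}"
puts "minor: #{result[:minor].inspect}"
```

One thing this makes visible: with the counts above, Person 2 has 1 of 20 commits, which is exactly 5%. Under a literal reading of "at or above 5%", that puts Person 2 on the major side. Person 2 only comes out as minor if the comparison is strict (`>`) or the file has more commits in total, for example 1 of 21.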

SidIcarus commented 9 years ago

Woops. Updated it.

I believe I messed it up because of that last part you quoted:

Note that we examine the number of changes to a component made by developer rather than the actual number of lines modified. Within Windows, each change corresponds to one fix or enhancement and individual changes are quite small, usually on the order of tens of lines. We use number of changes because each change represents an “exposure” of the developer to the code

So we will still be determining non-trivial commits based on number of changes made, correct?

SidIcarus commented 9 years ago

cfp = CommitFilepath.select("commit_hash, filepath").order(filepath: :asc)
f = Filepath.joins(commit_filepaths: :commit)

andymeneely commented 9 years ago

Number of commits for a release filepath:

To get them all:

Filepath.joins(commit_filepaths: :commit).group('filepaths.filepath').select('filepaths.filepath, count(commits.commit_hash)')

To get one:

CommitFilepath.where(filepath: 'ui/events/x/device_data_manager.h').size

andymeneely commented 9 years ago

Here's a query for major/minor. It is given the filepath ui/events/x/device_data_manager.h, with the threshold set at 0 instead of 5% (probably unrealistic, but this way the query returns one row).

CommitFilepath.joins(:commit).where(filepath: 'ui/events/x/device_data_manager.h').group('commits.author_id').select('commits.author_id,count(*)').having('count(*) > 0')

Here's a commented explanation:

CommitFilepath
  .joins(:commit)
  .select('commits.author_id,count(*)') # need the aggregate function for group by
  .where(filepath: 'ui/events/x/device_data_manager.h') # only look for the file we are asking for
  .group('commits.author_id') # group by the author id so we can count how many commits they had
  .having('count(*) > 0') # only include major contributors with 1 or more commits
  .length # number of rows; .size on a grouped relation returns a hash of per-author counts instead
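
To move from the placeholder `count(*) > 0` to a real 5% cutoff, the `having` step needs the total number of commits to the file as well as the per-author counts. The sketch below mirrors the where/group/having steps on in-memory rows so the logic can be tried without a database. `Row`, `major_contributor_count`, and the sample data are made up for illustration, and `tally` assumes Ruby 2.7 or newer.

```ruby
# In-memory stand-in for the commit_filepaths/commits join: one row per
# (commit, file) pair. Field names mirror the columns used in the query;
# the struct itself is hypothetical.
Row = Struct.new(:filepath, :author_id)

def major_contributor_count(rows, filepath, threshold_percent = 5)
  # where(filepath: ...): keep only rows for the requested file
  file_rows = rows.select { |row| row.filepath == filepath }
  total = file_rows.size
  # group('commits.author_id') with count(*): commits per author
  per_author = file_rows.map(&:author_id).tally
  # having(...): keep authors at or above the threshold share of commits
  per_author.count { |_author_id, commits| commits * 100 >= threshold_percent * total }
end

# Invented sample: author 1 has 25 commits, author 2 has 4, author 3 has 1.
rows = Array.new(25) { Row.new('ui/events/x/device_data_manager.h', 1) } +
       Array.new(4)  { Row.new('ui/events/x/device_data_manager.h', 2) } +
       [Row.new('ui/events/x/device_data_manager.h', 3)]
puts major_contributor_count(rows, 'ui/events/x/device_data_manager.h')
```

In SQL terms, the same idea would need the file's total commit count available to the `having` clause, for instance from a separate count query run first.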

SidIcarus commented 9 years ago

This will get us the number of commits per author for each filepath. The count shows whether an author has committed multiple times to a filepath.

    CommitFilepath.joins(:commit).select('commit_filepaths.filepath, commits.author_id, count(*)').group('commit_filepaths.filepath, commits.author_id').order(filepath: :asc)

Also, without the count and with the creation date:

    CommitFilepath.joins(:commit).select('commit_filepaths.filepath, commits.author_id, commits.created_at').order(filepath: :asc)

Number of commits per filepath:

    num_commits_per_filepath = CommitFilepath.select('filepath, count(*)').group('commit_filepaths.filepath').order(filepath: :asc)

With the dev data we can see that there are two files with two commits each, which can be used for the verifies.

  num_commits_per_filepath.having('count(*) > 1')
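
Putting the grouped queries above together, the per-file result we are after could look like the following plain-Ruby sketch. It works on `[filepath, author_id]` pairs, which is the shape of the filepath/author query, and is meant only to pin down the expected output for the verifies. `contributors_per_filepath` and the sample pairs are hypothetical.

```ruby
# Hypothetical in-memory version of the grouped queries.
# Input: [filepath, author_id] pairs, one per commit touching that file.
def contributors_per_filepath(pairs, threshold_percent = 5)
  by_file = pairs.group_by { |filepath, _author_id| filepath }
  sorted = by_file.sort_by { |filepath, _file_pairs| filepath } # order(filepath: :asc)
  sorted.to_h do |filepath, file_pairs|
    total = file_pairs.size # count(*) grouped by filepath
    # count(*) grouped by filepath and author_id
    per_author = file_pairs.map { |_filepath, author_id| author_id }.tally
    major = per_author.count { |_author_id, commits| commits * 100 >= threshold_percent * total }
    [filepath, { commits: total, major: major, minor: per_author.size - major }]
  end
end

# Invented sample data, not taken from the dev database.
pairs = [['b.h', 7]] + Array.new(25) { ['a.h', 1] } + [['a.h', 2]]
stats = contributors_per_filepath(pairs)
# Equivalent of having('count(*) > 1'): files with more than one commit
multi_commit_files = stats.select { |_filepath, s| s[:commits] > 1 }
puts stats.inspect
puts multi_commit_files.keys.inspect
```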