epam / OSCI

Open Source Contributor Index
https://opensourceindex.io/
GNU General Public License v3.0

Clarification counting method #129

Closed by OhItsLena 2 years ago

OhItsLena commented 2 years ago

Am I right in assuming that these are the steps you take to count the open source contributions?

  1. get commits from push event data (GH Archive/BigQuery)
  2. only keep commits to repositories that have a license (GitHub API for license info)
  3. match author email domains for selected organizations
  4. use the author email to identify unique contributors and count commits
  5. count total community / active contributors

I just wanted to clarify so I better understand how to interpret the results. Great project.

vlad-isayko commented 2 years ago

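Yes, @OhItsLena, you are right in general; this is exactly the logic that is implemented. The only difference is the order of operations: steps 2 and 3 are swapped, so commits are first matched to organizations by author email domain and only then filtered by repository license.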
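
For readers who want to see the order of operations spelled out, here is a minimal sketch in Python of the counting logic as described in this thread, with the domain match applied before the license filter. It is an illustration only, not the OSCI implementation: the sample records, the `DOMAIN_TO_ORG` mapping, the `LICENSED_REPOS` set, the function names, and the `active_threshold` parameter are all hypothetical, and the real pipeline reads push events from GH Archive/BigQuery and license info from the GitHub API rather than from in-memory lists.

```python
from collections import Counter, defaultdict

# Hypothetical sample data standing in for commits extracted from push events.
COMMITS = [
    {"sha": "a1", "repo": "org/licensed", "author_email": "ann@epam.com"},
    {"sha": "a2", "repo": "org/licensed", "author_email": "ann@epam.com"},
    {"sha": "b1", "repo": "org/unlicensed", "author_email": "bob@epam.com"},
    {"sha": "c1", "repo": "org/licensed", "author_email": "cat@example.org"},
    {"sha": "d1", "repo": "org/other", "author_email": "Dan@Example.com"},
]

# Hypothetical lookup tables; in the thread these come from a curated
# organization list and from the GitHub API respectively.
DOMAIN_TO_ORG = {"epam.com": "EPAM", "example.com": "Example Inc"}
LICENSED_REPOS = {"org/licensed", "org/other"}


def count_contributions(commits, domain_to_org, licensed_repos):
    """Return {organization: {author_email: commit_count}}."""
    per_org = defaultdict(Counter)
    for commit in commits:
        email = commit["author_email"].lower()
        domain = email.rpartition("@")[2]

        # Match the author email domain to a selected organization first.
        org = domain_to_org.get(domain)
        if org is None:
            continue

        # Then keep only commits to repositories that have a license.
        if commit["repo"] not in licensed_repos:
            continue

        # The author email identifies a unique contributor.
        per_org[org][email] += 1
    return {org: dict(counts) for org, counts in per_org.items()}


def summarize(per_org, active_threshold):
    """Count all contributors and those at or above active_threshold commits.

    The threshold is an assumption of this sketch; the thread does not state
    how "active" is defined.
    """
    return {
        org: {
            "total_community": len(counts),
            "active_contributors": sum(
                1 for n in counts.values() if n >= active_threshold
            ),
        }
        for org, counts in per_org.items()
    }


per_org = count_contributions(COMMITS, DOMAIN_TO_ORG, LICENSED_REPOS)
summary = summarize(per_org, active_threshold=2)
```

Filtering by domain first shrinks the set of repositories that need a license lookup, which is presumably why the swapped order is cheaper when license info comes from a rate-limited API.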

OhItsLena commented 2 years ago

Absolutely makes sense from a data processing standpoint. Thanks for clarifying @vlad-isayko!