bssw-psip / reposcanner

A compact repository data mining toolkit

Tiers of Authorship-ness #7

Open frobnitzem opened 4 years ago

frobnitzem commented 4 years ago

Classify Authors Into Tiers:

For "Funded" - How to Corroborate Contribution:

elaineraybourn commented 4 years ago

In our research study document, we use the term "contributor" for what this issue calls "author". As we develop our contributor (or "author") classification scheme for data analysis, we need to be clear about the scope of the analysis and how we define terms. I propose the following definitions to guide data analysis.

Phase 1 (Tier 1, the lowest level of analysis):

- Repo: a GitHub repository. For the purposes of this phase, we are interested primarily in repos that are associated with ECP projects.
- Commit: a save (snapshot) of the current state of the repository.
- Contributor: a unique user ID with 1 or more commits to 1 or more repos attributed to an ECP project.
- Contributor ranking: the number of commits. The more commits a contributor has, the higher their ranking.
- Contribution: a commit generated by a human and, potentially, a commit generated by a bot that a human created.
- Cross-repo contribution: one or more commits by a unique contributor to two or more different repos.
- Cross-project contribution: one or more commits to one or more ECP project repos.
- Project: a formal collection of repos (e.g., ADTM, ALPINE). If a project has multiple repos, a single commit to any one of them is enough to count as a contribution to that ECP project.
- Contributor network ranking: the number of commits a contributor makes across repos attributed to different ECP projects. The more commits and the more distinct ECP project repos, the higher the contributor's ranking.
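The Phase 1 definitions map fairly directly onto simple aggregations over commit records. Below is a minimal sketch, assuming each commit has already been reduced to a (user ID, repo, bot flag) record and that a repo-to-project mapping is available. The `Commit` type, the `PROJECTS` mapping, the repo and user names, and all function names are hypothetical, not part of reposcanner. The definition of contributor network ranking does not say how to combine "more commits" with "more repos", so the sketch assumes distinct ECP repos are compared first, with commit count as a tiebreaker.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    user_id: str         # unique contributor ID
    repo: str            # GitHub repo, e.g. "owner/name"
    is_bot: bool = False


# Hypothetical mapping of ECP projects to the repos attributed to them.
PROJECTS = {
    "ProjectA": {"org/a-core", "org/a-docs"},
    "ProjectB": {"org/b-solver"},
}
REPO_TO_PROJECT = {repo: proj for proj, repos in PROJECTS.items() for repo in repos}


def ecp_commits(commits, include_bots=False):
    """Keep commits to ECP-attributed repos; drop bot commits unless asked."""
    return [c for c in commits
            if c.repo in REPO_TO_PROJECT and (include_bots or not c.is_bot)]


def contributor_ranking(commits):
    """Contributor ranking: more commits means a higher rank."""
    return Counter(c.user_id for c in commits).most_common()


def cross_repo_contributors(commits):
    """Contributors with commits to two or more different repos."""
    repos = defaultdict(set)
    for c in commits:
        repos[c.user_id].add(c.repo)
    return {user for user, rs in repos.items() if len(rs) >= 2}


def project_contributors(commits):
    """One commit to any repo of a project counts as contributing to it."""
    members = defaultdict(set)
    for c in commits:
        members[REPO_TO_PROJECT[c.repo]].add(c.user_id)
    return dict(members)


def network_ranking(commits):
    """Assumption: rank by distinct ECP repos first, then by commit count."""
    repos = defaultdict(set)
    counts = Counter()
    for c in commits:
        repos[c.user_id].add(c.repo)
        counts[c.user_id] += 1
    return sorted(counts, key=lambda u: (len(repos[u]), counts[u]), reverse=True)


# Hypothetical sample data: the bot commit and the non-ECP repo are filtered out.
commits = [
    Commit("alice", "org/a-core"),
    Commit("alice", "org/a-core"),
    Commit("alice", "org/b-solver"),
    *[Commit("bob", "org/a-docs")] * 4,
    Commit("dependabot", "org/a-core", is_bot=True),
    Commit("carol", "org/unrelated"),
]
data = ecp_commits(commits)

print(contributor_ranking(data))      # [('bob', 4), ('alice', 3)]
print(cross_repo_contributors(data))  # {'alice'}
print(network_ranking(data))          # ['alice', 'bob']
```

In this sample, bob ranks first by raw commit count. Under the assumed tiebreak, alice ranks first in the network ranking because she commits to repos in two projects. Whether bot commits count is left as a switch (`include_bots`), matching the "potentially" in the definition of a contribution.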