Open phfaist opened 2 years ago
Here is a potential blueprint for the logic of such a script (in pseudo-python). The strategy is to go through all repository commits in chronological order and take note of contributors to each code (identified by `code_id`). Using the `code_id` to identify codes will help with files that were moved around in the git tree, and for which git doesn't display history past the file rename point.
# dictionary of code_id -> list of author info dictionaries
codes_contributors_information = {}

for commit_object in (traverse through all commits of the repo in chronological order):
    # get author information associated with that commit_object
    author_information = {
        'githubusername': ...,
        'name': ...,
    }
    for yml_file in (all YML code file changes in commit_object):
        code_id = (read the code_id field in the YAML file)
        is_change_significant = get_is_change_significant( ... )
        if is_change_significant:
            # add this author to the list of contributors to that code
            if code_id not in codes_contributors_information:
                codes_contributors_information[code_id] = []
            codes_contributors_information[code_id].append( author_information )

# update the codes tree.
for code_id, list_of_contributors in codes_contributors_information.items():
    # fetch the YML file associated with the code ID (available via
    # ecczoogen, from the generator code)
    code_yml_file = zoo.get_code(code_id).source_info_filename
    ... # append the relevant information to the data in the code YAML file

def get_is_change_significant( ... ):
    ... # logic to detect whether a change was substantial (to be listed as a contributor) is
    # coded here. A "substantial contribution" means basically more than fixing a
    # few typos.
    # This function's parameters should include at least the YML file name and
    # the commit id, so we can call "git diff --word-diff=porcelain"
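One possible way to fill in `get_is_change_significant` is sketched below. It is only an illustration under several assumptions: the helper `count_changed_words`, the parameters `min_changed_words` and `repo_dir`, and the threshold of 10 words are all hypothetical and would need tuning against real commits. The commit is compared against its first parent (`commit_id^`), so the sketch does not handle the root commit and sees merge commits only relative to their first parent.

```python
import subprocess


def count_changed_words(porcelain_diff: str) -> int:
    """Count added plus removed words in `git diff --word-diff=porcelain` output.

    Inside a hunk, each line starts with '+' (added run), '-' (removed run),
    ' ' (unchanged run) or '~' (end of an original line).
    """
    changed = 0
    in_hunk = False
    for line in porcelain_diff.splitlines():
        if line.startswith("@@"):
            # hunk header; everything before the first one is file metadata
            in_hunk = True
            continue
        if not in_hunk:
            continue  # skips 'diff --git', 'index', '---' and '+++' header lines
        if line[:1] in ("+", "-"):
            changed += len(line[1:].split())
    return changed


def get_is_change_significant(yml_file, commit_id, min_changed_words=10, repo_dir="."):
    """Return True if the commit changed at least `min_changed_words` words in the file.

    The threshold is an arbitrary placeholder, not a value from the issue.
    """
    result = subprocess.run(
        ["git", "diff", "--word-diff=porcelain",
         f"{commit_id}^", commit_id, "--", yml_file],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return count_changed_words(result.stdout) >= min_changed_words
```

Keeping the word counting separate from the `git` call makes it easy to try different thresholds or weighting rules on saved diff output without touching the repository.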
Notes:
- The YAML file associated with a `code_id` can be looked up with `zoo.get_code(code_id)`. See this line.
- To detect whether a change is significant, one option is to run `git diff --word-diff=porcelain`, and look at the number of word changes. We might have to test this and tweak it to get good results.
- There may be changes of `code_id` in the history of a code. We will either have to deal with these manually, or find some other fix. What seems most reasonable is to list all the encountered `code_id`s that don't exist in the current tree; then we can re-run the script with a hard-coded mapping of old `code_id`s to new `code_id`s, making sure the script records changes directly under the new `code_id`.
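The two-pass handling of renamed `code_id`s could look roughly like the sketch below. The function names and the example IDs used with them are hypothetical; the only structure taken from the blueprint is the `codes_contributors_information` dictionary (`code_id` to list of author dictionaries). The first pass reports IDs that are missing from the current tree, and the second pass folds contributors of old IDs into the new ones using a hand-written mapping.

```python
def find_unknown_code_ids(codes_contributors_information, current_code_ids):
    """List code_ids seen in the git history that no longer exist in the current tree."""
    return sorted(set(codes_contributors_information) - set(current_code_ids))


def resolve_code_id(code_id, renamed_code_ids):
    """Follow old -> new renames until reaching an id that was not renamed again."""
    seen = set()
    while code_id in renamed_code_ids:
        if code_id in seen:
            raise ValueError(f"cyclic code_id mapping at {code_id!r}")
        seen.add(code_id)
        code_id = renamed_code_ids[code_id]
    return code_id


def merge_contributors(codes_contributors_information, renamed_code_ids):
    """Re-key contributors under the newest code_id, dropping duplicate authors."""
    merged = {}
    for code_id, contributors in codes_contributors_information.items():
        bucket = merged.setdefault(resolve_code_id(code_id, renamed_code_ids), [])
        for author in contributors:
            if author not in bucket:
                bucket.append(author)
    return merged
```

A first run would print `find_unknown_code_ids(...)`; after filling in the mapping by hand (for example `{"old_css": "css"}`, an invented pair), a second run would call `merge_contributors(...)` before writing to the YAML files.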
See issue eczoo_data#218.
The script that does the changes should be added into this repository (e.g., by pull request to claim the UnitaryHACK bounty) under the folder
This way, we can play around with the script, run additional tests to make sure contributors are captured the way we intended, and carry out any further tweaks that prove necessary.
Thanks to UnitaryHACK-ers that want to contribute!