errorcorrectionzoo / eczoo_generator

Provide acknowledgments to GitHub editors of each code #6

Open phfaist opened 2 years ago

phfaist commented 2 years ago

See issue eczoo_data#218.

The script that does the changes should be added into this repository (e.g., by pull request to claim the UnitaryHACK bounty) under the folder

tools/contributors_via_git/

This way, we can experiment with the script, run additional tests to make sure contributors are captured as intended, and carry out any further tweaks that turn out to be necessary.

Thanks to UnitaryHACK-ers that want to contribute!

phfaist commented 2 years ago

Here is a potential blueprint for the logic of such a script (in pseudo-python). The strategy is to go through all repository commits in chronological order and take note of contributors to each code (identified by code_id). Using the code_id to identify codes will help with files that were moved around in the git tree, and for which git doesn't display history past the file rename point.

# dictionary of code_id -> list of author info dictionaries
codes_contributors_information = {}

for commit_object in (traverse through all commits of the repo in chronological order):

    # get author information associated with that commit_object
    author_information = {
      'githubusername': ...,
      'name': ...,
    }

    for yml_file in (all YML code file changes in commit_object):
        code_id = (read the code_id field in the YAML file)

        is_change_significant = get_is_change_significant( ... )
        if is_change_significant:

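            # add this author to the list of contributors to that code
            # (once per code, even if they authored several commits)
            if code_id not in codes_contributors_information:
                codes_contributors_information[code_id] = []
            if author_information not in codes_contributors_information[code_id]:
                codes_contributors_information[code_id].append( author_information )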

# update the codes tree.
for code_id, list_of_contributors in codes_contributors_information.items():

    # fetch the YML file associated with the code ID (available via
    # ecczoogen, from the generator code)
    code_yml_file = zoo.get_code(code_id).source_info_filename

    ... # append the relevant information to the data in the code YAML file

def get_is_change_significant( ... ):
    # Logic to detect whether a change was substantial enough for its author
    # to be listed as a contributor goes here. A "substantial contribution"
    # means basically more than fixing a few typos.
    # This function's parameters should include at least the YML file name and
    # the commit id, so we can call "git diff --word-diff=porcelain"
    ...
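
One possible shape for the significance check is sketched below in Python. It assumes the diff text comes from git's `--word-diff=porcelain` output, where inside a hunk a line starting with `+` holds added words, `-` holds removed words, a space holds unchanged text, and `~` marks a line break in the source. The names `count_changed_words`, `is_change_significant_from_diff` and `word_diff_for_commit`, as well as the threshold of 10 changed words, are placeholders chosen for illustration and are not part of the existing tooling.

```python
import subprocess


def count_changed_words(porcelain_diff):
    """Return (added, removed) word counts from --word-diff=porcelain text."""
    added = removed = 0
    in_hunk = False
    for line in porcelain_diff.splitlines():
        if line.startswith("diff --git"):
            # New file header: ignore "--- a/..." and "+++ b/..." lines
            # until the next hunk header is seen.
            in_hunk = False
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += len(line[1:].split())
        elif in_hunk and line.startswith("-"):
            removed += len(line[1:].split())
        # Lines starting with " " (context) or "~" (newline) are not counted.
    return added, removed


def is_change_significant_from_diff(porcelain_diff, min_changed_words=10):
    """Hypothetical heuristic: significant if enough words were touched."""
    added, removed = count_changed_words(porcelain_diff)
    return added + removed >= min_changed_words


def word_diff_for_commit(repo_dir, commit_id, yml_path):
    """Fetch the word diff that one commit made to one file (assumed usage)."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", "--format=",
         "--word-diff=porcelain", commit_id, "--", yml_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

A word-count threshold is only one option. A single-word typo fix shows up as one removed and one added word, so it stays well below the limit, but the exact cutoff would need tuning against real commits, and merge commits may need separate handling.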

Notes: