Provide acknowledgments to GitHub editors of each code

Here is a potential blueprint for the logic of such a script (in pseudo-Python). The strategy is to go through all repository commits in chronological order and record the contributors to each code (identified by code_id). Identifying codes by code_id rather than by file path helps with files that were moved around in the git tree, for which git doesn't display history past the rename point.

# dictionary of code_id -> list of author info dictionaries
codes_contributors_information = {}

for commit_object in (traverse through all commits of the repo in chronological order):

    # get author information associated with that commit_object
    author_information = {
      'githubusername': ...,
      'name': ...,
    }

    for yml_file in (all YML code file changes in commit_object):
        code_id = (read the code_id field in the YAML file)

        is_change_significant = get_is_change_significant( ... )
        if is_change_significant:

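            # add this author to the list of contributors to that code
            # (only once, even if they made several significant commits)
            if code_id not in codes_contributors_information:
                codes_contributors_information[code_id] = []
            if author_information not in codes_contributors_information[code_id]:
                codes_contributors_information[code_id].append( author_information )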

# update the codes tree.
for code_id, list_of_contributors in codes_contributors_information.items():

    # fetch the YML file associated with the code ID (available via
    # ecczoogen, from the generator code)
    code_yml_file = zoo.get_code(code_id).source_info_filename

    ... # append the relevant information to the data in the code YAML file

def get_is_change_significant( ... ):
    # Logic to detect whether a change is substantial enough for its author to
    # be listed as a contributor goes here. A "substantial contribution" means,
    # basically, more than fixing a few typos.
    # This function's parameters should include at least the YML file name and
    # the commit id, so that we can call "git diff --word-diff=porcelain".
    ...
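
One possible way to fill in get_is_change_significant is sketched below. It relies on the porcelain word-diff format, in which added and removed word runs appear on lines starting with "+" and "-" inside each hunk. Everything beyond that is an assumption made for illustration: the helper name count_word_changes, the repo_dir parameter, the use of "git show" (chosen so that the root commit also works), and the min_changed_words threshold of 20, which is an arbitrary placeholder that would need tuning.

```python
import subprocess


def count_word_changes(porcelain_diff: str) -> tuple:
    """Return (added_words, removed_words) for a porcelain word-diff text."""
    added = removed = 0
    in_hunk = False
    for line in porcelain_diff.splitlines():
        if line.startswith("diff --git"):
            # New file section: the header lines (index, ---, +++) follow.
            in_hunk = False
        elif line.startswith("@@"):
            # Hunk header: content lines follow until the next file section.
            in_hunk = True
        elif in_hunk and line.startswith("+"):
            added += len(line[1:].split())
        elif in_hunk and line.startswith("-"):
            removed += len(line[1:].split())
        # Lines starting with " " (unchanged) or "~" (newline) are ignored.
    return added, removed


def get_is_change_significant(repo_dir, commit_id, yml_filename,
                              min_changed_words=20):
    """Hypothetical heuristic: a change is significant if enough words changed.

    The threshold is a placeholder, not a tested value.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "show", "--word-diff=porcelain",
         "--format=", commit_id, "--", yml_filename],
        capture_output=True, text=True, check=True,
    )
    added, removed = count_word_changes(result.stdout)
    return added + removed >= min_changed_words
```

Keeping the parsing in its own function means the heuristic can be tried on saved diff output without touching a repository, which should make the tweaking mentioned in the notes easier.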

Notes:

The site generator script / ecczoogen package has code that can load the whole codes tree, and we can easily find a code YAML file by its code_id with zoo.get_code(code_id). See this line.
I think that, to determine whether a change is significant, we need to parse the output of git diff --word-diff=porcelain and look at the number of word changes. We might have to test this and tweak it to get good results.
The above logic doesn't account for changes of code_id in the history of a code. We will either have to deal with these manually or find some other fix. What seems most reasonable is to list all the encountered code_id values that don't exist in the current tree; then we can re-run the script with a hard-coded mapping of old code_id values to new ones, so that the script records those changes directly under the new code_id.
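
The two-pass idea in the last note could look roughly like the following sketch. The function name collect_contributors, its parameters, and the (code_id, author_information) record format are hypothetical; in the real script, the set of current code_id values would come from the loaded codes tree. Chained renames (an old id mapped to another old id) are not handled here.

```python
def collect_contributors(records, current_code_ids, old_to_new=None):
    """Group authors by code_id, translating renamed ids where possible.

    records          -- iterable of (code_id, author_information) pairs, one
                        per significant change, in chronological order
    current_code_ids -- set of code_id values present in the current tree
    old_to_new       -- optional hard-coded mapping {old code_id: new code_id}

    Returns (contributors, unknown_ids), where unknown_ids lists every
    encountered code_id that is neither current nor covered by the mapping.
    """
    old_to_new = old_to_new or {}
    contributors = {}
    unknown_ids = set()
    for code_id, author in records:
        # Record the change directly under the new code_id if one is known.
        code_id = old_to_new.get(code_id, code_id)
        if code_id not in current_code_ids:
            unknown_ids.add(code_id)
            continue
        authors = contributors.setdefault(code_id, [])
        if author not in authors:
            authors.append(author)
    return contributors, unknown_ids


# First pass: run without a mapping and inspect unknown_ids.
# Second pass: re-run with old_to_new filled in by hand.
```

A first run with no mapping reports the stale ids; once the mapping is written, a second run should ideally leave unknown_ids empty or containing only codes that were deleted outright.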
