TruCol / Self-host-GitLab-CI-for-GitHub

Installs your own GitLab CI and runs it on all your GitHub repos, in a single command.
GNU Affero General Public License v3.0

Reduce GitHub bandwidth by not cloning the repository for every commit or branch. #93

Closed a-t-0 closed 2 years ago

a-t-0 commented 2 years ago

Currently, both the run per repo and the run per commit clone a GitHub repo every time a new branch is checked out (or, for the per-commit run, every time a new commit is checked out).

Reduce the GitHub bandwidth of the CI run per repo by downloading the repo only once and then looping over its branches.
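A minimal sketch of the clone-once-then-loop idea, assuming two hypothetical helpers (`list_remote_branches` and `run_on_each_branch`) that are not part of this project. The third argument is whatever command should run for each branch; in this project that would presumably be the step that copies the branch to GitLab.

```shell
# Hypothetical helper: prints all remote branch names of the clone in "$1".
list_remote_branches() {
  local repo_dir="$1"
  # Use full ref names and strip the prefix, then drop the symbolic origin/HEAD.
  git -C "$repo_dir" for-each-ref --format='%(refname)' refs/remotes/origin \
    | sed 's|^refs/remotes/origin/||' | grep -v -x 'HEAD'
}

# Hypothetical helper: clones "$1" into "$2" once, then checks out every
# branch from the local clone and runs command "$3" with the branch name.
run_on_each_branch() {
  local repo_url="$1"
  local clone_dir="$2"
  local per_branch_command="$3"
  local branch

  # The only network access: a single clone that contains all branches.
  git clone --quiet "$repo_url" "$clone_dir" || return 1

  # Git ref names cannot contain whitespace or glob characters,
  # so word splitting on the command substitution is safe here.
  for branch in $(list_remote_branches "$clone_dir"); do
    git -C "$clone_dir" checkout --quiet "$branch" || return 1
    (cd "$clone_dir" && "$per_branch_command" "$branch")
  done
}
```

Switching branches this way only touches the local clone, so the bandwidth cost no longer grows with the number of branches.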

Reduce the GitHub bandwidth of the CI run per commit by downloading each repo only once and then checking out the particular commits.
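The per-commit variant could look roughly like this. Again a sketch under assumptions: `list_all_commits` and `run_on_each_commit` are hypothetical names, and the command passed as third argument stands in for whatever the project runs per commit.

```shell
# Hypothetical helper: prints the hashes of all commits that are reachable
# from any remote branch of the clone in "$1", oldest first.
list_all_commits() {
  local repo_dir="$1"
  git -C "$repo_dir" rev-list --reverse --remotes=origin
}

# Hypothetical helper: clones "$1" into "$2" once, then checks out every
# commit from the local clone and runs command "$3" with the commit hash.
run_on_each_commit() {
  local repo_url="$1"
  local clone_dir="$2"
  local per_commit_command="$3"
  local commit

  # The only network access: a single clone that contains the full history.
  git clone --quiet "$repo_url" "$clone_dir" || return 1

  for commit in $(list_all_commits "$clone_dir"); do
    # Detached checkout: uses only objects that are already in the clone.
    git -C "$clone_dir" checkout --quiet --detach "$commit" || return 1
    (cd "$clone_dir" && "$per_commit_command" "$commit")
  done
}
```

Note that this assumes a full clone; a shallow clone would not contain the older commits.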

a-t-0 commented 2 years ago

Currently, when running CI on the latest commit per repo, the following happens in run_ci_on_github_repo():

  1. the GitHub build status repo is downloaded once per repository,
  2. the repo on which the GitLab CI is run is downloaded once per repository,
  3. the build status results are pushed once per repository:

    run_ci_on_github_repo() {
      github_username="$1"
      github_repo_name="$2"
      local organisation="$3"

      # Get the GitHub build status repository.
      get_build_status_repository_from_github

      # TODO: change this method to download with https?
      # Download the GitHub repo on which to run the GitLab CI:
      printf "\n\n\n Download the GitHub repository on which to run GitLab CI."
      download_github_repo_on_which_to_run_ci "$github_username" "$github_repo_name"

      # Remove the GitLab repository. # TODO: move this to each branch
      # Similarly for each commit
      remove_the_gitlab_repository_on_which_ci_is_ran

      # TODO: write test to verify whether the build status can be pushed to a branch. (access wise).
      # TODO: Store log file output if a repo (and/or branch) has been skipped.
      # TODO: In that log file, include: time, which user, which repo, which branch, why.
      printf "\n\n\n Exporting GitLab CI result back to a GitHub repository."
      copy_github_branches_with_yaml_to_gitlab_repo "$github_username" "$github_repo_name" "$organisation"
      printf "DONE WITH run CI\n"

      # Push build status icons to GitHub build status repository.
      push_commit_build_status_in_github_status_repo_to_github "$github_username"
    }

    This could consume even less bandwidth by downloading the build status repo and pushing the build status results once per organisation. However, a single error could then prevent the build status results of, for example, 20 repositories from being pushed. So a conservative approach is taken that consumes slightly more bandwidth. This is still expected to be unlikely to trigger GitHub rate limiting: after the first run, only new commits that contain a GitLab yaml require a push of new CI results, which is not expected to happen often. (So all GitHub repos of an organisation can be scanned without requiring a single build status repo push, even when it pushes per repo, because there will be almost no changes.)

To make sure the build status repo is not cloned again anyway, check whether get_build_status_repository_from_github first deletes the repo before cloning it; avoiding that could save bandwidth. It does delete it, through download_and_overwrite_repository_using_ssh. So get_build_status_repository_from_github is moved up to run once per organisation.
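A related option, which is not what this issue ended up doing, would be to reuse an existing clone instead of deleting and re-cloning it. A minimal sketch, assuming a hypothetical helper `clone_or_update_repository` (not a function of this project) and a clone that sits on a branch tracking its remote:

```shell
# Hypothetical helper: clones "$1" into "$2" if there is no clone yet,
# otherwise only fetches the new commits and fast-forwards.
clone_or_update_repository() {
  local repo_url="$1"
  local target_dir="$2"

  if [ -d "$target_dir/.git" ]; then
    # Existing clone: transfers only objects that are not present locally.
    git -C "$target_dir" pull --quiet --ff-only
  else
    # No clone yet: this is the only time the full repository is downloaded.
    git clone --quiet "$repo_url" "$target_dir"
  fi
}
```

With `--ff-only` the update fails instead of merging if the local clone has diverged, which seems the safer default for a repo that is only written to by the CI scripts.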

a-t-0 commented 2 years ago

The step that pushes the build status icons to the GitHub build status repository, push_commit_build_status_in_github_status_repo_to_github "$github_username", is also moved to once per organisation for the run on the latest commit of each repo of an organisation.

a-t-0 commented 2 years ago

For the run on all commits, downloading the GitHub build status repo is also moved to once per organisation, as is pushing the build status results.
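The resulting call order could be sketched as follows. This is an illustration only: the wrapper `run_ci_on_github_organisation` and the way repository names are passed in are assumptions, and the three project functions are replaced by placeholder stubs so that the sketch runs on its own.

```shell
# Placeholder stubs: they only log the call; the real functions live in the project.
get_build_status_repository_from_github() { echo "clone status repo"; }
run_ci_on_github_repo() { echo "run CI on $1/$2"; }
push_commit_build_status_in_github_status_repo_to_github() { echo "push status repo for $1"; }

# Hypothetical wrapper: "$1" is the organisation, the remaining arguments
# are the names of its repositories.
run_ci_on_github_organisation() {
  local organisation="$1"
  shift
  local github_repo_name

  # Once per organisation: download the build status repository.
  get_build_status_repository_from_github

  for github_repo_name in "$@"; do
    # Per repository: only the repo under test is downloaded.
    run_ci_on_github_repo "$organisation" "$github_repo_name" "$organisation"
  done

  # Once per organisation: push all collected build status results.
  push_commit_build_status_in_github_status_repo_to_github "$organisation"
}
```

For example, `run_ci_on_github_organisation some-org repo-a repo-b` would clone and push the status repo once each, regardless of the number of repositories.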