Initial testing framework

islas commented 1 month ago

This PR introduces testing capabilities through a series of compartmentalized commits:

Setting up helper scripts to handle environment loading
Compilation test script able to exercise the make build
A git submodule to the hpc-workflows testing framework
A test definition config using hpc-workflows
CI/CD capabilities with security safeguards in place

The CI/CD capabilities rely on specific GitHub Actions runner configurations and assume the runners are on Derecho, but they are designed to depend as little as possible on machine-specific and CI/CD-specific tooling. Thus, one could port these features to other machines or CI/CD solutions and achieve similar results.

islas commented 1 month ago

General Instructions

Follow-up details relating to setting up self-hosted runners.

Repository Security Settings

First, some minimal security barriers:

[ ] Restrict which actions and reusable workflows may be executed in this repository. Do this by going to Settings->Actions->General->Action Permissions within a repository (this may differ for an organization if setting up organization runners, but that is beyond the scope of this small guide). You may select “Allow <owner>, and select non-<owner>, actions and reusable workflows” and then check “Allow actions created by GitHub” to allow GitHub-provided actions while still restricting foreign, non-vetted code.
[ ] Once again under Settings->Actions->General->Action Permissions, go to “Fork pull request workflows from outside collaborators” and select one of the radio buttons:
    a. Require approval for first-time contributors, OR
    b. Require approval for all outside collaborators
    The latter is more secure, but you may find it overly burdensome to approve every run for well-known outside contributors. Choose based on your project’s visibility and expected development. If unsure, use (b); being more secure is rarely bad.
[ ] If not already set, on the same Settings->Actions->General page (under Workflow permissions), change the default GITHUB_TOKEN permissions to read-only. Workflows should use the least privilege necessary to complete their tasks, and this ensures that privileges start as low as possible.

Now you are ready to set up a self-hosted runner. You should still use good PR review techniques to check that no malicious code is present in a PR BEFORE kicking off the workflow. If there are changes in .github/workflows/ or wherever you keep your tests, pay extra attention to them before allowing a workflow to run. Also consider using label triggers as an extra layer of security so that workflows do not start automatically. This runs counter to automation, but it is more secure and can save compute resources, especially if a PR receives many pushes.
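
As a minimal sketch of that pre-approval check (not part of this PR), the function below lists the files a PR changes under paths you consider sensitive. The function name changed_sensitive_paths and the refs in the usage line are hypothetical; the function only wraps git diff --name-only with a three-dot range, so it compares the PR head against its merge base with the target branch.

```shell
# Hypothetical helper: print files that differ between the merge base of
# BASE_REF and HEAD_REF and HEAD_REF itself, limited to the given paths.
# Usage: changed_sensitive_paths BASE_REF HEAD_REF PATH...
changed_sensitive_paths() {
  base_ref="$1"
  head_ref="$2"
  shift 2
  # Three-dot range: changes on HEAD_REF since it diverged from BASE_REF.
  git diff --name-only "${base_ref}...${head_ref}" -- "$@"
}

# Example (assumed ref names), run inside a clone with the PR fetched:
#   changed_sensitive_paths origin/develop pr-head .github/workflows .ci
```

Any output means the PR touches workflow or test definitions and deserves a closer look before the run is approved.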

Creating self-hosted runners

Creating a self-hosted runner is now generally straightforward.

Now that security is handled, go to Settings->Actions->Runners->New self-hosted runner. You will be presented with a page of instructions. Select “Linux” as the runner image and x64 as the architecture. The instructions that follow on that page consist of creating a directory, downloading the runner tarball, verifying its checksum, and extracting it. If you are comfortable with these instructions, either copy and paste them into your terminal or modify them as you see fit. If you are not, STOP. DO NOT PROCEED. No configuration has been done yet, so it is best to stop and ask someone who knows about self-hosted runners how to proceed. This is important for the security of the system and should not be taken lightly.
Once you have your runner extracted, the next instructions direct you to run the ./config.sh script with the URL of your repository and an authentication token. Other options may be passed to the configuration script; please refer to the runner documentation. Any necessary information that is missing will be gathered via prompts. I encourage the use of labels like “<repo>”, “<runner id ##>”, and “derecho” to help identify runners if more than one will be set up.
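
For illustration only, here is a sketch that prints (but does not run) a config.sh command line using the naming and label conventions described in this guide. The function name print_runner_config_cmd is hypothetical and the token is a placeholder; --url, --token, --name and --labels are options of the runner's config.sh, and anything left out is still gathered via prompts.

```shell
# Hypothetical helper: print a ./config.sh command for one runner.
# Usage: print_runner_config_cmd REPO_URL TOKEN REPO RUNNER_ID MACHINE
print_runner_config_cmd() {
  # Runner name is <machine>-<runner id>; labels are <repo>,<runner id>,<machine>.
  printf './config.sh --url %s --token %s --name %s-%s --labels %s,%s,%s\n' \
    "$1" "$2" "$5" "$4" "$3" "$4" "$5"
}

# Example with a placeholder token:
#   print_runner_config_cmd https://github.com/MPAS-Dev/MPAS-Model '<TOKEN>' wrf wrf01 derecho
```

Printing the command first makes it easy to review the name and labels before running it by hand in the extracted runner directory.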
Once configuration is done, you may run ./run.sh. Note: You may want to run this in a tmux or screen session so that you can detach and the runner keeps running after you disconnect from the computer. Additionally or alternatively, you may want a cron job that regularly checks whether the runner is up. System reboots and maintenance take down runners, which then need to be started again.
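
A minimal sketch of such a check, assuming GNU screen is used and the session is named after the runner id. Both function names are hypothetical, and the session name is used directly in a regular expression, so it should be a plain identifier such as wrf01.

```shell
# Succeed if the `screen -ls` output passed as $1 lists a session named $2.
# Sessions appear in that output as "<pid>.<name>" followed by whitespace.
screen_session_listed() {
  printf '%s\n' "$1" | grep -Eq "[0-9]+\.$2[[:space:]]"
}

# Start ./run.sh from runner directory $1 in a detached screen session
# named $2, unless a session with that name is already listed.
# The body runs in a subshell so the cd does not affect the caller.
ensure_runner() (
  if ! screen_session_listed "$(screen -ls 2>/dev/null)" "$2"; then
    cd "$1" && screen -dmS "$2" ./run.sh
  fi
)
```

A script invoked from cron could then call, for example, ensure_runner "/glade/work/$USER/github/runners/wrf/derecho/wrf01" wrf01, which does nothing when the session is already up.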

Self-hosted runners are removed from GitHub if they are not connected for a period of time! (At the time of writing, this is 14 days.) https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/removing-self-hosted-runners

Runner communication with GitHub: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#communication-requirements

Additional notes

I use /glade/work/$USER/github/runners/<repo>/derecho/<runner id> as a structure for setting up runners. This leads to a generally organized setup.

For runner ids I use <repo>##, increasing monotonically from 01, e.g. wrf01, wrf02.

Labels I add to runners: <repo>, <runner id>, <machine> (derecho in this case).

I place logs in /glade/work/$USER/github/runners/<repo>/derecho/logs/

I name the runners <machine>-<runner id>. This is useful when having multiple runners across different machines.

I use screen to create detached sessions of the runners, and name the sockets <runner id>
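
The layout above can be sketched as a small helper that derives the runner id and creates the runner and log directories. The function name make_runner_dirs and its argument order are hypothetical; the number should be given without a leading zero (1, not 01), since printf pads it to two digits.

```shell
# Hypothetical helper: create <root>/<repo>/<machine>/<runner id> and
# <root>/<repo>/<machine>/logs, then print the runner directory.
# Usage: make_runner_dirs ROOT REPO MACHINE NUMBER
# The body runs in a subshell so its variables do not leak to the caller.
make_runner_dirs() (
  root="$1"
  repo="$2"
  machine="$3"
  # Runner id is <repo>##, e.g. wrf01 for repo "wrf" and number 1.
  runner_id=$(printf '%s%02d' "$repo" "$4")
  base="${root}/${repo}/${machine}"
  mkdir -p "${base}/${runner_id}" "${base}/logs" || exit 1
  printf '%s\n' "${base}/${runner_id}"
)

# Example matching the structure in these notes:
#   make_runner_dirs "/glade/work/$USER/github/runners" wrf derecho 1
```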

For quick setup I have helper scripts available upon request, but I encourage doing a first-time setup by hand to understand what is happening.

A Google Doc of this guide with pictures can be found here: https://docs.google.com/document/d/1CJq7NA_bh4ogB37t5Q1m9RO_2XVqPjjdmGQIH4xGWBk/edit?usp=sharing

mgduda commented 1 month ago

Would it be possible to place the hpc-workflows submodule within the .github directory, to avoid adding something to the top-level MPAS-Model directory that most users shouldn't need to worry about?

islas commented 1 month ago

That'd be doable, though I'd opt for placing it under .ci if that's the case. I think that will result in minor changes to only the .gitmodules file and the actions workflows under .github to reference the new location.
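
A rough sketch of that move, assuming the submodule currently sits at a top-level path named hpc-workflows (the actual path is not stated in this thread) and a reasonably recent Git. The function name move_submodule is hypothetical. git mv updates the path entry in .gitmodules; references in the workflows under .github still have to be edited by hand.

```shell
# Hypothetical helper: relocate a submodule within the superproject.
# Usage (from the repository root): move_submodule OLD_PATH NEW_PATH
move_submodule() {
  # git mv needs the destination's parent directory to exist.
  mkdir -p "$(dirname "$2")" || return 1
  # Moves the submodule checkout and rewrites its path in .gitmodules.
  git mv "$1" "$2"
}

# Example with the assumed current path:
#   move_submodule hpc-workflows .ci/hpc-workflows
#   git status   # review the staged .gitmodules change before committing
```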

mgduda commented 1 month ago

@islas I think the .ci directory is a good idea -- let's go with that.

islas commented 1 month ago

We should also be able to set up runners and test all of this inside this PR before it goes in.

MPAS-Dev / MPAS-Model