MPAS-Dev / MPAS-Model

Repository for MPAS models and shared framework releases.

Initial testing framework #1218

Open islas opened 1 month ago

islas commented 1 month ago

This PR introduces testing capabilities through a series of compartmentalized commits:

  1. Setting up helper scripts to handle environment loading
  2. Compilation test script able to exercise the make build
  3. A git submodule to the hpc-workflows testing framework
  4. A test definition config using hpc-workflows
  5. CI/CD capabilities with security safeguards in place

The CI/CD capabilities rely on specific GitHub Actions runner configurations and assume the runners are on Derecho, but they are designed to minimize reliance on machine- and CI/CD-specific tooling. Thus, one could port these features to other machines or CI/CD solutions and achieve similar results.

islas commented 1 month ago

General Instructions

Follow-up details relating to setting up self-hosted runners.

Repository Security Settings

First, minimal security barriers:

Now you are ready to set up a self-hosted runner. You should still use good PR review techniques to check that no malicious code is present in a PR BEFORE kicking off the workflow. If there are changes in .github/workflows/ or wherever you keep your tests, pay extra attention to these changes before allowing a workflow to run. Also consider using label triggers as an extra layer of security so that workflows do not start running automatically. This runs counter to automation, but it is more secure and could save compute resources, especially if a PR receives many pushes.
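As a rough aid for that review step, the sketch below filters a list of changed paths down to the ones that deserve a closer look. The function name `sensitive_changes` and the `SENSITIVE_RE` variable are hypothetical, and the default pattern only covers .github/workflows/; extend it to wherever your tests live.

```shell
# Sketch: flag changed paths that warrant extra scrutiny before a workflow runs.
# SENSITIVE_RE is an assumption, not part of this PR; adjust it to your layout.
SENSITIVE_RE="${SENSITIVE_RE:-^\.github/workflows/}"

# Read changed paths from stdin (one per line) and print the sensitive ones.
# Always returns 0, even when nothing matches.
sensitive_changes() {
  grep -E "$SENSITIVE_RE" || true
}
```

One possible use is `git diff --name-only origin/master...HEAD | sensitive_changes`, where the base branch name is an assumption. Empty output means none of the flagged paths were touched; it does not replace reading the rest of the diff.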

Creating self-hosted runners

Creating a self-hosted runner is now generally straightforward.

  1. Now that security is handled, go to Settings->Actions->Runners->New self-hosted runners. You will be presented with a page of instructions. Select “Linux” as the runner image and x64 as the architecture. The instructions that follow on the web page consist of creating a directory, downloading the runner image tarball, verifying the checksum, and extracting. If you are comfortable with these instructions, either copy and paste them into your terminal or modify them as you see fit. If you are not, STOP. DO NOT PROCEED. We have not done any configuration yet, so it is best to stop and ask someone who knows about self-hosted runners how to proceed. This is important for the security of the system and should not be taken lightly.

  2. Once you have your runner extracted, the next instructions direct you to run the ./config.sh script with the URL of your repository and an authentication token. Other options may be passed to the configuration script; please refer to the runner documentation. Any missing required information will be gathered via prompts. I encourage the use of labels like “”, “< runner id ##>”, and “derecho” to help identify runners if more than one will be set up.

  3. Once configuration is done, you may run ./run.sh. Note: You may want to run this in a tmux or screen session so that you can detach and the runner keeps running after you disconnect from the computer. Additionally or alternatively, you may want a cron job that regularly checks whether the runner is up. System reboots and maintenance take down runners, which then need to be started again.
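The detached-session and cron ideas from step 3 can be combined roughly as follows. This is a minimal sketch, not the helper scripts mentioned in this thread: `RUNNER_DIR`, `SESSION`, and both function names are hypothetical, and it assumes `screen` is installed and that `./run.sh` lives in `RUNNER_DIR`.

```shell
#!/usr/bin/env bash
# Sketch: keep a self-hosted runner alive in a detached screen session.
# RUNNER_DIR and SESSION are hypothetical defaults; adjust them to your setup.
RUNNER_DIR="${RUNNER_DIR:-$HOME/actions-runner}"
SESSION="${SESSION:-runner01}"

# Return 0 if `screen -ls` output (read from stdin) lists a session named $1.
# Session lines look like "<tab><pid>.<name><tab>(Detached)".
session_listed() {
  grep -Eq "^[[:space:]]*[0-9]+\.$1[[:space:]]"
}

# Start ./run.sh in a detached screen session unless one already exists.
ensure_runner() {
  if screen -ls | session_listed "$SESSION"; then
    echo "runner session '$SESSION' already up"
  else
    ( cd "$RUNNER_DIR" && screen -dmS "$SESSION" ./run.sh )
    echo "started runner session '$SESSION'"
  fi
}

# Only act when explicitly asked, so the file can also be sourced safely.
if [ "${1:-}" = "--ensure" ]; then
  ensure_runner
fi
```

Saved under a name of your choosing, the script could be called with `--ensure` from a crontab entry at whatever interval suits you. The check only confirms that a screen session with that name exists, not that the runner inside it is still connected to GitHub.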

Self-hosted runners are removed from GitHub if they are not connected for a period of time (14 days at the time of writing): https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/removing-self-hosted-runners

Runner communication with GitHub: https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners/about-self-hosted-runners#communication-requirements

Additional notes

I use /glade/work/$USER/github/runners/<repo>/derecho/<runner id> as a structure for setting up runners. This leads to a generally organized setup.

For runner ids I use <repo>##, increasing monotonically from 01, e.g. wrf01, wrf02

Labels I add to runners: <repo>, <runner id>, <machine> (derecho in this case)

I place logs in /glade/work/$USER/github/runners/<repo>/derecho/logs/

I name the runners <machine>-<runner id>; this is useful when running multiple runners across different machines.

I use screen to create detached sessions of the runners, and name the sockets <runner id>
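The conventions above can be captured in a few small shell functions. This is only an illustrative sketch: the function names, `MACHINE`, and `RUNNER_ROOT` are hypothetical and are not the helper scripts mentioned in this thread.

```shell
# Sketch: derive runner ids, paths, names, and labels from the conventions above.
# MACHINE and RUNNER_ROOT are hypothetical variables with defaults taken from these notes.
MACHINE="${MACHINE:-derecho}"
RUNNER_ROOT="${RUNNER_ROOT:-/glade/work/${USER:-unknown}/github/runners}"

# Arguments for every function: <repo> <number>, with the number given
# without a leading zero (e.g. 1, not 01).
runner_id()     { printf '%s%02d\n' "$1" "$2"; }
runner_dir()    { printf '%s/%s/%s/%s\n' "$RUNNER_ROOT" "$1" "$MACHINE" "$(runner_id "$1" "$2")"; }
log_dir()       { printf '%s/%s/%s/logs\n' "$RUNNER_ROOT" "$1" "$MACHINE"; }
runner_name()   { printf '%s-%s\n' "$MACHINE" "$(runner_id "$1" "$2")"; }
runner_labels() { printf '%s,%s,%s\n' "$1" "$(runner_id "$1" "$2")" "$MACHINE"; }
```

For example, `runner_name wrf 1` prints `derecho-wrf01` and `runner_labels wrf 1` prints `wrf,wrf01,derecho`. Those strings could then be supplied to ./config.sh through its `--name` and `--labels` options, and `runner_id` doubles as the screen socket name.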

For quick setup I have helper scripts available upon request, but I encourage doing a first-time setup by hand to understand what is happening.

A Google Doc of this guide with pictures can be found here: https://docs.google.com/document/d/1CJq7NA_bh4ogB37t5Q1m9RO_2XVqPjjdmGQIH4xGWBk/edit?usp=sharing

mgduda commented 1 month ago

Would it be possible to place the hpc-workflows submodule within the .github directory to avoid adding something to the top-level MPAS-Model directory that most users shouldn't need to worry about?

islas commented 1 month ago

That'd be doable, though I'd opt for placing it under .ci if that's the case. I think that will result in minor changes to only the .gitmodules file and the actions workflows under .github to reference the new location.

mgduda commented 1 month ago

@islas I think the .ci directory is a good idea -- let's go with that.

islas commented 1 month ago

We should also be able to set up runners and test all of this inside this PR before it goes in.