NOAA-EMC / RDASApp

Regional DAS
GNU Lesser General Public License v2.1

Automatically build and test RDASApp (Hera: using role.rrfs-fv3-cam, Jet: using role.wrfruc) #177

Open guoqing-noaa opened 2 weeks ago

guoqing-noaa commented 2 weeks ago

I expect it will take at least 6 months (or even longer) to establish CI tests for RDASApp (through Jenkins, I think).

Before that, we still need some kind of automatic build and test of RDASApp to eliminate mergers' manual testing before a PR merge. (If a PR changes neither the build behavior nor the code, we can skip this build-and-test process.)

And we can establish this functionality right now.

The general idea is to set up a crontab job on Hera/Jet/Hercules. It runs every 5 minutes and uses the GitHub command-line tool to check for open PRs that have been marked with the label "read_to_build_test". It then clones that PR, builds it, runs the rrfs tests, and posts the test results to the PR webpage. If the tests succeed, it adds the label "hera_passed"; if they fail, it adds the label "hera_failed" (and similarly for Jet and Hercules).
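The polling step described above can be sketched roughly as follows. This is only an illustration, not the script that was deployed: it assumes the GitHub CLI (`gh`) is installed and authenticated for the role account, and the function names are hypothetical. The repository and label names are the ones mentioned in this thread.

```python
import json
import subprocess

REPO = "NOAA-EMC/RDASApp"             # repository named in this thread
TRIGGER_LABEL = "read_to_build_test"  # label name as written in the proposal


def list_command(repo: str, label: str) -> list[str]:
    """Build the GitHub CLI call that lists open PRs carrying `label`."""
    return [
        "gh", "pr", "list",
        "--repo", repo,
        "--state", "open",
        "--label", label,
        "--json", "number,headRefName",
    ]


def parse_pr_list(gh_json: str) -> list[tuple[int, str]]:
    """Turn the JSON printed by `gh pr list --json ...` into (number, branch) pairs."""
    return [(pr["number"], pr["headRefName"]) for pr in json.loads(gh_json)]


def labeled_prs(repo: str = REPO, label: str = TRIGGER_LABEL) -> list[tuple[int, str]]:
    """Ask GitHub which open PRs are waiting for a build-and-test run."""
    result = subprocess.run(
        list_command(repo, label), capture_output=True, text=True, check=True
    )
    return parse_pr_list(result.stdout)
```

A crontab entry with the schedule `*/5 * * * *` would give the 5-minute cadence; the script would call `labeled_prs()` and then clone, build, and test each PR it returns.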

@SamuelDegelia-NOAA Are you interested in working on this together?

ShunLiu-NOAA commented 2 weeks ago

@guoqing-noaa I remember that we inherited a CI function from GDASApp. @TingLei-daprediction, @CoryMartin-NOAA and @delippi used to work on this. We need to add this function to the role account. Let's discuss this later.

guoqing-noaa commented 2 weeks ago

@ShunLiu-NOAA Thanks for the information! It looks like we do have some scripts there under the ci/ directory. We can start from there. And we need this as soon as possible. It is preferred that we run thorough, fresh tests on at least Hera/Jet/Hercules before merging a PR that changes the code or the build behavior.

@CoryMartin-NOAA Could you help us understand how GDASApp launches CI tests on RDHPCS? Through a cron job or through another mechanism like Jenkins? Thanks!

SamuelDegelia-NOAA commented 2 weeks ago

@guoqing-noaa I'm open to helping with this but I will wait for others to chime in with thoughts.

guoqing-noaa commented 2 weeks ago

Update on this: The RDHPCS admin confirmed that we don't have a Jenkins server on any RDHPCS system and that no CI/CD through GitHub is allowed on on-prem systems (although it is allowed on RDHPCS cloud).

RDHPCS IMT Decision: It is the decision of the RDHPCS Integrated Management Team (IMT) to deny this request for on-premise HPC systems. Although the IMT recognizes the need to have Continuous Integration code development on HPC resources, it has been deemed too risky to allow this functionality to exist on a shared on-premise HPC resource, including Jet, Hera, Niagara, and PPAN. Due to the transient and isolated nature of HPC in the Cloud, the RDHPCS IMT authorizes the use of these GitHub Runners on RDHPCS Cloud resources.

We could use https://github.com/jenkinsci/jenkinsfile-runner, but that loses the benefit of a Jenkins server, which can trigger CI/CD automatically.

So, @SamuelDegelia-NOAA, let's go ahead and implement the cron-job-based CI/CD.

guoqing-noaa commented 2 weeks ago

I can understand that we run ctests inside RDASApp. But why would we want to put the git clone part inside RDASApp, the very repo we intend to clone and test? Does it mean we use a previous version of RDASApp to clone the PRs' repos and branches?

ShunLiu-NOAA commented 2 weeks ago

@guoqing-noaa Let's discuss this after I collect more information from EMC colleagues.

guoqing-noaa commented 2 weeks ago

I've set up cron jobs (every 5 minutes) on the following platforms using the corresponding role accounts:

hera        <->   role.rrfs-fv3-cam
jet         <->   role.wrfruc
hercules    <->   role-wrfruc

Once a PR gets two approvals and is ready for a potential merge, the repo maintainers manually add the 'test_hera', 'test_jet', and 'test_hercules' labels to the PR, which triggers the cron jobs on those platforms.
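As a rough illustration of how each platform's cron job could decide whether a PR concerns it, the mapping above can be written as data. The platform names, role accounts, and label names come from this comment; the `Platform` class and the `platforms_requested` helper are hypothetical names used only for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    name: str
    role_account: str
    trigger_label: str


# Platform/role-account pairs and label names as listed in this comment.
PLATFORMS = (
    Platform("hera", "role.rrfs-fv3-cam", "test_hera"),
    Platform("jet", "role.wrfruc", "test_jet"),
    Platform("hercules", "role-wrfruc", "test_hercules"),
)


def platforms_requested(pr_labels: list[str]) -> list[str]:
    """Return the platforms whose trigger label is present on a PR."""
    present = set(pr_labels)
    return [p.name for p in PLATFORMS if p.trigger_label in present]
```

For example, a PR labeled only with test_hera and test_jet would be picked up by the Hera and Jet cron jobs and ignored on Hercules.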

I tried this automatic build_and_test on a fork: https://github.com/comgsi/RDASApp/pulls. It worked as expected. I will set this up for this authoritative repo soon.

NOTE: there are NO changes to the RDASApp repo itself. It works as if there is one person who "manually" clones/builds/tests RDASApp on the above platforms respectively.

guoqing-noaa commented 2 weeks ago

To clarify, this is NOT the normal CI/CD we would usually expect (i.e. no Jenkins step at the moment). But we will use this to automatically build and test every PR until a streamlined CI/CD is in place.

ShunLiu-NOAA commented 2 weeks ago

@guoqing-noaa great progress.

guoqing-noaa commented 1 week ago

Update on this: @ShunLiu-NOAA, @SamuelDegelia-NOAA, and I had a tag-up today, and we decided to test-drive the CI tests on RDASApp. The first try was on PR #175, and it worked well.

Also, as Sam suggested, the cron job on each HPC will automatically remove the corresponding testing directory once a PR is merged. To compensate for HPC downtime, there is an extra mechanism that removes all testing directories older than 14 days. Together, these ensure that the CI tests will NOT consume too much disk space.
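The two cleanup rules could look something like the sketch below. It is not the actual cron script: the `pr_<number>` directory naming and both function names are assumptions made for illustration, and the set of merged PR numbers is taken as an input (it might come from something like `gh pr list --state merged --json number`).

```python
import shutil
import time
from pathlib import Path

MAX_AGE_DAYS = 14  # age limit stated in this comment


def stale_test_dirs(root: Path, merged_prs: set[int], now: float) -> list[Path]:
    """Select test directories that belong to merged PRs or exceed the age limit."""
    cutoff = now - MAX_AGE_DAYS * 24 * 3600
    stale = []
    # Assumed layout: one directory per PR, named pr_<number>.
    for path in sorted(root.glob("pr_*")):
        if not path.is_dir():
            continue
        suffix = path.name[len("pr_"):]
        merged = suffix.isdigit() and int(suffix) in merged_prs
        too_old = path.stat().st_mtime < cutoff
        if merged or too_old:
            stale.append(path)
    return stale


def clean(root: Path, merged_prs: set[int]) -> None:
    """Delete every stale test directory under `root`."""
    for path in stale_test_dirs(root, merged_prs, time.time()):
        shutil.rmtree(path)
```

The age check is what covers the downtime case: a directory whose PR was merged while the machine was unavailable still gets removed once it passes 14 days.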

The code management policy was updated accordingly.

CoryMartin-NOAA commented 1 week ago

Just to chime in here, though I think you all figured most of it out. The capability uses a cron job with the GitHub CLI that checks for labels attached to open pull requests. If the labels match, it runs the tests, and a new label is applied depending on whether the tests pass or fail. I'm happy to discuss details on how it works for the global.
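For the reporting half of that loop, a minimal sketch might build the GitHub CLI calls like this. The `test_<platform>` and `<platform>_passed`/`<platform>_failed` label names follow the ones mentioned earlier in this thread; removing the trigger label after a run is an assumption, and `result_commands` is a hypothetical helper, not part of GDASApp or RDASApp.

```python
def result_commands(
    repo: str, pr: int, platform: str, passed: bool, log_tail: str
) -> list[list[str]]:
    """Build the GitHub CLI calls that report one platform's test outcome."""
    outcome = "passed" if passed else "failed"
    return [
        # Post the outcome and the end of the log to the PR conversation.
        ["gh", "pr", "comment", str(pr), "--repo", repo,
         "--body", f"Build and test on {platform}: {outcome}\n\n{log_tail}"],
        # Assumption: swap the trigger label for a result label so the
        # next cron run does not test the same PR again.
        ["gh", "pr", "edit", str(pr), "--repo", repo,
         "--remove-label", f"test_{platform}",
         "--add-label", f"{platform}_{outcome}"],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` by the cron script once the tests finish.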

guoqing-noaa commented 1 week ago

@CoryMartin-NOAA Thanks for the information.