GEOS-ESM / CI-workflows

Test repo for CI workflows
Apache License 2.0

Workflow for checking pinned hashes in existing jedi bundle on discover #13

Open asewnath opened 3 weeks ago

asewnath commented 3 weeks ago

@jardizzo @ashiklom @Dooruk

I have a pinned-version JEDI bundle in advda here: /discover/nobackup/projects/gmao/advda/pinned_jedi_bundle. This is the JEDI bundle that Swell points to if a user wants to use an existing pinned JEDI build (https://github.com/GEOS-ESM/swell/pull/433). The JEDI repositories in this bundle are currently pinned to August 31, as @Dooruk recommends.

I would like to make a test that uses Swell's check_hashes tool to check whether the hashes in /discover/nobackup/projects/gmao/advda/pinned_jedi_bundle correspond to the hashes tracked in Swell (https://github.com/GEOS-ESM/swell/blob/feature/pinned_versions_support/src/swell/utilities/pinned_versions/pinned_versions.yaml). This way, we can be sure that these hashes match during Swell PRs.

Let me know your thoughts on this.
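
For illustration, the comparison described above could be sketched roughly as follows. This is not Swell's check_hashes implementation: the function names, the flat repo-to-hash mapping, and the placeholder hashes are all assumptions made for this example, and the actual layout of pinned_versions.yaml may differ.

```python
import subprocess
from pathlib import Path
from typing import Dict, Optional, Tuple


def read_head_hash(repo_dir: Path) -> str:
    """Return the commit hash checked out in repo_dir via `git rev-parse HEAD`."""
    result = subprocess.run(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def find_hash_mismatches(
    pinned: Dict[str, str], actual: Dict[str, str]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each repo whose hash differs, or is missing, to (pinned, actual)."""
    mismatches = {}
    for repo, pinned_hash in pinned.items():
        actual_hash = actual.get(repo)  # None if the repo is absent from the bundle
        if actual_hash != pinned_hash:
            mismatches[repo] = (pinned_hash, actual_hash)
    return mismatches


# Hypothetical data: repo names and hashes are placeholders, not real pins.
pinned = {"oops": "a1b2c3", "ufo": "d4e5f6", "ioda": "778899"}
actual = {"oops": "a1b2c3", "ufo": "000000"}
mismatches = find_hash_mismatches(pinned, actual)
```

In a CI job, `actual` would be filled by calling `read_head_hash` on each repository directory in the bundle, and the job would fail whenever `mismatches` is non-empty.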

Dooruk commented 3 weeks ago

I like the test idea and having a maintained static folder on advda. To respond to your question on the draft PR: I think that if use_pinned_existing fails, it should warn the user and prompt them to use pinned_create rather than automatically start building, as that would be confusing for new users.

I have a way of testing new CI workflows added to this repo; I will share it with you.

I would like to keep 2-3 of the latest builds just as a fail-safe. They could be named pinned_jedi_bundle_{date} once they are no longer the main build. Tier 2 would still run the JEDI develop nightly (separate concern).
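
A minimal sketch of how such a retention rule might work, assuming the {date} suffix is written as YYYYMMDD (the thread does not specify a format) and that only the directory names are inspected. The function name and the example directory names are hypothetical.

```python
from datetime import datetime
from typing import Iterable, List

PREFIX = "pinned_jedi_bundle_"


def select_builds_to_prune(dir_names: Iterable[str], keep: int = 3) -> List[str]:
    """Return dated bundle directories older than the `keep` newest ones."""
    dated = []
    for name in dir_names:
        if not name.startswith(PREFIX):
            continue  # ignores the main build and unrelated entries
        try:
            stamp = datetime.strptime(name[len(PREFIX):], "%Y%m%d")
        except ValueError:
            continue  # suffix is not a YYYYMMDD date, so leave it alone
        dated.append((stamp, name))
    dated.sort(reverse=True)  # newest first
    return [name for _, name in dated[keep:]]


# Hypothetical directory listing.
names = [
    "pinned_jedi_bundle",
    "pinned_jedi_bundle_20240715",
    "pinned_jedi_bundle_20240801",
    "pinned_jedi_bundle_20240831",
    "pinned_jedi_bundle_20240915",
    "notes.txt",
]
to_prune = select_builds_to_prune(names, keep=3)
```

The function only reports candidates; actually deleting anything is left as a separate, deliberate step.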

asewnath commented 3 weeks ago

The upside of building automatically if use_pinned_existing fails (for instance, if the maintained JEDI bundle in advda isn't updated for whatever reason) is that the user then has a local pinned JEDI bundle that they can build once and then link to for future experiments.

If we make use_pinned_existing the default jedi_build_method, then users won't have to specify the directory of their own pinned JEDI bundle build should the one in advda fail. Additionally, Swell would automatically update it to the correct hash if that local build already exists but needs to be updated. Using pinned_create, like create, forces the experiment to build JEDI in the experiment directory every time it runs.
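
To make the two positions in this thread concrete, here is a rough sketch of the decision being debated. It does not describe Swell's current behavior: the enum, the function, and the `auto_build` flag are all hypothetical names introduced only to contrast the options.

```python
from enum import Enum


class BundleAction(Enum):
    USE_SHARED = "use_shared"                      # link to the advda bundle
    USE_LOCAL = "use_local"                        # link to the user's own pinned build
    BUILD_OR_UPDATE_LOCAL = "build_or_update_local"
    STOP_AND_WARN = "stop_and_warn"                # suggest pinned_create instead


def choose_bundle_action(
    shared_matches: bool, local_matches: bool, auto_build: bool
) -> BundleAction:
    """Pick what use_pinned_existing could do, given hash-check results."""
    if shared_matches:
        return BundleAction.USE_SHARED
    if local_matches:
        return BundleAction.USE_LOCAL
    # Neither bundle has the pinned hashes: this is the point of disagreement.
    if auto_build:
        return BundleAction.BUILD_OR_UPDATE_LOCAL
    return BundleAction.STOP_AND_WARN
```

With `auto_build=True` the function follows the automatic local build or update proposed above; with `auto_build=False` it follows the alternative of warning the user and pointing them to pinned_create.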

Dooruk commented 2 weeks ago

> The upside of building automatically if use_pinned_existing fails (for instance, if the maintained JEDI bundle in advda isn't updated for whatever reason) is that the user then has a local pinned JEDI bundle that they can build once and then link to for future experiments.

I see your point. I'm trying to imagine different use cases (and maybe even confusing myself) to figure out when a user would like to build JEDI on their own:

User X: JEDI contributors (only a handful of users) will use jedi_bundle to build and work on their own repo branches.

User A needs a particular JEDI build right after a UFO PR is merged, say on September 14th. They will use jedi_bundle with the pinned option to build their own JEDI version, then set the source and build in their Swell experiment.yaml to this folder to test with the local build first. Once they are content, they will test with the advda build before they can make a Swell PR (check_hashes.py controls this now).

Do we expect User A to first use jedi_bundle to get hashes and then copy them to Swell pinned_versions.yaml?

User B wants to test/edit some observation filters. They clone Swell right before a pinned_versions.yaml change, so when they open a PR, check_hashes catches the mismatch and throws an error. They can then test their changes with the new pinned JEDI build. If issues arise, they become User A.

User C (long term, needs further pondering) wants to run a particular suite with particular JEDI/GEOS version(s). For instance, let's imagine we have a GEOS-ADAS suite. We would have pinned builds specified for those suites that get updated infrequently (once every couple of months).

Perhaps you have a use case in mind where use_pinned_existing should start a new JEDI build?

> Additionally, Swell would automatically update it to the correct hash if that local build already exists but needs to be updated. Using pinned_create, like create, forces the experiment to build JEDI in the experiment directory every time it runs.

Even for this case, I always think a clean build in a new folder is better. Wei and Jianjun had issues while trying to rebuild in the same build folder.

asewnath commented 1 week ago

I'm going to pause creating a github action to check hashes for now. @Dooruk using your instructions, I tried adding a test to test_swell.yml and tried running swell's Test_CI_Application action. This fails immediately because of the error "Tier 2 is already running". I'm not able to test using Tier1 test yamls unless I have a PR to the main branch in CI-workflows.

Dooruk commented 1 week ago

> I'm going to pause creating a github action to check hashes for now. @Dooruk using your instructions, I tried adding a test to test_swell.yml and tried running swell's Test_CI_Application action. This fails immediately because of the error "Tier 2 is already running". I'm not able to test using Tier1 test yamls unless I have a PR to the main branch in CI-workflows.

Yeah, Tier2 is hitting the __running__ switch file issue, but I'm not sure about the Tier1 yaml change issue you are encountering. Do you mean your open PR should be merged first?

asewnath commented 1 week ago

I mean that the Tier1 runner doesn't seem to work for any branch other than main: https://github.com/GEOS-ESM/swell/blob/597bbbe9c87867178a130a10c6d07418d1a212d8/.github/workflows/tier1_application_discover.yml#L15. When I tried to point to a different branch, nothing would run. Maybe this is an issue on my end.

If I can only run Tier1 tests on the main branch, then I'd have to continually PR to the main branch to get the check hashes test working, which doesn't sound like a good idea.