department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Determination on best workflow tool & Implement #11554

Closed cmaeng closed 1 year ago

cmaeng commented 1 year ago

Description

As part of the migration off of BRD, there was a spike done in #9765 to inventory all CMS tasks that need to be moved.

Before starting any migrations, we need to determine the best workflow tool to use. We defaulted to GitHub Actions but have since identified several potential issues with it.

We'll want to coordinate with the Platform Infrastructure team on their recommendations.

Acceptance Criteria

ElijahLynn commented 1 year ago

Slack discussion: https://dsva.slack.com/archives/CJYRZK2HH/p1668112547807419

cmaeng commented 1 year ago

Hey team! Please add your planning poker estimate with Zenhub @ElijahLynn @olivereri @timcosgrove

olivereri commented 1 year ago

Starting a discussion with GitHub support to explore our options for running workflows on a sub-5-minute schedule: https://dsva.slack.com/archives/CU1E4CX9U/p1668549708262719

olivereri commented 1 year ago

Previous exploration of Lagoon as a solution: https://github.com/department-of-veterans-affairs/va.gov-cms/issues/6673

olivereri commented 1 year ago

Impetus

https://dsva.slack.com/archives/CT4GZBM8F/p1616531738323200 (March 23rd, 2021): VSP-Ops is planning to sunset Jenkins later this year. They will announce this in the platform newsletter with additional details and guidance on transitioning Jenkins jobs to GitHub Actions.

Options

Take Over Existing Jenkins

Pros

Cons

Provisional Determination

A larger, better-staffed team no longer wishes to support or maintain Jenkins. Our team has less capacity. There is mounting pressure to deprecate the underlying network.

Non-starter

Roll Our Own Jenkins in EKS

Pros

Provisional Determination

While this would technically meet the goal of migrating away from existing Jenkins, rolling our own Jenkins will likely not fit well into a robust long-term solution. With that in mind:

At best an interim solution. Quick and dirty.

GitHub Actions Workflows

Pros

Provisional Determination

GitHub Actions Workflows has general schedule reliability issues. Frequently run jobs will be affected the most, but even for lower-frequency jobs, a few inconsistently timed runs will be annoying. This just doesn't seem suitable for important production tasks.

A note on using automation for sub-5-minute task runs: it is considered a rare or odd case not to use a cloud cron service, on-server cron, or another app timer. The reason we do this is to expose the task definition to developers. Talk with Elijah about this, because task definition code is exposed to devs, but scheduling and method of execution are not. Devs shouldn't care about how tasks are executed, just what is executed. However, having a single task runner or other method of executing things is preferable to splitting it across multiple methods. I think.

Non-starter At best an interim solution

Lagoon

Pros

timcosgrove commented 1 year ago

This is a large ticket and will roll over.

mchelen-gov commented 1 year ago

Can this be addressed with https://github.com/department-of-veterans-affairs/va.gov-cms/issues/8849 ?

olivereri commented 1 year ago

@mchelen-gov For the every-minute job that sends Datadog metrics and content release queues, yes, that would be appropriate. However, in the broader context of what we want to achieve, no, we can't address this with the Advanced Queue runner in #8849.

Thinking about it while I type this out, using the Advanced Queue runner for these more-difficult-to-migrate tasks would give us more flexibility. But I'm hesitant to choose an expedient solution when we can land on a more comprehensive, long-term one.

TheBoatyMcBoatFace commented 1 year ago

Adding to "Epic" workflow tool

TheBoatyMcBoatFace commented 1 year ago

Removed Sprint 72 and Added Sprint 73 Label

Issue larger than anticipated. Converting to Epic

TheBoatyMcBoatFace commented 1 year ago

Shifting focus from #11133 to this issue. See #11133 for additional info.

TheBoatyMcBoatFace commented 1 year ago

The findings from this issue are transitioning to the Lagoon Implementation Epic.

Lagoon Implementation Epics