akuity / kargo

Application lifecycle orchestration
https://kargo.akuity.io/
Apache License 2.0
1.39k stars, 114 forks

Multiple pipelines #1680

Open jessesuen opened 3 months ago

jessesuen commented 3 months ago

Proposed Feature

When talking to various users about Kargo use cases, one limitation that became apparent was that Kargo's single freightline could be limiting since it forces all changes to happen as long as they are in the tip. In our git + image repository example, we use a single Warehouse that subscribes to both the git repo and the image repo, then permutates the combination. One piece of feedback was that image promotion may happen at a separate cadence from other promotable configurations, e.g., a feature flag.

Container images are typically the most common and fastest thing to push through to environments. However, a feature flag or manifest change could be much slower and roll out over the course of days or weeks.

Kargo could solve this by allowing the use of multiple freightlines. For example, a user could configure:

Motivation

Allowing for multiple freightlines would allow different categories of configuration to be promoted faster than others.

Suggested Implementation

When thinking about this, the Warehouse resource seems appropriate as the originator of each Freightline. So there would be a 1:1 relationship between a Warehouse and a Freightline.

The biggest change I imagine, would be to the Kargo Stage spec/status and controller, which would have to be reworked quite extensively to support this.

jessesuen commented 3 months ago

Probably depends on https://github.com/akuity/kargo/issues/1613

krancour commented 2 months ago

@jessesuen and I had some offline discussion about this yesterday and I just want to capture key snippets of what we discussed and where we landed.

We agreed on three use cases that we care about WRT multiple freightlines:

  1. Kent: implementing a short pipeline that subscribes to new images and updates main and nothing else -- then a longer one that subscribes only to git and picks up where the previous left off.

    Filling in some unspoken blanks here, this means two warehouses, each subscribed to one thing, and each the source of Freight for a distinct Freightline.

  2. Kent: multiple parallel pipelines that are, organizationally, part of the same project, but involve entirely different sets of artifacts

    for a large service, or even maybe the microservices use case, I can see this being useful... I can see users asking for a project containing more than one pipeline (because they want to share visibility within the same project, or simplify tenancy management).

  3. From this issue: Container images are typically the most common and fastest thing to push through to environments. However, a feature flag or manifest change could be much slower and roll out over the course of days or weeks.

    The biggest change I imagine, would be to the Kargo Stage spec/status and controller, which would have to be reworked quite extensively to support this.

No.3 has been the most difficult change to grapple with. In offline conversations, we knew this would mean allowing each Stage to concurrently host multiple Freight from different Warehouses. This is a major departure from how things are done today. Today, each piece of Freight is a specific combination of artifacts that, together, represent a known state. Supporting multiple Freight from different Warehouses at each Stage would mean Stage-specific combinations of artifacts would exist. This would cause a big problem with verifications as illustrated by this scenario:

Kent:

  1. image is at v1, manifests at v1 -- all stages (let's call them dev, uat, and prod) are at v1,v1 (format is <image,manifest>)
  2. new manifest v2 is found -- dev is now at v1,v2
  3. new image v2 is found -- dev is now at v2,v2
  4. assuming images are to be progressed through this pipeline at a faster pace than manifest changes, the desire is now to put uat in v2,v1 (newer image, older manifest). the thing that is problematic here, i think, is that this specific combination has not been tested/verified upstream from uat

so i'm not sure how verification should work under this model
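The scenario above can be traced with a minimal sketch. Everything here is hypothetical and for illustration only: the `Stage` class, its `verified` set, and the `verified_upstream` helper are assumptions, not Kargo types. The sketch assumes a stage counts a combination as verified once it has run it.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stage that remembers every (image, manifest) pair it has run."""
    name: str
    current: tuple[str, str]
    verified: set[tuple[str, str]] = field(default_factory=set)

    def deploy(self, image: str, manifest: str) -> None:
        # Assumption: deploying a combination here also verifies it.
        self.current = (image, manifest)
        self.verified.add(self.current)


def verified_upstream(combo: tuple[str, str], upstream: Stage) -> bool:
    """True only if this exact combination has been seen by the upstream stage."""
    return combo in upstream.verified


# Step 1: everything starts at v1,v1.
dev = Stage("dev", ("v1", "v1"), {("v1", "v1")})
# Step 2: new manifest v2 is found.
dev.deploy("v1", "v2")
# Step 3: new image v2 is found.
dev.deploy("v2", "v2")

# Step 4: uat wants the newer image with the older manifest.
wanted_for_uat = ("v2", "v1")
print(verified_upstream(wanted_for_uat, dev))  # False
```

Each artifact version has passed through dev on its own, but the pair uat wants never existed there, which is the gap described in step 4.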

We ended up agreeing we didn't want to lose guarantees that specific combinations of artifacts available to a given Stage have been validated upstream and this prompted us to consider an alternative to the idea of multiple Freight per Stage...

Jesse: I've been thinking about this some more, I think another way that we can possibly solve this, is provide a way where freight generation permutation is more manual.

Kent: manually permuting freight was always on my agenda

Kent: so we continue with the notion of one freight per stage, but provide some sort of "freight lab" where you can mix n match?

Jesse: Exactly

Kent: in terms of what ux might look like for this "freight lab" (not a serious name suggestion), i am thinking of a tab or something in the warehouse view (which doesn't exist at all yet), where the last n versions of each subscription are listed and you may select one of each to generate a new freight with that specific combination
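As a rough sketch of the "mix n match" idea, assuming a Warehouse keeps an ordered history of discovered versions per subscription: the function names (`recent`, `assemble`, `all_combinations`) and the dictionary layout are hypothetical and not part of Kargo.

```python
from itertools import product


def recent(history: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Last n discovered versions of each subscription, oldest first."""
    return {sub: versions[-n:] for sub, versions in history.items()}


def assemble(history: dict[str, list[str]], n: int,
             selection: dict[str, str]) -> dict[str, str]:
    """Build one freight from a user's pick of one version per subscription."""
    choices = recent(history, n)
    if selection.keys() != choices.keys():
        raise ValueError("select exactly one version per subscription")
    for sub, version in selection.items():
        if version not in choices[sub]:
            raise ValueError(f"{version!r} is not among the last {n} versions of {sub!r}")
    return dict(selection)


def all_combinations(history: dict[str, list[str]], n: int) -> list[dict[str, str]]:
    """Every combination the picker could produce, for comparison."""
    choices = recent(history, n)
    subs = sorted(choices)
    return [dict(zip(subs, combo)) for combo in product(*(choices[s] for s in subs))]


history = {"git": ["c1", "c2", "c3"], "image": ["v1", "v2"]}
freight = assemble(history, 2, {"git": "c2", "image": "v2"})
print(freight)
print(len(all_combinations(history, 2)))
```

The point of the manual picker is that only the combinations a user explicitly selects become freight, rather than every entry in the full product.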

zhenya-khvan-form3 commented 2 months ago

I think manual freight creation is a great idea. I was wondering if Kargo would be open to also having a way to subscribe to multiple criteria for generating freights?

For example, if an organization has dedicated a stage for pre-release versions of a particular application, it would be nice to have a mechanism to auto-create the freights and auto-promote the freight to that "development" stage.

This would be useful for being able to test pre-releases on development environments in quick iteration.

Potential approach

If a warehouse could support something like the following change:

 apiVersion: kargo.akuity.io/v1alpha1
 kind: Warehouse
 metadata:
   name: kargo-service
   namespace: kargo-service
 spec:
   subscriptions:
   - git:
       repoURL: https://github.com/org/repo.git
       branch: master
   - image:
       repoURL: ghcr.io/org/service
-      semverConstraint: ">= 0.0.0"
+      selectors:
+        - semverConstraint: ">= 0.0.0" # regular freights
+        # Freights with custom metadata attached
+        - semverConstraint: ">= 1.2.3-rc <= 1.2.3-rc-0"
+          metadata:
+            labels:
+              prerelease: true

Then, update the auto-promotion policies to allow specifying a filter/constraint:

 apiVersion: kargo.akuity.io/v1alpha1
 kind: Project
 metadata:
   name: kargo-service
 spec:
   promotionPolicies:
     - stage: development
       autoPromotionEnabled: true
     - stage: development-alpha
       autoPromotionEnabled: true
+      selectors:
+        matchLabels:
+          prerelease: true
     - stage: development-beta
       autoPromotionEnabled: true
     - stage: development-gamma
       autoPromotionEnabled: true

With this change, it would mean that a warehouse can potentially generate more than 1 freight per reconciliation.
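To make the proposed filtering concrete, here is a small sketch of how label-based auto-promotion could be evaluated. It is only an illustration of the proposal above, not existing Kargo behavior. It assumes Kubernetes-style `matchLabels` semantics (every listed key must match, and a policy with no selector matches all freight) and string label values; the helper names are hypothetical.

```python
def matches(freight_labels: dict[str, str], match_labels: dict[str, str]) -> bool:
    """Assumed semantics: every key/value in match_labels must be present on the freight."""
    return all(freight_labels.get(key) == value for key, value in match_labels.items())


def auto_promotion_targets(freight_labels: dict[str, str], policies: list[dict]) -> list[str]:
    """Stages whose policy enables auto-promotion and whose selector accepts the freight."""
    targets = []
    for policy in policies:
        if not policy.get("autoPromotionEnabled", False):
            continue
        selector = policy.get("selectors", {}).get("matchLabels", {})
        if matches(freight_labels, selector):
            targets.append(policy["stage"])
    return targets


policies = [
    {"stage": "development", "autoPromotionEnabled": True},
    {"stage": "development-alpha", "autoPromotionEnabled": True,
     "selectors": {"matchLabels": {"prerelease": "true"}}},
]

print(auto_promotion_targets({}, policies))
print(auto_promotion_targets({"prerelease": "true"}, policies))
```

Under these assumed semantics, a stage without a selector would still receive pre-release freight, so a real design would also need a way to exclude labeled freight from the regular stages.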

krancour commented 2 months ago

@zhenya-khvan-form3 there should already be enough flexibility in Kargo to approach what you're suggesting in a few different ways.

You can write more general image selectors that match your RCs and your normal versioning scheme as well and they will put both into the same freightline/pipeline. Not everything in the freightline needs to be promoted to every Stage. If something needs to go direct to a dedicated environment, you can do a manual approval to open up promotion to that Stage without needing to traverse all the upstream Stages first.

For example, if an organization has dedicated a stage for pre-release versions of a particular application

You can have two Warehouses with different image selection criteria. One of them feeds your "normal" pipeline and the other feeds your dedicated pre-release Stage.

With this change, it would mean that a warehouse can potentially generate more than 1 freight per reconciliation.

I just wouldn't want to introduce complications like this when there are ways to approach this that already work.

zhenya-khvan-form3 commented 2 months ago

@krancour thanks for entertaining the idea. I think manual approvals for manually created freights will solve the use-case we have, but it would be really nice to allow a mechanism to leverage the auto-promotions.

I can describe our use-case a bit more just for context, but happy to put a pin on the idea after.

Our organization has several development clusters that we share across the company. For example, let's say:

A common use-case for us is for a developer to "take over" a particular cluster in order to test changes. This is a short-term operation, maybe a few hours to a few days. The developer will configure the "take over" for a particular cluster (dev-a) to pull the latest changes from their PR. Then, any new commits to their PR will automatically build artifacts, push them to a registry, and deploy to the cluster (dev-a). It might look something like this:

Afterwards, the developer removes the "take over" for the cluster (dev-a), and it goes back to the latest version from main.

With Kargo, introducing the ability to create freights manually will allow us to replicate a good portion of the workflow, but it would make it slightly inconvenient when pushing updates to the PR.

You can have two Warehouses with different image selection criteria. One of them feeds your "normal" pipeline and the other feeds your dedicated pre-release Stage.

I think with this, we could re-configure the pipeline every time the "take over" operation happens, but the DAG on UI wouldn't be consistent and might make it difficult to locate a particular cluster. If we have around 25 clusters, this can become a problem!

You can write more general image selectors that match your RCs and your normal versioning scheme as well and they will put both into the same freightline/pipeline

I think this works well if auto-promotion isn't enabled. With auto-promotion enabled, every auto-promoted stage will get the latest build, which may be a build from a PR.

I just wouldn't want to introduce complications like this when there are ways to approach this that already work.

Understandable, I think manual freight creation will at least provide a workable solution 👍

Brightside56 commented 2 months ago

Kargo's single freightline could be limiting since it forces all changes to happen as long as they are in the tip. In our git + image repository example, we use a single Warehouse that subscribes to both the git repo and the image repo, then permutates the combination

In an ideal world, there are two artifacts (templates and an image) that make up a single deployable unit of a microservice or service, which can consist of multiple apps that are separate deployments but based on the same Docker build (as, for example, ArgoCD does).

But often that's la-la land. The case I have (and which my colleagues and I encounter quite often) is a service with multiple components that live in different repos and have separate build pipelines producing separate Docker images, for example:

There are plenty of such cases, and building a polyglot monorepo is not an option here, because it introduces significant complexity and inconvenience that cannot be solved without increased build times, or... additional sophisticated tooling, or... rock art made of ugly scripts/glue code, or... security tradeoffs.

But those poly-repo/poly-artifact components are still very much treated as a single unit, because they should be tested and deployed together. Many engineers (like QA) don't/shouldn't really care whether those services are in different repos or in a monorepo, in which order they should be deployed (because they should be deployed together), or how those parts are related from a delivery point of view.

Therefore, I would prefer to be able to select 3-4-5 artifacts of a service as freight (or freights in different freightlines) and promote them to a stage of choice (not necessarily production) as a single operation from the user's standpoint.

they are in the tip

A freightline with 3-4-5 artifacts (where it can take a week or two to write templates or build all the needed artifacts) is often completely unusable and resembles a casino slot machine.

krancour commented 1 month ago

Editing the labels on this because after going back through this, I believe there is only UI work here.

krancour commented 1 month ago

Actually, I'm going to close this because I believe #1208 succinctly captures the UI work that needs to be done for this.

jessesuen commented 3 weeks ago

Reopening as the v0.7 Freight Assembly feature doesn't fully address the use case as I thought it would.

krancour commented 3 weeks ago

While manual freight creation is a great feature, and I'm pleased we've got it done, @jessesuen is right that (probably due to some miscommunication) it doesn't solve the problems we thought it would.

We agreed I would add some more context to this thread to get it caught up to everything he and I have synced on.

To level set, as this was a point of confusion from the get-go, I have always viewed a freightline as just a stream of Freight coming from a Warehouse and not anything more than that. @jessesuen has seen it as the route that Freight from a Warehouse takes through the Stages. Now that we've come to understand this disconnect and have synchronized on the latter point of view, everything else has become clear.

If we conceptualize freightlines as train tracks, we agree we want the possibility of multiple parallel tracks going through the same set of Stages, with the trains on each of those tracks possibly traveling at different speeds, delivering different types of Freight (artifacts) to the Stages at different cadences. 💡 ❗

This drawing is a low fidelity visualization of the concept:

[Image: IMG_2304]

One can imagine, for instance, images and base configurations traversing the blue line quickly, while something like a feature flag traverses the green line at a slower pace.

What this means is we need to update Stages with the potential to host multiple pieces of Freight. For lack of any better descriptors, a Stage can have n slots corresponding to n subscriptions (to Warehouses or upstream Stages) and Promotions update the Freight in one or more slots.
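One way to picture the slot idea is the sketch below. It is purely illustrative: the `Stage` class, its field names, and keying slots by the name of the subscribed origin are assumptions, not the eventual Kargo design.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stage with one slot per subscription (Warehouse or upstream Stage)."""
    name: str
    subscriptions: tuple[str, ...]
    slots: dict[str, str] = field(default_factory=dict)

    def promote(self, freight_by_origin: dict[str, str]) -> None:
        """A promotion may fill one or more slots and leaves the others untouched."""
        unknown = set(freight_by_origin) - set(self.subscriptions)
        if unknown:
            raise ValueError(f"{self.name} does not subscribe to: {sorted(unknown)}")
        self.slots.update(freight_by_origin)


uat = Stage("uat", subscriptions=("images", "feature-flags"))

# The fast pipeline delivers a new image; the feature-flag slot is not touched.
uat.promote({"images": "image-freight-2"})
# Later, the slower pipeline catches up.
uat.promote({"feature-flags": "flag-freight-1"})
print(uat.slots)
```

Each slot advances independently, which is what lets the two pipelines move at different cadences through the same Stage.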

There are ample consequences to this change. We will have to make many adjustments to how Promotions work, how Freight are verified, etc., so this issue is set to become the overarching focus of our v0.8.0 release. 💥

krancour commented 3 weeks ago

We will have to make many adjustments to how Promotions work

I've put some thought into this today and wanted to capture some details before I risk forgetting everything over the weekend.

It is just the Git-based promotion mechanisms, and not the Argo CD-based ones, whose behavior is a bit "narrow" today and will need to be opened up a bit to make room for all the new possibilities that come along with the notion of multiple Freight per Stage. Specifically, I believe that if we do away with the assumption that we always read from and write to the same repo (see #1354), that is a good starting point. And the steps leading up to the write could stand to be a bit more flexible and ordered. (Think Dockerfile directives.) I propose the following (as a mere starting point for further discussion):

  1. A Git repo update will name one repo (by URL) and branch (usually Stage-specific) that it would like to write to (same as today). We use a new, empty temp dir as a starting point for everything that follows. (This is different.)

  2. We execute an ordered series of directives, which can be:

    • Copy directives (see #1250): Each specifies a source (repo and a path; commit ID comes from the Freight(s) being promoted) and a destination path within the new, empty workspace.
    • Config management directives: Use Kustomize to update image refs (tags or digests come from the Freight(s) being promoted), update values.yaml files with new image refs, or update Chart.yaml files with new chart refs (version numbers come from the Freight(s) being promoted).

  3. Optionally render using one of:

    • Kustomize (kustomize build)
    • Helm (helm template)
    • Kargo Render

  4. Write everything we have just assembled to the repo and branch specified back in no.1.

So something that formerly looked like this:

promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/example/repo.git
      writeBranch: stage/test
      kustomize:
        images:
        - image: nginx
          path: stages/test

Might now look like this:

promotionMechanisms:
    gitRepoUpdates:
    - repoURL: https://github.com/example/repo.git
      writeBranch: stage/test
      directives:
      - copy:
          repoURL: https://github.com/example/repo.git
          paths:
          - from: base
            to: base
          - from: stages/test
            to: stages/test
      - kustomize: # Run `kustomize edit set image nginx:<tag>` in stages/test
          image: nginx
          path: stages/test
      render:
        kustomize: # Run `kustomize build stages/test`
          path: stages/test

This is a bit more verbose than what we started with, but it's also quite a bit more flexible (easy to see how other directives could be incorporated into this) and it's also very plain to see exactly what it's doing -- arguably much more so than with what we've currently got.
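The ordered-directive flow could be sketched roughly as follows. This is not how Kargo implements promotions. It assumes the source repo is already checked out to a local directory, and it replaces the real `kustomize edit set image` step with a stand-in that only writes the pinned reference to a file. All function names, the `image.txt` file, and the `1.27.0` tag are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def copy_directive(source_root: Path, paths: list[tuple[str, str]]):
    """Copy each (from, to) path pair from a checked-out source into the workspace."""
    def run(workspace: Path) -> None:
        for src, dst in paths:
            shutil.copytree(source_root / src, workspace / dst, dirs_exist_ok=True)
    return run


def set_image_directive(path: str, image: str, tag: str):
    """Stand-in for a config management step: record the pinned image in a file."""
    def run(workspace: Path) -> None:
        target = workspace / path
        target.mkdir(parents=True, exist_ok=True)
        (target / "image.txt").write_text(f"{image}:{tag}\n")
    return run


def assemble(directives) -> Path:
    """Start from a new, empty temp dir and apply the directives in order."""
    workspace = Path(tempfile.mkdtemp(prefix="promotion-"))
    for directive in directives:
        directive(workspace)
    return workspace


# Fake source checkout so the sketch runs without any Git access.
source = Path(tempfile.mkdtemp(prefix="source-"))
(source / "base").mkdir()
(source / "base" / "deployment.yaml").write_text("kind: Deployment\n")
(source / "stages" / "test").mkdir(parents=True)
(source / "stages" / "test" / "kustomization.yaml").write_text("resources:\n- ../../base\n")

workspace = assemble([
    copy_directive(source, [("base", "base"), ("stages/test", "stages/test")]),
    set_image_directive("stages/test", "nginx", "1.27.0"),  # tag would come from the Freight
])
print(sorted(p.relative_to(workspace).as_posix() for p in workspace.rglob("*") if p.is_file()))
```

Rendering and the final write to the target branch are left out. In this shape they would simply be further steps applied to the same workspace after the directives have run.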

There is, of course, more to think through here.

krancour commented 2 weeks ago

Note on title change:

Because of past confusion over what a "freightline" is, we've decided we're striking the term from our vocabulary.

What was once labeled as "freightline" in the UI will be "freight timeline" going forward, which is an accurate descriptor that doesn't introduce any new nomenclature.

The path(s) that Freight of different types take through the graph of Stages is a "pipeline" -- also not a new term.

So this issue permits "multiple pipelines" to flow through each Stage.