diggerhq / digger

Digger is an open source IaC orchestration tool. Digger allows you to run IaC in your existing CI pipeline ⚡️
https://digger.dev
Apache License 2.0

Dependencies #791

Open ZIJ opened 10 months ago

ZIJ commented 10 months ago

Our dependencies / include patterns implementation only considers a "static" graph based on folder structure. This leads to the following problems:

We probably want to:

Considerations for limiting to inputs / outputs

ZIJ commented 10 months ago

Additional feedback from Francois (NT): "all projects are triggered if modules change, even though changes in certain modules aren't affecting certain projects".

ZIJ commented 9 months ago

Idea for using git as a store for dependencies / TFVars

In digger.yml, each project can define exports and imports lists, like this:

projects:
  - name: projA
    dir: ./proj-a
    exports: ["vpcId", "subnetId"]
  - name: projB
    dir: ./proj-b
    imports: ["projA.vpcId", "projA.subnetId"]

On every successful apply of a project, all of its exports are pushed to git as an outputs.tfvars file, into either a separate "infra state" repo or a special folder like .digger. Before applying, Digger copies the outputs.tfvars of each upstream dependency in as imports.projectA.tfvars (possibly also prefixing the keys with the project name) so that Terraform picks it up automatically.
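
A rough sketch of what those files might look like under this proposal (the file names, key prefix and values are illustrative assumptions, not existing Digger behaviour):

.digger/projA/outputs.tfvars, written by Digger after a successful apply of projA:

vpcId    = "vpc-0abc1234"
subnetId = "subnet-0def5678"

proj-b/imports.projA.tfvars, copied in by Digger before running projB, with keys prefixed by the exporting project's name (note that Terraform only auto-loads terraform.tfvars and *.auto.tfvars files, so the copy may need an .auto.tfvars suffix or an explicit -var-file flag to be picked up):

projA_vpcId    = "vpc-0abc1234"
projA_subnetId = "subnet-0def5678"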

Folder structure matches digger.yml, like this:

.digger
├── projA
│   └── outputs.tfvars
└── projB
    └── outputs.tfvars

This way, multi-stage applies for every change can be split into "layers" (effectively levels of the dependency tree); with the example config above, Layer 0 would contain projA and Layer 1 would contain projB:

Making imports and exports an explicit white-list is important because persisting all outputs could be unsafe: users might output sensitive data such as passwords. We'd only want to export outputs that are actually used by downstream projects (like VPC IDs), which tend to be safe.

The user may choose not to proceed with Layer 1 after Layer 0 has been applied, or to apply only some of the projects by hand rather than all of them.

Why store in git? Because the outputs produced by every apply change the "overall state of infrastructure". Ideally everything is tracked in git so that it serves as a single source of truth and enables rollbacks, an audit trail, etc. But Terraform state itself cannot be stored in git because it might contain sensitive data. Outputs, however, can be stored; the history of applies is then clear commit-by-commit, and it'd be easy to see what exactly changed and when. An additional git-tracked state of sorts.
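
To make the layer splitting above concrete, here is a minimal, hypothetical sketch (in Go, since that is what Digger is written in, but not Digger's actual code) that derives apply layers from the imports declared in digger.yml:

// Hypothetical sketch: split projects into apply "layers" based on the
// imports declared in digger.yml. Layer 0 has no upstream dependencies,
// Layer 1 depends only on Layer 0, and so on.
package main

import (
	"fmt"
	"strings"
)

type Project struct {
	Name    string
	Imports []string // e.g. "projA.vpcId"
}

func layers(projects []Project) [][]string {
	// Map each project to the set of upstream project names it imports from.
	deps := map[string]map[string]bool{}
	for _, p := range projects {
		deps[p.Name] = map[string]bool{}
		for _, imp := range p.Imports {
			upstream := strings.SplitN(imp, ".", 2)[0]
			deps[p.Name][upstream] = true
		}
	}

	var result [][]string
	done := map[string]bool{}
	for len(done) < len(projects) {
		var layer []string
		for _, p := range projects {
			if done[p.Name] {
				continue
			}
			ready := true
			for upstream := range deps[p.Name] {
				if !done[upstream] {
					ready = false
					break
				}
			}
			if ready {
				layer = append(layer, p.Name)
			}
		}
		if len(layer) == 0 {
			panic("dependency cycle in digger.yml imports")
		}
		for _, name := range layer {
			done[name] = true
		}
		result = append(result, layer)
	}
	return result
}

func main() {
	projects := []Project{
		{Name: "projA"},
		{Name: "projB", Imports: []string{"projA.vpcId", "projA.subnetId"}},
	}
	fmt.Println(layers(projects)) // [[projA] [projB]]
}

Projects in the same layer have no dependencies on each other, so they could be planned and applied in parallel; each subsequent layer only runs once the previous one has been applied.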

motatoes commented 9 months ago

Good concept, but I don't think outputs or inputs should live in git. They are already stored by Terraform in each project's state and can be fetched and passed on demand; the state is independent of how many jobs or threads exist, and it is also safe to read concurrently. That's how Terragrunt does input/output management, and I think we should follow a similar route if we were to build our own dependency management between modules.
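
For illustration, a minimal sketch of that fetch-from-state-on-demand approach (a hypothetical helper, not part of Digger): it shells out to terraform output -json in the upstream project's directory and writes the non-sensitive values into a *.auto.tfvars.json file that the downstream project's plan will load automatically:

// Hypothetical sketch: read an upstream project's outputs from its Terraform
// state on demand and expose the non-sensitive ones to a downstream project.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// terraform output -json prints a map of output name -> {sensitive, type, value}.
type tfOutput struct {
	Sensitive bool        `json:"sensitive"`
	Value     interface{} `json:"value"`
}

func fetchOutputs(projectDir string) (map[string]tfOutput, error) {
	cmd := exec.Command("terraform", "output", "-json")
	cmd.Dir = projectDir
	raw, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	outputs := map[string]tfOutput{}
	err = json.Unmarshal(raw, &outputs)
	return outputs, err
}

func main() {
	// Read projA's outputs straight from its backend state (no git involved).
	outputs, err := fetchOutputs("./proj-a")
	if err != nil {
		panic(err)
	}

	// Keep only non-sensitive values and write them as a *.auto.tfvars.json
	// file, which Terraform loads automatically when planning projB.
	vars := map[string]interface{}{}
	for name, out := range outputs {
		if out.Sensitive {
			continue
		}
		vars[name] = out.Value
	}
	data, err := json.MarshalIndent(vars, "", "  ")
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("./proj-b/imports.projA.auto.tfvars.json", data, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("wrote imports for projB:", string(data))
}

Because the values come straight from the backend state, nothing needs to be committed to git, and sensitive outputs can be filtered out before they are written anywhere.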

We may, however, wish to cache these dependencies if we are scaling out across jobs; for that we would need either a cloud-based cache that can be part of the orchestrator, or some other secret store to cache these values in.

ZIJ commented 7 months ago

More user feedback (M. U. 13.02.2024):

is there a way to handle dependent projects where plans for the second project wait until applies are done on the first project? i just (successfully) tried out project dependencies for the first time in that sample PR I provided previously and it seems like plans run right after each other, when I would think (but I might be wrong...) that for most cases the second project's plan depends on resources/outputs from the first project's apply...

what I want to do (going off your example config):

  1. have digger run a plan on core, wait for an apply request if there are expected changes on core
  2. have me, as user, run digger apply to apply changes on core
  3. once successful, have digger run a plan on platform, now that core is up to date
  4. have me run digger apply to apply changes on platform, then merge

if two projects are sharing data (e.g. in the above PR via remote state), the current digger behaviour would result in the plan for platform failing when it tries to reference a value that doesn't exist yet (because so far it's only appeared in a plan on core and is not actually in the state for core yet)

in a sense it kind of is supported, but in a hacky way: you'd just let the second project's plan job fail (or cancel it ahead of time knowing it'll fail) and run digger apply -p core, then a manual digger plan again, and then finally digger apply (and probably also limit these to -p platform while we're at it)
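
Spelled out, that workaround is roughly the following sequence of PR comments (using the -p flag mentioned in the feedback above):

digger apply -p core       # apply core first; the initial plan for platform is expected to fail
digger plan -p platform    # re-plan platform now that core's outputs actually exist in state
digger apply -p platform   # then apply platform and merge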