diggerhq / digger

Digger is an open source IaC orchestration tool. Digger allows you to run IaC in your existing CI pipeline ⚡️
https://digger.dev
Apache License 2.0

Dependencies #791

Open ZIJ opened 10 months ago

ZIJ commented 10 months ago

Our dependencies / include patterns implementation only considers a "static" graph based on folder structure. This leads to the following problems:

We probably want to:

Considerations for limiting to inputs / outputs

ZIJ commented 10 months ago

Additional feedback from Francois (NT): "all projects are triggered if modules change, even though changes in certain modules aren't affecting certain projects".

ZIJ commented 9 months ago

Idea for using git as a store for dependencies / TFVars

In digger.yml, each project can define exports and imports lists, like this:

projects:
  - name: projA
    dir: ./proj-a
    exports: ["vpcId", "subnetId"]
  - name: projB
    dir: ./proj-b
    imports: ["projA.vpcId", "projA.subnetId"]

On every successful apply of a project, all of its exports are pushed to git as an outputs.tfvars file, into either a separate "infra state" repo or a special folder like .digger. Before applying, Digger copies the outputs.tfvars of each upstream dependency in as imports.projectA.tfvars (possibly also prefixing the keys with the project name) so that Terraform picks it up automatically.
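
A rough sketch of what those files might look like under this proposal (the file names, key prefix and values are illustrative assumptions, not existing Digger behaviour):

.digger/projA/outputs.tfvars, written by Digger after a successful apply of projA:

vpcId    = "vpc-0abc1234"
subnetId = "subnet-0def5678"

proj-b/imports.projA.tfvars, copied in by Digger before running projB, with keys prefixed by the exporting project's name (note that Terraform only auto-loads terraform.tfvars and *.auto.tfvars files, so the copy may need an .auto.tfvars suffix or an explicit -var-file flag to be picked up):

projA_vpcId    = "vpc-0abc1234"
projA_subnetId = "subnet-0def5678"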

Folder structure matches digger.yml, like this:

.digger
├── projA
│   └── outputs.tfvars
└── projB
    └── outputs.tfvars

This way, multi-stage applies for every change can be split into "layers" (effectively levels of the dependency tree); with the example config above, Layer 0 would contain projA and Layer 1 would contain projB:

Making imports and exports an explicit white-list is important because persisting all outputs could be unsafe: users might output sensitive data such as passwords. We'd only want to export outputs that are actually used by downstream projects (like VPC IDs), which tend to be safe.

The user may choose not to proceed with Layer 1 after Layer 0 has been applied, or to apply only some of the projects by hand rather than all of them.

Why store in git? Because the outputs produced by every apply change the "overall state of infrastructure". Ideally everything is tracked in git so that it serves as a single source of truth and enables rollbacks, an audit trail, etc. But Terraform state itself cannot be stored in git because it might contain sensitive data. Outputs, however, can be stored; the history of applies is then clear commit-by-commit, and it'd be easy to see what exactly changed and when. An additional git-tracked state of sorts.
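
To make the layer splitting above concrete, here is a minimal, hypothetical sketch (in Go, since that is what Digger is written in, but not Digger's actual code) that derives apply layers from the imports declared in digger.yml:

// Hypothetical sketch: split projects into apply "layers" based on the
// imports declared in digger.yml. Layer 0 has no upstream dependencies,
// Layer 1 depends only on Layer 0, and so on.
package main

import (
	"fmt"
	"strings"
)

type Project struct {
	Name    string
	Imports []string // e.g. "projA.vpcId"
}

func layers(projects []Project) [][]string {
	// Map each project to the set of upstream project names it imports from.
	deps := map[string]map[string]bool{}
	for _, p := range projects {
		deps[p.Name] = map[string]bool{}
		for _, imp := range p.Imports {
			upstream := strings.SplitN(imp, ".", 2)[0]
			deps[p.Name][upstream] = true
		}
	}

	var result [][]string
	done := map[string]bool{}
	for len(done) < len(projects) {
		var layer []string
		for _, p := range projects {
			if done[p.Name] {
				continue
			}
			ready := true
			for upstream := range deps[p.Name] {
				if !done[upstream] {
					ready = false
					break
				}
			}
			if ready {
				layer = append(layer, p.Name)
			}
		}
		if len(layer) == 0 {
			panic("dependency cycle in digger.yml imports")
		}
		for _, name := range layer {
			done[name] = true
		}
		result = append(result, layer)
	}
	return result
}

func main() {
	projects := []Project{
		{Name: "projA"},
		{Name: "projB", Imports: []string{"projA.vpcId", "projA.subnetId"}},
	}
	fmt.Println(layers(projects)) // [[projA] [projB]]
}

Projects in the same layer have no dependencies on each other, so they could be planned and applied in parallel; each subsequent layer only runs once the previous one has been applied.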

motatoes commented 9 months ago

Good concept, but I don't think outputs or inputs should live in git. They are already stored by Terraform in each project's state and can be fetched and passed on demand; the state is independent of how many jobs or threads exist, and it is also safe to read concurrently. That's how Terragrunt does input/output management, and I think we should follow a similar route if we were to build our own dependency management between modules.
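
For illustration, a minimal sketch of that fetch-from-state-on-demand approach (a hypothetical helper, not part of Digger): it shells out to terraform output -json in the upstream project's directory and writes the non-sensitive values into a *.auto.tfvars.json file that the downstream project's plan will load automatically:

// Hypothetical sketch: read an upstream project's outputs from its Terraform
// state on demand and expose the non-sensitive ones to a downstream project.
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"os/exec"
)

// terraform output -json prints a map of output name -> {sensitive, type, value}.
type tfOutput struct {
	Sensitive bool        `json:"sensitive"`
	Value     interface{} `json:"value"`
}

func fetchOutputs(projectDir string) (map[string]tfOutput, error) {
	cmd := exec.Command("terraform", "output", "-json")
	cmd.Dir = projectDir
	raw, err := cmd.Output()
	if err != nil {
		return nil, err
	}
	outputs := map[string]tfOutput{}
	err = json.Unmarshal(raw, &outputs)
	return outputs, err
}

func main() {
	// Read projA's outputs straight from its backend state (no git involved).
	outputs, err := fetchOutputs("./proj-a")
	if err != nil {
		panic(err)
	}

	// Keep only non-sensitive values and write them as a *.auto.tfvars.json
	// file, which Terraform loads automatically when planning projB.
	vars := map[string]interface{}{}
	for name, out := range outputs {
		if out.Sensitive {
			continue
		}
		vars[name] = out.Value
	}
	data, err := json.MarshalIndent(vars, "", "  ")
	if err != nil {
		panic(err)
	}
	if err := os.WriteFile("./proj-b/imports.projA.auto.tfvars.json", data, 0o644); err != nil {
		panic(err)
	}
	fmt.Println("wrote imports for projB:", string(data))
}

Because the values come straight from the backend state, nothing needs to be committed to git, and sensitive outputs can be filtered out before they are written anywhere.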

We may, however, wish to cache these dependencies if we are scaling out across jobs; for that we would need either a cloud-based cache that can be part of the orchestrator, or some other secret store to cache these values in.

ZIJ commented 7 months ago

More user feedback (M. U. 13.02.2024):

is there a way to handle dependent projects where plans for the second project wait until applies are done on the first project? i just (successfully) tried out project dependencies for the first time in that sample PR I provided previously and it seems like plans run right after each other, when I would think (but I might be wrong...) that for most cases the second project's plan depends on resources/outputs from the first project's apply...

what I want to do (going off your example config):

  1. have digger run a plan on core, wait for an apply request if there are expected changes on core
  2. have me, as user, run digger apply to apply changes on core
  3. once successful, have digger run a plan on platform, now that core is up to date
  4. have me run digger apply to apply changes on platform, then merge

if two projects are sharing data (e.g. in the above PR via remote state), the current digger behaviour would result in the plan for platform failing when it tries to reference a value that doesn't exist yet (because so far it's only appeared in a plan on core and is not actually in the state for core yet)

in a sense it kind of is supported, but in a hacky way: you'd just let the second project's plan job fail (or cancel it ahead of time knowing it'll fail) and run digger apply -p core, then a manual digger plan again, and then finally digger apply (and probably also limit these to -p platform while we're at it)
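
Spelled out, that workaround is roughly the following sequence of PR comments (using the -p flag mentioned in the feedback above):

digger apply -p core       # apply core first; the initial plan for platform is expected to fail
digger plan -p platform    # re-plan platform now that core's outputs actually exist in state
digger apply -p platform   # then apply platform and merge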