ZIJ opened this issue 1 year ago
Additional feedback from Francois (NT): "all projects are triggered if modules change, even though changes in certain modules aren't affecting certain projects".
In digger.yml, each project can define `exports` and `imports` lists, like this:
```yaml
projects:
  - name: projA
    dir: ./proj-a
    exports: ["vpcId", "subnetId"]
  - name: projB
    dir: ./proj-b
    imports: ["projA.vpcId", "projA.subnetId"]
```
On every successful apply of a project, all of its exports are pushed to git as an `outputs.tfvars` file, into either a separate "infra state" repo or a special folder like `.digger`. Before applying, Digger copies the `outputs.tfvars` from each upstream dependency as `imports.projectA.tfvars` (possibly also prefixing the keys with the project name) so that Terraform picks it up automatically.
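A minimal sketch of what such a persisted exports file could look like, assuming projA's Terraform configuration defines `vpcId` and `subnetId` outputs (the values are made up for illustration):

```hcl
# Hypothetical .digger/projA/outputs.tfvars, written after a successful apply of projA.
# Only the whitelisted exports are persisted; nothing else from the state ends up in git.
vpcId    = "vpc-0a1b2c3d4e5f67890"
subnetId = "subnet-0fedcba9876543210"
```

One detail to keep in mind: Terraform only auto-loads `terraform.tfvars` and `*.auto.tfvars` files, so a copy named `imports.projectA.tfvars` would have to be passed explicitly via `-var-file`, or named something like `imports.projectA.auto.tfvars` to be picked up without extra flags.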
Folder structure matches digger.yml, like this:
```
.digger
├── ProjA
│   └── outputs.tfvars
└── ProjB
    └── outputs.tfvars
```
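On the consuming side, projB would simply declare matching input variables. A minimal sketch, assuming the keys are copied without a project prefix (how prefixing would work is still open above):

```hcl
# Variables projB would declare in order to consume projA's exports.
# Note: if Digger prefixed the keys (e.g. "projA.vpcId"), plain variable blocks would
# not work, since dots are not valid in Terraform variable names; a map or object
# variable per upstream project might be needed instead.
variable "vpcId" {
  type        = string
  description = "VPC ID exported by projA"
}

variable "subnetId" {
  type        = string
  description = "Subnet ID exported by projA"
}
```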
This way, multi-stage applies for every change can be split into "layers" (effectively levels of the dependency tree): projects with no imports form Layer 0, projects that only import from Layer 0 form Layer 1, and so on.
Making `imports` and `exports` an explicit whitelist is important because persisting all outputs could be unsafe: users might output sensitive data such as passwords. We'd only want to export outputs that are used in downstream projects (like VPC IDs), which tend to be safe.
A user may choose not to proceed with Layer 1 after Layer 0 has been applied, or to apply only some of the projects by hand rather than all of them.
Why store in git? Because every apply that produces outputs changes the "overall state of infrastructure". Ideally everything is tracked in git so that it serves as the single source of truth and enables rollbacks, an audit trail, etc. Terraform state itself cannot be stored in git because it might contain sensitive data, but outputs can be; the history of applies is then clear commit-by-commit, and it is easy to see exactly what changed and when. An additional git-tracked state of sorts.
Good concept, but I don't think outputs or inputs need to live in git; they are already stored by Terraform in the state of every project and can be fetched out and passed along on demand. The state is independent of how many jobs or threads exist, and when it comes to reading it, it is also safe with concurrent reads. That's how Terragrunt does input/output management, and I think we should follow a similar route if we were to build our own dependency management between modules.
However, if we are scaling out with jobs, we may wish to cache these dependencies, and for that we will need either a cloud-based cache that can be part of the orchestrator, or some other secret store to rely on for caching these values.
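A minimal sketch of that terragrunt-style alternative, where projB reads projA's outputs straight from projA's remote state on demand instead of from a git-tracked file (the S3 backend, bucket and key below are assumptions for illustration, not part of the proposal):

```hcl
# In projB: pull projA's outputs directly from its remote state at plan/apply time.
data "terraform_remote_state" "projA" {
  backend = "s3"
  config = {
    bucket = "example-infra-state"      # assumed bucket name
    key    = "proj-a/terraform.tfstate" # assumed state key
    region = "eu-west-1"
  }
}

locals {
  # Outputs defined in projA are exposed under .outputs
  vpc_id    = data.terraform_remote_state.projA.outputs.vpcId
  subnet_id = data.terraform_remote_state.projA.outputs.subnetId
}
```

Terragrunt's `dependency` blocks wrap essentially this mechanism (running `output -json` on the upstream module) and add `mock_outputs` so that a downstream plan can run before the upstream has ever been applied, which is also relevant to the plan-ordering feedback below.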
More user feedback (M. U. 13.02.2024):
is there a way to handle dependent projects where plans for the second project wait until applies are done on the first project? i just (successfully) tried out project dependencies for the first time in that sample PR I provided previously and it seems like plans run right after each other, when I would think (but I might be wrong...) for most cases the second project's plan depends on resources/outputs from the first project's apply...
what I want to do (going off your example config): 1. have digger run a plan on core, wait for apply request if there are expected changes on core
if two projects are sharing data (e.g. in the above PR via remote state), the current digger behaviour would result in the plan for platform failing when it tries to reference a value that doesn't yet exist (because so far it has only appeared in a plan on core and is not actually in the state for core yet)
in a sense it kind of is supported, but in a hacky way: you'd just let the second project's plan job fail (or cancel it ahead of time knowing it'll fail) and run `digger apply -p core`, then a manual `digger plan` again, and then finally `digger apply` (and probably also limit these to `-p platform` while we're at it)
Our dependencies / include patterns implementation only considers a "static" graph based on folder structure. This leads to the following problems:
We probably want to:
Considerations for limiting to inputs / outputs