cloudtools / stacker

An AWS CloudFormation Stack orchestrator/manager.
http://stacker.readthedocs.io/en/stable/
BSD 2-Clause "Simplified" License
711 stars 167 forks source link

Persistent graph #749

Closed ITProKyle closed 4 years ago

ITProKyle commented 4 years ago

Using a persistent graph enables a Terraform-like experience. When a stack is removed from the local config file, it will be removed from AWS. This will occur for both build and destroy actions. The graph and diff commands also take the persistent graph into account (if used).

This feature requires users opt-in to prevent any unexpected results in current environments. This is done by adding a the persistent_graph_key directive to any config file. The user is responsible for ensuring that this value is unique per stacker bucket + namespace.

The graph is persisted in the existing stacker bucket (meaning it requires use of a stacker bucket). No additional resources are required to ensure a low cost and minimally intrusive solution. It is recommended to enable versioning for the bucket to rollback any unintended changes to the object but it is not required and is not actively used by Stacker. The user is warned each use when it is not enabled and if Stacker created the bucket when the feature is opted-in, it creates the bucket with versioning enabled.

The persistent graph object is locked using tags on the S3 object (instead of a DynamoDB table like Terraform) to reduce cost and requirements. When the graph is locked, no other action that acts on the persistent graph (build/destroy) can be started.

What Changed

phobologic commented 4 years ago

Hey @ITProKyle - thanks for this code, but I will say that I'm pretty nervous including support for this. Even with it being opt in, as soon as it's in the code base it's something we'd have to support - and this has the potential to lead to all the nasty cases that we've avoided by maintaining our own state and letting AWS/Cloudformation manage it (something I've seen as a strength of stacker). For example this is going to incur some support cost when people's state files get in weird states, something that Terraform had to deal with for some time (and I'm pretty sure still does).

I'm not 100% against this, and I really appreciate the effort, but I think I'd really have to be convinced that we should move ahead with pulling this into the codebase. @GarisonLotus @troyready @ejholmes I'd be curious about your thoughts.

troyready commented 4 years ago

... this has the potential to lead to all the nasty cases ... ... this is going to incur some support cost when people's state files get in weird states, something that Terraform had to deal with for some time (and I'm pretty sure still does).

I think you hit the downsides correctly. However, I'm biased in favor of this because:

ITProKyle commented 4 years ago

avoided by maintaining our own state and letting AWS/Cloudformation manage it (something I've seen as a strength of stacker). For example this is going to incur some support cost when people's state files get in weird states

The design uses a very minimal/loose state to try to avoid most cases where the state could become unhealthy. It is not a replacement for CloudFormation and does not attempt to replace anything it does. It only tracks what stacks should exist at any given time (but even this is loose) by extending the current functionality of the graph.

If the state object saved to S3 were ever deleted, the only adverse effect would be the potential for "orphaned" stacks that won't be removed when they are removed from the stacker config (same as current the functionality).

If the state object is removed mid-execution for some reason, the current execution fails (except when the state object is expected to be empty/removed) when it tries to update the state object but, running it again will recreate the state object.

If a stack exists in the state object but not in AWS, it will just be removed from the state object as if it were destroyed.

It does it's best to prevent updates from two sources at the same time. The only limiting factor being eventual consistency. But, that is also backed by the state of the CloudFormation stacks.

The two areas when the state object could become unhealthy is if it were to not be unlocked at the end of execution for some reason (just need to remove a tag to fix) or if the user has use the same key and namespace for two stacker files (change one of the two values and delete the state object to fix).

Im open to any suggestions to mitigate risk and potential support cases.

ejholmes commented 4 years ago

I'm on the fence about bringing persistent graphs into stacker. One one side, I think this would have a lot of benefits:

  1. Way faster execution, since stacker wouldn't need to hit CloudFormation to check if a change is required, which gets rate limited pretty heavily for large graphs.
  2. As mentioned in the PR description, an easier ability to automate stack deletions.

On the other side, I'd be wary of bringing in the maintenance costs that this would likely bring. I feel like stacker's feature set is pretty stable right now, and I'm usually in favor of keeping things simple.

Maybe we could bring this in as an experimental feature, with the warning that it could be removed in future releases?

ITProKyle commented 4 years ago

@phobologic - Unfortunately, the company I work for has deprioritized Stacker and I don't anticipate time being designated to it for the foreseeable future.

Feel free to make use of anything in the PR, make changes as you see fit (I created a copy of this branch for my use in case I need to reference it for whatever reason), or close it.

phobologic commented 4 years ago

Sounds good - thanks @ITProKyle!