NixOS / nixops

NixOps is a tool for deploying to NixOS machines in a network or cloud.
https://nixos.org/nixops
GNU Lesser General Public License v3.0

Idea: Deployment plans to converge states #316

Open trentonstrong opened 9 years ago

trentonstrong commented 9 years ago

While working on adding Amazon RDS resource support to NixOps, I couldn't help but notice that the implementation was a little tricky, mostly due to state management. There are also quite a few different "approaches" among the backends and resources already implemented, which makes it difficult to find the right balance between handling state differences for the user and forcing them to rectify those differences out-of-band. NixOps is great overall, by the way, and I am happy it exists.

Problem Description

Most of the complexity in adding a new type of resource lies in the create function, which has to deal with possible differences between our local state and the real state, and decide what to do in each of the myriad ways they can diverge.
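The ways the two states can diverge can at least be enumerated in one place. The following is a minimal Python sketch, not NixOps code: Divergence and classify are hypothetical names, and a resource is assumed to be represented as a plain dict of attributes, both as recorded in the local state file and as reported by the provider.

```python
from enum import Enum
from typing import Any, Mapping, Optional


class Divergence(Enum):
    """Hypothetical classification of local (state file) vs. real (provider) state."""
    ABSENT = "absent"        # not recorded locally, does not exist remotely
    MISSING = "missing"      # recorded locally, but gone remotely (deleted out of band?)
    UNTRACKED = "untracked"  # exists remotely, but not recorded locally
    DRIFTED = "drifted"      # recorded and existing, but attributes differ
    IN_SYNC = "in-sync"      # recorded and existing, attributes agree


def classify(local: Optional[Mapping[str, Any]],
             remote: Optional[Mapping[str, Any]]) -> Divergence:
    """Compare a resource's recorded attributes with what the provider reports."""
    if local is None and remote is None:
        return Divergence.ABSENT
    if remote is None:
        return Divergence.MISSING
    if local is None:
        return Divergence.UNTRACKED
    return Divergence.IN_SYNC if dict(local) == dict(remote) else Divergence.DRIFTED
```

Each backend would then only have to decide what to do in each of these five cases, instead of rediscovering the cases inside its own create function.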

Take the SQS queue resource, which is relatively simple compared to something like the EC2 backend. The majority of its create logic is contained in these lines:

if self.state == self.UP and (self.queue_name != defn.queue_name or self.region != defn.region):
    self.log("queue definition changed, recreating...")
    self._destroy()
    self._conn = None # necessary if region changed

if check or self.state != self.UP:

    self.region = defn.region
    self.connect()

    q = self._conn.lookup(defn.queue_name)

    if not q or self.state != self.UP:
        if q:
            # SQS requires us to wait for 60 seconds to
            # recreate a queue.
            self.log("deleting queue ‘{0}’ (and waiting 60 seconds)...".format(defn.queue_name))
            self._conn.delete_queue(q)
            time.sleep(61)
        self.log("creating SQS queue ‘{0}’...".format(defn.queue_name))
        q = nixops.ec2_utils.retry(lambda: self._conn.create_queue(defn.queue_name, defn.visibility_timeout), error_codes = ['AWS.SimpleQueueService.QueueDeletedRecently'])

    with self.depl._db:
        self.state = self.UP
        self.queue_name = defn.queue_name
        self.url = q.url
        self.arn = q.get_attributes()['QueueArn']

While this is not particularly long or hard to understand, I would argue that even for this simple example it takes some effort to wrap your head around the state logic and prove to yourself that there aren't any serious logic errors.
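One way to check the excerpt is to lift its branching into a pure function that only reports which steps would run. This is a hypothetical restatement in Python, not NixOps code. It assumes that _destroy() leaves the resource in a non-UP state, and that remote_exists means the lookup of the desired queue name in the desired region found a queue.

```python
from typing import List


def sqs_create_steps(recorded_up: bool, recorded_name: str, recorded_region: str,
                     desired_name: str, desired_region: str,
                     check: bool, remote_exists: bool) -> List[str]:
    """Return the steps the SQS create() excerpt would take, without side effects."""
    steps: List[str] = []
    up = recorded_up
    # Branch 1: the definition changed while the queue is UP, so tear it down.
    if up and (recorded_name != desired_name or recorded_region != desired_region):
        steps.append("destroy-old-queue")
        up = False  # assumption: _destroy() leaves the resource in a non-UP state
    # Branch 2: only talk to SQS when checking, or when the queue is not UP.
    if check or not up:
        if not remote_exists or not up:
            if remote_exists:
                # A queue with the desired name exists but is not trusted: delete it.
                steps.append("delete-queue-and-wait-60s")
            steps.append("create-queue")
        steps.append("record-state-up")
    return steps
```

Spelled out like this, one case stands out: when the recorded state is not UP but a queue with the desired name already exists, the excerpt deletes that queue and waits before recreating it, rather than adopting it. Whether that is intended is exactly the kind of question that is hard to see in the imperative version.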

A couple of other issues touch on similar problems: #123 and #250.

I think I understand the motivations for having a local state file, and I don't argue against it. The point I would like to make is about the distinction between essential and incidental complexity. The fact that our local state and the state of the world can diverge is an essential complexity of declarative configuration tools; the fact that we implement the logic to converge those states imperatively seems like incidental complexity to me.

One Possible Approach: Deployment Plans

Perhaps we could take inspiration from other declarative languages and keep the "what" distinctly separate from the "how", by introducing some reusable abstractions for comparing states and generating "plans" for converging to the desired state, where possible.
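To make the idea slightly more concrete, here is a minimal sketch of such an abstraction for a single resource. The names Step, plan_resource and replace_on are hypothetical. The desired dict plays the role of the "what" (the deployment definition), observed is the current state (None if the resource does not exist), and the returned step describes the "how" without executing anything.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet, Optional


@dataclass
class Step:
    """One planned action for one resource (hypothetical plan format)."""
    resource: str
    action: str  # "create", "update", "replace" or "noop"
    changes: Dict[str, Any] = field(default_factory=dict)


def plan_resource(resource: str,
                  desired: Dict[str, Any],
                  observed: Optional[Dict[str, Any]],
                  replace_on: FrozenSet[str]) -> Step:
    """Decide how to get from observed to desired; performs no side effects."""
    if observed is None:
        return Step(resource, "create", dict(desired))
    # Keep only the attributes whose desired value differs from the observed one.
    changes = {k: v for k, v in desired.items() if observed.get(k) != v}
    if not changes:
        return Step(resource, "noop")
    # Attributes in replace_on cannot be changed in place (for the SQS queue
    # above, the queue is recreated when its name or region changes).
    if any(k in replace_on for k in changes):
        return Step(resource, "replace", changes)
    return Step(resource, "update", changes)
```

Executing a step would be left to the resource backend, which then only needs to know how to create, update or replace, not how to work out which of these is needed.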

This could help standardize both the code and the way the ResourceState classes move between states, including how those transitions interact with options such as allow_reboot, allow_recreate and so forth.
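With an explicit plan, options such as allow_reboot and allow_recreate could become checks over the whole plan before anything runs, rather than conditions scattered through each create function. A hypothetical sketch: the Step type, check_plan and the action vocabulary are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """Hypothetical planned action; action is e.g. "update", "reboot" or "recreate"."""
    resource: str
    action: str


def check_plan(steps: List[Step], allow_reboot: bool, allow_recreate: bool) -> List[str]:
    """Return every reason the plan may not run; an empty list means it may proceed."""
    problems: List[str] = []
    for step in steps:
        if step.action == "reboot" and not allow_reboot:
            problems.append(f"{step.resource}: needs a reboot, but allow_reboot is off")
        if step.action == "recreate" and not allow_recreate:
            problems.append(f"{step.resource}: needs to be recreated, but allow_recreate is off")
    return problems
```

Because every problem is reported before the first step executes, a deployment that would violate these options can be refused up front instead of stopping halfway through.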

This is a vague idea at the moment, but generating concrete plans has some added benefits, such as:

It might be worth taking a look at a tool like Terraform (https://www.terraform.io/) that

Hopefully this doesn't come off as excessive criticism or unsolicited advice; I just wanted to share my thoughts on the development experience while they were still fresh.

rbvermaa commented 9 years ago

No worries about giving unsolicited advice; your remarks are very welcome. If I remember correctly, @Phreedom or @aszlig have suggested similar changes before.

I have always liked the Terraform feature that shows exactly which steps it will perform, and I would love to see it in nixops as well. I will ponder this feature a bit during my holiday and think through what issues we would run into.
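Once a plan is plain data, a Terraform-style preview is mostly a matter of printing it. The sketch below is hypothetical: the symbols are loosely modeled on Terraform's plan output (+ create, ~ update in place, -/+ replace, - destroy), and the summary line format is made up for illustration.

```python
from typing import List, Tuple

# Symbols loosely modeled on Terraform's plan output (an assumption, not nixops).
SYMBOLS = {"create": "+", "update": "~", "replace": "-/+", "destroy": "-"}


def explain(steps: List[Tuple[str, str]]) -> str:
    """Render (action, resource) pairs as a human-readable plan summary."""
    lines = [f"{SYMBOLS[action]:>3} {resource}" for action, resource in steps]
    # Count how many steps of each kind the plan contains.
    counts = {kind: sum(1 for action, _ in steps if action == kind) for kind in SYMBOLS}
    lines.append("Plan: {create} to create, {update} to update, "
                 "{replace} to replace, {destroy} to destroy.".format(**counts))
    return "\n".join(lines)
```

For example, explain([("create", "sqs-queue"), ("replace", "rds-instance")]) yields one line per step followed by a line with the totals, so the user can review the destructive steps before confirming.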

danbst commented 9 years ago

NixOS would also benefit from having "switch plans", for example when I change the filesystem type of a partition. The particular plan would have to be inferred or stated explicitly in configuration.nix.
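As an illustration of an inferred switch plan, here is a hypothetical rule set for a single fileSystems entry. The keys device, fsType and options mirror the NixOS fileSystems options (simplified here to strings), but the rules themselves are assumptions: a changed fsType is treated as a destructive reformat that needs explicit approval, a changed device as an unmount and mount, and any other change as a remount.

```python
from typing import Dict, List


def filesystem_switch_plan(mount_point: str,
                           old: Dict[str, str],
                           new: Dict[str, str]) -> List[str]:
    """Infer the steps for changing one fileSystems entry (hypothetical rules)."""
    if old == new:
        return []
    if old.get("fsType") != new.get("fsType"):
        # Reformatting destroys data, so the plan marks it as needing approval.
        return [f"unmount {mount_point}",
                f"format {new['device']} as {new['fsType']} (destructive: needs explicit approval)",
                f"mount {mount_point}"]
    if old.get("device") != new.get("device"):
        return [f"unmount {mount_point}", f"mount {new['device']} on {mount_point}"]
    # Only mount options differ: a remount is enough.
    return [f"remount {mount_point}"]
```

The destructive step could then be approved either interactively or by an explicit setting in configuration.nix, matching the two options mentioned in the comment.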

moretea commented 7 years ago

I would really like to see the following workflow in nixops:

# Manually generate a plan from the current state
nixops plan --output-plan ./plan.nix

# Print the operations in the plan
nixops explain-plan --plan ./plan.nix

# Apply the plan
nixops apply --plan ./plan.nix

The nixops deploy command would run the plan and apply steps one after the other automatically.
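A rough sketch of how these commands could share one plan format. All names are hypothetical, and the plan is stored as JSON rather than the plan.nix file in the example, purely for brevity. Recording a digest of the state the plan was computed from lets apply refuse a plan that has gone stale; that check is one possible design choice, not something proposed in the thread.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

Step = Dict[str, str]  # hypothetical, e.g. {"action": "create", "resource": "queue"}
Handlers = Dict[str, Callable[[str], None]]


def digest(state: Dict[str, Any]) -> str:
    """Fingerprint of the state a plan was computed from."""
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def save_plan(path: str, state: Dict[str, Any], steps: List[Step]) -> None:
    """Like 'plan --output-plan': persist the steps so they can be reviewed."""
    with open(path, "w") as f:
        json.dump({"state_digest": digest(state), "steps": steps}, f, indent=2)


def load_plan(path: str) -> Dict[str, Any]:
    with open(path) as f:
        return json.load(f)


def explain_plan(plan: Dict[str, Any]) -> str:
    """Like 'explain-plan': one line per step."""
    return "\n".join(f"{s['action']} {s['resource']}" for s in plan["steps"])


def apply_plan(plan: Dict[str, Any], state: Dict[str, Any], handlers: Handlers) -> None:
    """Like 'apply --plan': run exactly the reviewed steps, in order."""
    if plan["state_digest"] != digest(state):
        raise RuntimeError("state changed since the plan was generated; re-run plan")
    for step in plan["steps"]:
        handlers[step["action"]](step["resource"])


def deploy(state: Dict[str, Any],
           compute_steps: Callable[[Dict[str, Any]], List[Step]],
           handlers: Handlers) -> None:
    """Like 'deploy': plan and apply back to back, without writing a file."""
    plan = {"state_digest": digest(state), "steps": compute_steps(state)}
    apply_plan(plan, state, handlers)
```

Keeping deploy as a thin composition of the other two steps means the reviewed path and the automatic path execute the same code.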