Proposal: Machine Declaration

ehazlett commented 9 years ago

We currently use store to persist machine information (IP, credentials, name, etc). This works mostly but has some flaws. First, if the infrastructure is modified (machine name, size, IP, etc) or removed then we drift. The store almost has to be assumed in drift as soon as we create a machine. We also do not update the store with the machine info either.

This proposal would bring together a few concepts that have been brought up before: a machine configuration file and discovery based machine information. This would also have a docker-compose like feel as well. It pulls some ideas from Terraform. Here is the idea:

Machines are defined in a configuration file (i.e. docker-machine.yml)
- Implementation detail: we would still allow (machine create -d ec2 ...) but simply update / create the config
This configuration file would store credentials (or use env vars, etc) and defaults for that specific group of machines
- This would allow for multiple environments potentially spanning the same machines (dev, staging, etc)
A new command (i.e. apply) would allow machine to apply the configuration (launching any instances needed, etc)
- For example, if the configuration states 5 t1.micro instances in ec2 and there are only 4 a new one is created

I think this would also play well with the idea of Machine Server. It would be nice to see Machine Server create new node(s) if they crash etc. I think a great integration would be where a Swarm node dies and Machine Server automatically launches a new instance and adds it back to the cluster.

Questions / Thoughts

What should the configuration look like?
We would need an identification mechanism for the instances (labeling, etc)

Huge thanks to @gabrtv for the discussion and idea :)

ehazlett commented 9 years ago

/cc @sthulb @nathanleclaire @bfirsh

nathanleclaire commented 9 years ago

I'm really in favor of this. "Read in a text file, spit out the system in the desired configuration" is a good goal.

However, I think we need to carefully design this before we start implementing. Some concerns I can think of off the top of my head:

I create a machine based on my file, then go change some settings remotely. Does machine detect that and update the file? Or simply converge the system back to the original state from the file?
I'm pretty sure that we will need at least two separate files, like how Terraform has tfvars, so that one can be easily kept out of version control.
Something we need to think about is, we probably don't want everyone to have lots of per-project VMs sprouting up
How this would tie in with config if we put that in (docker-machine config set Drivers.DigitalOcean.Image docker to globally set the default DO image, for instance)? The file is just the highest priority I suppose? Or if you do an apply it chucks all the other stuff (env vars, config) out the window and always starts from a clean slate?
Stuff like re-creating failed hosts starts to get into the turf of tools like Mesos. Where are the boundaries and how do we avoid duplicating effort / reinventing the wheel?
I like Terraform's "plan-before-you-apply" model. Is that included in scope?

Like I said, I'm in favor, let's consider carefully before implementing though.

ehazlett commented 9 years ago

Absolutely. As titled "proposal" this is for discussion :)

I create a machine based on my file, then go change some settings remotely. Does machine detect that and update the file? Or simply converge the system back to the original state from the file?

This is what we need to discuss. I can see pros and cons for both ways.

I'm pretty sure that we will need at least two separate files, like how Terraform has tfvars, so that one can be easily kept out of version control.

I am leaning this way too.

Something we need to think about is, we probably don't want everyone to have lots of per-project VMs sprouting up

I don't think we should impose that. Who are we to decide how people use it? For example, there is nothing restricting compose users from spinning up lots of containers. I think we should make people very aware of what it is doing without imposing restrictions or decisions on how they design their infrastructure.

I could see a workflow similar to this:

Ops uses docker-machine.yml to configure a staging environment
Dev uses docker-compose.yml to build their stacks on the environment

At this point, the environment is simply a service that dev consumes and ops can ensure how it is ran. I don't necessarily think there would be a docker-machine.yml per project.

How this would tie in with config if we put that in (docker-machine config set Drivers.DigitalOcean.Image docker to globally set the default DO image, for instance)? The file is just the highest priority I suppose? Or if you do an apply it chucks all the other stuff (env vars, config) out the window and always starts from a clean slate?

I am still not convinced of the "global config file". I can see the advantage of having a single place for all of the things but like git, most of us use localized settings and I'm not sure about having several configs in place. I like simplicity and the definition of a "staging" environment that has all of it's definition is very appealing. What I didn't like about config mgmt. systems was all of the inheritance spread throughout.

Stuff like re-creating failed hosts starts to get into the turf of tools like Mesos. Where are the boundaries and how do we avoid duplicating effort / reinventing the wheel?

I don't think so. Ensuring a host is up or action is taken upon fail would be a huge benefit for machine server and could actually work in tandem with projects like Mesos. For example, instead of some config mgmt tool or vendor provided, you could use machine server to ensure 5 nodes are always up. Mesos would then ensure what containers are supposed to be on those are there.

I like Terraform's "plan-before-you-apply" model. Is that included in scope?

Absolutely :)

hairyhenderson commented 9 years ago

+1

This makes a lot of sense - it would be nice in a UI sense for docker-machine and docker-compose to have more parallels. For people just starting to use and try to understand the Docker tools, it would probably be a huge help to have this kind of parity.

A few random thoughts:

Why apply and not up? docker-machine up feels more natural if taking cues from docker-compose
Having some way to read separate files for secret things definitely makes sense, I think referencing external files makes the most sense (something like: external_config: my_secret_file), so that docker-machine doesn't have to be told about multiple different files, and so that I can share files that other systems might use or control
It would be useful to be able to set the driver on a per-host basis (e.g. I want 10 hosts in rackspace, 10 hosts in ec2, and 10 hosts in softlayer)
It would also be useful to be able to set a driver as default for all hosts defined in the file
In the case where there's an existing set of hosts and I change something in the .yml (like swap out a t1.micro for a t1.small), I think either I should have to add a --yes-i-really-want-to-destroy-a-vm-and-bring-up-a-new-one flag, or there should be a separate command altogether.
Should there be some extra metadata telling me which config a host listed in docker-machine ls came from?
It'd be neat if I could set a region attribute to an array of different regions, and machine would spread my instances across each region (i.e. I set instances: 9 and softlayer-region: [ tor01, dal05, sjc01 ], and end up with 3 hosts in each)
Hosts should be brought up in parallel as much as possible, except that swarm nodes should only come up after their master is available

I'm thinking a file might look like this:

# docker-machine.yml
osswarmmaster:
  driver: openstack
  openstack-flavor-name: tiny
  openstack-image-name: Ubuntu 14.04 LTS
  openstack-floatingip-pool: myfloatingips
  swarm-master: true
  swarm-discovery: token://1234
myawesomevm:
  driver: openstack
  openstack-flavor-name: large
  openstack-image-name: Ubuntu 14.04 LTS
  openstack-floatingip-pool: myfloatingips
  instances: 4
  swarm-discovery: token://1234
slbigbox:
  external_file: softlayer-secrets.yml
  driver: softlayer
  softlayer-cpu: 4
  softlayer-disk-size: 100
  softlayer-memory: 8192
  softlayer-region: [ tor01, dal05, sjc01 ]
  instances: 15

# softlayer-secrets.yml
softlayer-user: fred
softlayer-api-key: 1234-5678-9012

The hosts resulting from this could then be named something like:

osswarmmaster
myawesomevm_1, myawesomevm_2, etc...
slbigbox_1, slbigbox_2, etc...

ehazlett commented 9 years ago

@hairyhenderson great feedback! thanks!

Why apply and not up? docker-machine up feels more natural if taking cues from docker-compose

I'm not set on the command names -- I think apply makes sense if we make it declarative as in if there are 6 instances with the identifying tag and we remove one to match the definition. However, if we just operate like compose does (it will ignore additional containers i believe) then up would make sense too.

It would also be useful to be able to set a driver as default for all hosts defined in the file

+1

In the case where there's an existing set of hosts and I change something in the .yml (like swap out a t1.micro for a t1.small), I think either I should have to add a --yes-i-really-want-to-destroy-a-vm-and-bring-up-a-new-one flag, or there should be a separate command altogether.

Yeah I'm not sure how we would handle this. Perhaps the driver would have to support a "Modify" operation that would do some rolling modification. In the case of EC2, it would simply stop the instance and change the type (assuming we use EBS which we currently do). However, not all drivers support this so we would have to figure those out. We could also take a cue from Terraform and support in-place modifications for certain operations or a create/destroy routine for those that don't.

Should there be some extra metadata telling me which config a host listed in docker-machine ls came from? I'm leaning towards machine only using a single config to show what that environment looks like. We could also have a --config option or similar to specify certain ones (like compose).

It'd be neat if I could set a region attribute to an array of different regions, and machine would spread my instances across each region (i.e. I set instances: 9 and softlayer-region: [ tor01, dal05, sjc01 ], and end up with 3 hosts in each) Absolutely!!

Hosts should be brought up in parallel as much as possible, except that swarm nodes should only come up after their master is available +1. Actually, swarm nodes do not need their master to be available -- you can start them all together and when the master is up, it will query the discovery service for what nodes are members.

thaJeztah commented 9 years ago

I'm pretty sure that we will need at least two separate files, like how Terraform has tfvars, so that one can be easily kept out of version control.

+1. In future this could be extended to allow other storage mechanisms than a file.

FYI; this proposal in docker-compose is leaning toward having two separate files as well: https://github.com/docker/compose/issues/846 (a "definition" and a "configuration" file)

sthulb commented 9 years ago

I like the concept.

Areas of interest

What happens if a user updates their file, i.e. update the size of the VM. Do we perform a migration?
How do we handle this in a client/server model? Do we store these files on the server and sync them back to the user?
How do we handle failure?

There's probably a few more behaviour issues to work out.

ehazlett commented 9 years ago

@thaJeztah cool thx!

sthulb commented 9 years ago

@ehazlett Can we make this actually support compose syntax? So people can get swarms/machines up running containers?

ehazlett commented 9 years ago

@sthulb i would love to see that :) I think it would also be a good integration with compose as well.

errordeveloper commented 9 years ago

:+1:

ghost commented 9 years ago

Something I would be concerned about with a declarative file is the handling of sensitive information. Presumably someone will want or need to check their config into source control and I've seen too many horror stories of people being charged hundreds of dollars due to bots constantly scanning GitHub and other sites for keys. Possible solutions include taking the key from environmental variable, being prompted for the key, have the key in an encrypted file (e.g. Ansible Vault) and be prompted for the password or taking it from environmental variable to unlock it.

hairyhenderson commented 9 years ago

@UserTaken - very good point. If we take a cue from docker-compose.yml, then we could use an env_file property, which enables users to keep secrets in files but out of source control. Obviously, that adds an extra step in CI builds since secrets need to be written to temporary files, then deleted from those files.

thaJeztah commented 9 years ago

Handling of secrets still is a hot topic. If an env-file is supported, tools such as HashiCorp Vault, Keywhiz or Sneaker could be useful.

Also, I requested the Docker security maintainers to write up their thoughts / recommendations here; https://github.com/docker/docker/issues/13490

nathanleclaire commented 9 years ago

To be clear, in terms of secrets such as API tokens which might be needed in such a docker-machine.yml file, I would like to support either inheriting them from the environment or keeping them in some other secondary "var" file which is deliberately meant to be kept out of version control. Either way, we should actively discourage having them in whichever file is meant explicitly to be checked into version control.

kacole2 commented 9 years ago

bringing this back from the dead (July 10th was last response)

i really like this concept:

Ops uses docker-machine.yml to configure a staging environment Dev uses docker-compose.yml to build their stacks on the environment

i would also like to see a number somewhere in these descriptor files. Like I want 10 of type X and 5 of type Y. The reason being is that the underlying infrastructure may need to be tailored the apps, networking, or storage access. As @ehazlett said before, let's not limit what a user wants to do.

I hope that my PR #1881 shows a working concept of using additional configuration options. Would like to see that functionality added down the road.

nathanleclaire commented 9 years ago

@kacole2 I agree that something like count: 5 would be useful and have made moves to support it in the past with flags like --n-instances (never successfully merged), so I'd like to add something of the sort if we implement functionality like this.

Likewise, hopefully the new driver plugin model will also open the doors to extensible functionality in other areas.

kacole2 commented 9 years ago

@nathanleclaire where can i learn more about the plugin model? Assuming #1626? would like to contribute where possible to make it a reality.

nathanleclaire commented 9 years ago

@kacole2 Yep, that's the proposal, and https://github.com/docker/machine/pull/1902 is the PR

krasi-georgiev commented 8 years ago

is this on hold ?

nathanleclaire commented 8 years ago

@vipconsult Sort of. @kunalkushwaha has a POC here: https://github.com/docker/machine/pull/2422 and we're talking about possibly trying to implement it for 0.6.0 (January), but we can't make any promises -- it's a very big thing to commit to implementing such a feature, and we would need to get feedback from a variety of other Docker teams (for instance, is this encroaching on Compose territory?) and users before making moves.

schmunk42 commented 8 years ago

I'd really like to see this. Why don't you split this into a separate project like docker/docker-compose?

For reference: https://github.com/efrecon/machinery

krasi-georgiev commented 8 years ago

good idea

kunalkushwaha commented 8 years ago

I think a separate project, will be more of wrapper around docker-machine and compose. I think best way to add few features in libcompose like https://github.com/docker/libcompose/issues/157 and integrate them with machine could be better.

joelhandwell commented 7 years ago

If implementing this feature in a different project make sense, can implementing it as a terraform plugin (either docker_machine resource and/or docker_machine provisioner) or adopting HCL as configuration language for docker-machine be considered ? I think re-implementing whole terraform for docker-machine is too wide scope. And if we go with plain YAML, people will start to complain the lack of string interpolation which is already implemented as HCL. Let's say AWS launched new EC2 feature and terraform development team and docker-machine development team work on adopting same feature. This is nothing but duplicated effort and wasting open source development human resource. If docker-machine project members are freed from cloning a portion of terraform or HCL, they can focus on swarm compatibility or swarm integration which are the things of docker. AWS adds around 1000 new features per year and the speed is accelerating every year. terraform community so far is catching up this pace by adopting those features as soon as it released. With current development activity, we might need to consider catching up this speed can be realistic goal for docker-machine community.

Code of launching 16 docker swarm nodes as EC2 instances can be like this: docker_host.hcl

resource "aws_instance" "docker_host" {
  count = 16
  ami = "${data.aws_ami.docker_host.id}"
  instance_type = "t2.medium"
  vpc_security_group_ids = [
    "${aws_security_group.docker_host.id}"
  ]
}

resource "docker_machine" "docker_host" {
  count = 16
  swarm = true
  swarm-master = "${count.index < 3 ? true : false}" 
  aws_instance_id = "${aws_instance.docker_host.*.id}"
  ssh_key = "${var.docker_host.ssh_key}"
}

And command can be terraform apply or docker-machine create --config dockerhost.hcl

I hope unix philosophy is also applicable for docker-machine by doing docker thing, and do it well.

hairyhenderson commented 7 years ago

@joelhandwell I totally agree - would love to see this sort of thing. There's some definite crossover with projects like Docker for AWS/Azure/GCP, and Infrakit (see especially https://github.com/docker/infrakit/tree/master/examples/instance/terraform).

To be honest, one of the reasons that I haven't spent much time with Docker Machine lately is because Docker for AWS meets my needs much better. I've been using Terraform to apply the D4AWS CloudFormation template mostly. ¯\_(ツ)_/¯

docker / machine

Proposal: Machine Declaration #773

Areas of interest