coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

Configuration Management / Orchestration Support for CoreOS (a la SaltStack) #1859

Open tgelter opened 7 years ago

tgelter commented 7 years ago

Issue Report

Feature Request

Configuration management / Orchestration Support for CoreOS, including or similar to SaltStack

Environment

What hardware/cloud provider/hypervisor is being used to run Container Linux? AWS/Azure/Datacenter (various models)

Desired Feature

After chatting with some core contributors on #coreos(freenode), they suggested that I put in an issue so that this feature request could get more organized attention.

Other Information

Chat transcript below with additional use-case info:

[11:54] Hello everyone. I'm looking for some recommendations around config management for CoreOS. Since it appears that running an agent (e.g. puppet/salt) isn't a good idea, am I basically left with ansible & other ssh-based config management & orchestration tools, or is there something else you'd suggest?
[12:02] Hoban: if you can specify everything during initial setup, then Ignition is pretty great. What are you looking to do post-install?
[12:04] @thatmightbepaul, we actually have a pretty elegant ignition-based deployment for multi-cloud (including datacenter) right now, so initial config is in great shape. What we're looking for is something for small tweaks that don't warrant complete re-installs.
[12:04] thatmightbepaul: for example, password rotation, or an LDAP config update
[12:04] Gotcha, I know I've seen others run Ansible from a container (since Python isn't on Container Linux by default)
[12:05] But I also know the one link on our site about that is a very out of date blog post
[12:05] let me ask around and see if anyone in office has any recs
[12:05] thatmightbepaul: that would be excellent, thanks!
[12:12] Hoban: the most recent example of an ansible install that I've found is https://github.com/defunctzombie/ansible-coreos-bootstrap
[12:13] I think ansible is probably easiest (since Puppet and Chef do some server/client stuff that adds complexity, afaik)
[12:14] thatmightbepaul: OK, so I'm looking third-party then, which is OK. I was hoping the project had something either native, or else 1st-party so we're not doing our own thing
[12:14] thatmightbepaul: maybe most people are just running CoreOS in the cloud only, and spin down/up instances, rather than pushing config changes via some other mechanism?
[12:15] thatmightbepaul: honestly, I'd prefer to avoid config management entirely, keeping the nodes "immutable", but as we're co-hosting persistent data volumes on the hosts as well, it sometimes makes sense not to require data rebuilds for small changes.
[12:16] Yeah, AFAIK we have more than a few issues on GH of folks using some kind of config management
[12:17] <@robszumski> hoban: i think fabric is another popular ssh-based tool
[12:18] thatmightbepaul, thanks!
[12:18] robszumski: I'll check that one out too, thanks!
[12:18] but the majority get by running w/o. It's a good use case to capture, imo. If you have a min, I'd add a feature request to the forum to spur more thought on us doing it first-party: https://github.com/coreos/bugs/issues
[12:18] sure thing!
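
For illustration, the "run Ansible from a container" pattern mentioned in the transcript looks roughly like this (a sketch: the image name is a stand-in for whatever image you build or trust, and the mounts just hand the container your playbooks and SSH credentials):

docker run --rm -it \
  -v "$PWD":/work -w /work \
  -v "$HOME/.ssh":/root/.ssh:ro \
  your-registry.example.com/tools/ansible:latest \
  ansible-playbook -i inventory site.yml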

crawford commented 7 years ago

Thanks for the report. This is definitely something we are working through. With the introduction of Ignition, we knew we had provisioning covered but would eventually have to tackle configuration management. That's the next phase ;)

I don't have any progress to share right now, but I'm hoping we can begin to make progress in the coming months.

devx commented 7 years ago

@crawford I was wondering if you had made progress on this front?

gcmalloc commented 7 years ago

A possible approach is to run a Docker container and bind-mount most of the host filesystem into it, as done in https://github.com/epfl-sti/cluster.coreos.puppet.

The command looks like:

/usr/bin/docker run \
  --name puppet \
  --net=host \
  --privileged \
  -v /:/opt/root \
  -v /dev:/dev \
  -v /etc/systemd:/etc/systemd \
  -v /etc/ssh:/etc/ssh \
  -v /etc/puppet:/etc/puppet \
  -v /var/lib/puppet:/var/lib/puppet \
  -v /var/run:/var/run \
  -v /home/core:/home/core \
  -v /etc/os-release:/etc/os-release:ro \
  -v /etc/lsb-release:/etc/lsb-release:ro \
  -v /etc/coreos:/etc/coreos:rw \
  -v /run:/run:ro \
  -v /usr/bin/systemctl:/usr/bin/systemctl:ro \
  -v /usr/bin/fleetctl:/usr/bin/fleetctl:ro \
  -v /lib64:/lib64:ro \
  -v /lib/modules:/lib/modules:ro \
  -v /usr/lib64/systemd:/usr/lib64/systemd \
  -v /usr/lib/systemd:/usr/lib/systemd \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  epflsti/cluster.coreos.puppet # image name illustrative; see the repo above
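
On Container Linux a run like this would typically be wrapped in a systemd unit so it can be enabled and restarted like any other service. A minimal sketch, with the mount list trimmed for brevity (the unit name and image are illustrative; substitute the full set of -v flags from the command above):

# /etc/systemd/system/puppet-agent.service
[Unit]
Description=Puppet agent in a container
After=docker.service
Requires=docker.service

[Service]
# clean up any leftover container from a previous run
ExecStartPre=-/usr/bin/docker rm -f puppet
ExecStart=/usr/bin/docker run --name puppet --net=host --privileged \
  -v /:/opt/root \
  -v /etc/systemd:/etc/systemd \
  epflsti/cluster.coreos.puppet
ExecStop=/usr/bin/docker stop puppet

[Install]
WantedBy=multi-user.target
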
tgelter commented 7 years ago

Running a container with that much access to the host filesystem seems like a poor design decision to me. Unless a CoreOS-integrated solution is provided, I'll opt to bootstrap Python onto the node (via Ansible), run Ansible playbook(s), and then remove Python (again, via Ansible). I'm not personally comfortable with the security implications of giving any container that much access to the container host.
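
A sketch of that flow, using Ansible's raw module (which needs no Python on the target). The PyPy URL and install path follow the convention of the ansible-coreos-bootstrap repo linked in the transcript and are illustrative, as is the coreos_hosts inventory group:

# 1. push a self-contained interpreter using raw (no python required on the node)
ansible coreos_hosts -i inventory -m raw -a \
  "curl -sL https://downloads.python.org/pypy/pypy3.9-v7.3.11-linux64.tar.bz2 | tar xjf - -C /home/core"
# 2. run playbooks, pointing ansible at the bootstrapped interpreter
ansible-playbook -i inventory site.yml \
  -e ansible_python_interpreter=/home/core/pypy3.9-v7.3.11-linux64/bin/python
# 3. remove the interpreter afterwards, again via raw
ansible coreos_hosts -i inventory -m raw -a \
  "rm -rf /home/core/pypy3.9-v7.3.11-linux64"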

crawford commented 7 years ago

I'm not personally comfortable with the security implications of giving any container that much access to the container host.

The alternative is to run the program directly on the host. What is the effective difference?

tgelter commented 7 years ago

@crawford, the effective difference is that "the program", whatever it ends up being, isn't running inside of a potentially vulnerable container. Assuming the CoreOS project comes up with an integrated config management solution, it will necessarily be tightly coupled with the CoreOS operating system, other code, QE process, etc., which will mean more automated testing and more eyeballs looking for vulnerabilities.

gcmalloc commented 7 years ago

I'm not sure about that. Systems such as Puppet or Salt have already received plenty of attention and scrutiny from many eyes, and they are already used in many production environments. The presented approach uses Puppet as the configuration management system; the container is there for two reasons.

What are you proposing as an alternative solution?

tgelter commented 7 years ago

@gcmalloc, I'm not referring to the config management / orchestration systems as the area lacking eyeballs. I'm referring to the potential of using any potentially (probably?) vulnerable container as the location where the agent runs, which in turn has wide access to the underlying docker host.

Conversely, if the CoreOS project were to tightly couple a solution with CoreOS Container Linux (not running inside of a container), then you reduce the surface area where vulnerabilities may exist -- remember, "containers don't contain".

As a potential hybrid approach, I suppose that the config management agent could run in a pre-determined (again, tightly-coupled) container (i.e. it's not up to the user to change the container being used without forking Container Linux).

crawford commented 7 years ago

This assumes that the container images ship things other than "the program". In a lot of cases, it's possible to ship very little in the container image. If that's not possible, tools like Clair can be used to scan against known-vulnerability databases. In the case of CM, most of these tools are written in runtime interpreted languages. Container Linux doesn't ship the interpreters (and will not ship them) which means you either need to put them on the host (difficult to update) or in a container (easy to update). The way I see it, it's much easier to keep these tools secure if we use containers.
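
For example, a "ship very little" CM image can be just a couple of layers (a sketch; Alpine's packaged ansible stands in for whichever tool and version you actually need):

# build a minimal image containing only the interpreter-backed CM tool
cat > Dockerfile <<'EOF'
FROM alpine:3.18
RUN apk add --no-cache ansible openssh-client
EOF
docker build -t cm/ansible-minimal .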

tgelter commented 7 years ago

@crawford, that fits into the hybrid approach I mentioned above, and makes sense. My main concern is around mis-implementation if the container & agent install/config is left to the user to figure out & self-manage. I'd much prefer a secure-by-default approach to one that favors complete freedom to implement the agent however one sees fit.

Thank you all for the thoughts.

fcgravalos commented 6 years ago

Using official container images and following upstream recommendations should make us feel comfortable, I guess. From the Puppet registry:

https://hub.docker.com/r/puppet/puppet-agent-alpine/

docker run --rm --privileged --hostname agent \
  -v /tmp:/tmp \
  -v /etc:/etc \
  -v /var:/var \
  -v /usr:/usr \
  -v /lib64:/lib64 \
  --link puppet:puppet \
  puppet/puppet-agent-alpine

Other than that, running inside a container is only a problem if we don't trust the source, so I'd encourage those of you with internal registries/repos (Artifactory/Nexus/Quay) to build the image on your own. If you trust your source, Docker is just wrapping the binary and its dependencies; it's as dangerous as any other process on the host.
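
Something along these lines, with the registry name being illustrative:

# build from the upstream Dockerfile (or your own) and push to an internal registry
docker build -t registry.internal.example.com/infra/puppet-agent-alpine:latest .
docker push registry.internal.example.com/infra/puppet-agent-alpine:latest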

I'm seriously considering this. Having to restart our bare-metal machines for a minor upgrade of a single component just isn't acceptable.

fcgravalos commented 6 years ago

Hi @crawford !

Is this something you think CoreOS still wants to work on? Does CoreOS still see the lack of CM support as an issue? I want to make a decision: I'd rather go with a CoreOS-native solution, but I need to understand whether, after a year, CoreOS is still committed to working on this or has changed its vision.

Thanks!!

crawford commented 6 years ago

We aren't actively working on supporting CM tools, though I suspect many of those tools have made significant progress toward being able to run in a container (e.g. Puppet and Ansible). CoreOS has moved in the direction of using Kubernetes to manage the cluster, though there are still many unanswered questions. I think it's probably easiest if you go with the CM tool of choice (assuming it has good container support). It was never my (personal) vision to create another CM tool, but to enable others to work seamlessly on Container Linux.

euank commented 6 years ago

@crawford I think your original reply may have caused some confusion here, namely the portion:

.... but would eventually have to tackle configuration management. That's the next phase ;)

Perhaps an edit note at the bottom of your original comment, clarifying that there are no concrete plans, could help people coming to this issue?

CoreOS has moved in the direction of using Kubernetes to manage the cluster

Tectonic/CoreOS Inc has, yes, but Container Linux is less opinionated about that. I think it's also accurate to state that Tectonic/K8s tooling has been developed which acts as either ad-hoc configuration management, or as a provisioning system.

Tectonic's node agents could, if a CM system were supported, quite likely lean on that to simplify themselves.

More recently, Tectonic has begun moving towards the stance that rather than using a CM, things should just be re-provisioned to pick up config changes (or at least that's my understanding).

I don't think that CoreOS Container Linux users should be required to write ad-hoc node agents for small configuration updates.

I think it's probably easiest if you go with the CM tool of choice (assuming it has good container support)

Note that container support and CoreOS Container Linux support may be distinct things. If we wish to encourage users to use external CM tools, I do think we should also do due diligence in documenting that those tools work, and have tests verifying that.

@fcgravalos

Is this something you think CoreOS still wants to work on? Does CoreOS still see the lack of CM support an issue?

@crawford's correct that we're not actively working on CM tooling nor do we have specific recommendations about it.

I do think it is a problem, and I think it will remain a problem until we do one of the following things:

  1. Document existing third-party CM tooling that works without requiring messy hacks, either pointing to external docs maintained by the tool authors or writing basic usage docs ourselves
  2. Recommend/document not using CM tools, but rather reprovisioning for config changes
  3. Provide a working/integrated/CM tool, be it an existing tool in a torcx addon or a new thing altogether

To be clear, I'm not trying to express an opinion that any of the above solutions are clearly correct, nor are they strictly mutually exclusive. I'm just trying to state what possible directions I see to solving this issue.

The first, documentation of existing tools, may be a good starting point since it doesn't preclude any other option and also may be a good place for anyone to step in and contribute.