automaticserver / lxe

Kubernetes CRI shim for lxd. Initially contributed by Automatic Server AG (http://www.automatic-server.com)
Apache License 2.0

Alternative approach #8

Closed micw closed 5 years ago

micw commented 5 years ago

Hello, I like the idea of managing stateful LXC containers with Kubernetes. However, after reading the drawbacks and workarounds, I wonder if this is the "best" way to achieve it.

A while ago I created a small project that runs LXC within a docker container (https://github.com/micw/docker-lxc). It's just a PoC with many glitches, but it works out of the box with an unmodified Kubernetes and runs stably (I've run a daily-used Arch Linux remote desktop on it for over a year). One drawback is that the LXC container dies if the docker container is restarted.

But I have wondered whether it might be possible to combine both approaches, and I came up with the following ideas:

What do you think about this?

This approach would also make it possible to run the event-listening service as well as lxd itself as docker containers, removing all special requirements for the deployment.

automaticserver commented 5 years ago

Listening for K8s events means creating a kubelet, and creating a kubelet involves a lot of code. We compared the complexity of writing an LXC shim against writing our own kubelet, and it was clear we could never do the latter in a reasonable amount of time.

Internally we've of course packaged LXE, and we additionally have an enterprise extension to keep containers safe (it filters unnecessary delete commands from the kubelet).

The kubelet really does a lot, please take a look at it: it checks the local system for RAM, CPU, and filesystems, it mounts things, it creates new filesystems to mount, it organizes networking, it evicts pods, it delivers host metrics - it does so much, it's unbelievable.

So overall: we're really happy with LXE, and we can't recommend writing your own kubelet.

micw commented 5 years ago

> Listen for K8s-Events means to create a Kubelet

No, it just means listening to the events. There are multiple ways to do so.
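For illustration, the listener idea can be sketched as a small reconcile loop. Everything here is hypothetical: the `Event` type stands in for a real watch event, and `lxc-create`/`lxc-destroy` are placeholder actions (a real implementation would watch the API server, e.g. via k8s.io/client-go, and drive the LXD API instead):

```go
package main

import "fmt"

// Event is a minimal stand-in for a watch event from the Kubernetes API
// (the real types live in k8s.io/apimachinery and k8s.io/client-go).
type Event struct {
	Type string // "ADDED" or "DELETED"
	Pod  string
}

// reconcile reacts to pod events the way the proposed listener would:
// create an LXC container for a new placeholder pod, destroy it on removal.
func reconcile(events <-chan Event) []string {
	var actions []string
	for ev := range events {
		switch ev.Type {
		case "ADDED":
			actions = append(actions, "lxc-create "+ev.Pod)
		case "DELETED":
			actions = append(actions, "lxc-destroy "+ev.Pod)
		}
	}
	return actions
}

func main() {
	// Simulated event stream instead of a live API server connection.
	events := make(chan Event, 2)
	events <- Event{Type: "ADDED", Pod: "desktop-0"}
	events <- Event{Type: "DELETED", Pod: "desktop-0"}
	close(events)

	for _, a := range reconcile(events) {
		fmt.Println(a)
	}
}
```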

automaticserver commented 5 years ago

Then you will never create a pod object if you just listen. The kubelet needs to know there is a pod, and it only knows because a CRI implementation reports success and the right things back.

And the other way round: LXE works, so why think about a different way?

micw commented 5 years ago

I meant it differently: I was thinking of running an unmodified Kubernetes, including the kubelet and the default CRI or containerd, which launches a normal Kubernetes pod. This pod runs a special image that acts as the interface to LXC.

> And the other way round: LXE works so why thinking about a different way?

Because this way some limitations could be removed: the special image-name handling, the need for a custom-configured kubelet, and coexistence with docker containers on the same nodes.

Edit: PS: I just want to share my thoughts and discuss them, it's not an attempt to make you change things ;)

automaticserver commented 5 years ago

I see no big deal in having two different types of nodes, especially in a virtual environment.

That Kubernetes exposes interfaces of the container runtime is a design flaw, OK. But I don't get it: what is a "normal pod"? A docker container? And this docker container contains another container runtime, LXC?

micw commented 5 years ago

> what is a "normal pod"? A docker container?

yes

> And this docker contains another container runtime, LXC?

No, my idea is to use it just as a "placeholder". If such a container is created, the event listener triggers the creation of an LXC container. If it's deleted, the listener deletes the LXC container. The LXC container could use the same cgroups as the docker container, so it would inherit its network.

dionysius commented 5 years ago

Hi @micw! I'm a bit late to the conversation. Thanks for your thoughts, it's always good to have a different perspective. The "best" way usually depends on the project's goals. :) I get exactly what you mean, but let me show you some new challenges:

First, the hierarchy of a pod will look like this:

This list is independent of how it's implemented; it just shows what the result looks like as you've described it. So, the challenges are (just to name a few):

So, if you decide you don't want to tie the LXC container to the pod that tightly, you will lose all the benefits of running a pod (scheduling, metrics, ...). I've kind of drifted into sounding negative, but I don't mean it that way; I've just compiled the things that came to mind after working on this project for a while.

I'm also very open to supporting OCI, to someone building a remote hub for the community, or to finding a convenient way to directly "convert"(?) OCI images to LXC containers - I would find a way to incorporate that into LXE so we can have more flexibility. That would make a lot of things easier. But: if you can handle OCI, you can probably plug that into cri-o, as long as you're happy with OCI images.

micw commented 5 years ago

@dionysius Thank you for that great feedback. In https://github.com/micw/docker-lxc I actually launch LXC within the container, giving exactly the structure you describe. LXC creates its cgroups as children of that container's cgroup; I start it in "host network" mode, so it inherits the pod's network. Limits are also inherited from the pod. The image name of the LXC container could be passed as an env var (alternatively as an annotation or label). That's, by the way, also something you could do in your current implementation: define a kind of marker image name ("lxe") and expect the image in an LXC_IMAGE env var.
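The marker-image idea could look like this hypothetical pod manifest; the image name `lxe` and the `LXC_IMAGE` variable are the suggested convention here, not an existing LXE feature, and the pod name and image value are made up for illustration:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: arch-desktop
spec:
  containers:
  - name: placeholder
    image: lxe               # marker image: tells the listener to manage an LXC container
    env:
    - name: LXC_IMAGE        # the actual LXC image, passed out-of-band
      value: images/archlinux
```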

dionysius commented 5 years ago

Yeah, I saw that. Well, this project's goal is to use the lxd toolchain and to not use or depend on docker.