flatcar / Flatcar

Flatcar project repository for issue tracking, project documentation, etc.
https://www.flatcar.org/
Apache License 2.0

Can't install Kubernetes >=1.22 with RKE due to missing SELinux custom policies #598

Open tsde opened 2 years ago

tsde commented 2 years ago

This is not a "bug" on Flatcar's side. The issue is more about how to deal with custom SELinux policies, but I was not sure how to categorize it, so feel free to remove the "bug" label. Still, the impact is not trivial.

Description

Using RKE, installing (or upgrading to) Kubernetes >= 1.22 fails with the following error message:

Failed running cluster err:[[selinux] Host [10.130.0.241] does not recognize SELinux label [label=type:rke_container_t]. This is required for Kubernetes version [>=1.22.0-rancher0]. Please install rancher-selinux RPM package and try again]

Starting with 1.22, RKE (and RKE2) chose to use custom SELinux policies for their setup. These are installed through dedicated RPMs (rancher-selinux for RKE and rke2-selinux for RKE2), which can't be used as-is on Flatcar.

I also opened an issue on the RKE side to get their opinion on this: https://github.com/rancher/rke/issues/2788

Impact

It's not possible to use RKE (or RKE2) with Flatcar Linux starting with Kubernetes 1.22.

Environment and steps to reproduce

  1. Set-up

Flatcar version:

NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3033.2.0
VERSION_ID=3033.2.0
BUILD_ID=2021-12-10-1820
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3033.2.0 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar-linux.org/"
BUG_REPORT_URL="https://issues.flatcar-linux.org"
FLATCAR_BOARD="amd64-usr"

I'm using RKE through Terraform:

resource "rke_cluster" "main" {
  kubernetes_version = "v1.22.4-rancher1-1"
  cluster_name       = "test-cluster"
  authentication {
    strategy = "x509"
    sans     = "<...redacted...>"
  }
  dynamic "nodes" {
    for_each = flatten([local.rke_cluster_master_nodes, local.rke_cluster_worker_nodes])
    content {
      address           = nodes.value["address"]
      ssh_key           = nodes.value["id_rsa"]
      labels            = nodes.value["labels"]
      role              = nodes.value["roles"]
      hostname_override = nodes.value["name"]
      user              = nodes.value["user"]
    }
  }
  dns {
    provider = "coredns"
  }
  ingress {
    provider = "none"
  }
  network {
    plugin  = "calico"
    options = {
      "calico_cloud_provider" : "none",
      "calico_flex_volume_plugin_dir" : "/var/lib/kubelet/volumeplugins"
    }
  }
  services {
    kube_api {
      audit_log {
        enabled = true
      }
      secrets_encryption_config {
        enabled = true
      }
    }
  }
  upgrade_strategy {
    drain                        = false
    max_unavailable_worker       = 1
    max_unavailable_controlplane = 1
  }
}
  2. Task

Run terraform apply to deploy (or upgrade) your cluster to 1.22

  3. Action(s)

Wait for the error

  4. Error

The error is triggered early in the process, as it's a pre-check done by RKE before the actual installation:

Failed running cluster err:[[selinux] Host [10.130.0.241] does not recognize SELinux label [label=type:rke_container_t]. This is required for Kubernetes version [>=1.22.0-rancher0]. Please install rancher-selinux RPM package and try again]

Expected behavior / Additional information

I am a complete newbie when it comes to SELinux and I don't really know of a way to work around this. As mentioned in the RKE issue, I manually tried to import the SELinux module from the RPM into my Flatcar instance, but this failed because /usr is read-only. I didn't find any documentation about adding custom SELinux configuration on a Flatcar instance. It feels like it's not easily doable without maintaining a custom Flatcar image, which seems overkill for this kind of small configuration tweak, and I'd like to avoid it.

What would be the best way to tackle this? As RKE (and RKE2) are popular tools for deploying Kubernetes, does it make sense to request new packages based on their RPMs?

Thanks

tormath1 commented 2 years ago

Hi @tsde,

Thanks for your report.

SELinux has three modes: enforcing, permissive, and disabled.

Flatcar should default to permissive mode. Could you first check that by connecting to one of your nodes and running sudo getenforce?

rke_container_t won't be available on the system because we currently ship only the regular modules and policies. There is a tracking issue for using container-selinux policies (https://github.com/flatcar-linux/Flatcar/issues/479), but work on it has not started yet.

I'll have a look to see whether rancher-selinux could easily be integrated, with Ignition for example.

EDIT:

I managed to compile rke2.pp locally using dapper:

From Flatcar docs:

rm /etc/audit/rules.d/80-selinux.rules
rm /etc/audit/rules.d/99-default.rules
rm /etc/selinux/mcs
cp -a /usr/lib/selinux/mcs /etc/selinux
rm /var/lib/selinux
cp -a /usr/lib/selinux/policy /var/lib/selinux
semodule -DB

Then:

wget https://github.com/rancher/rke2-selinux/archive/refs/tags/v0.9.stable.1.zip

I just updated the build script to only compile rke2.pp (and not the RPM package), then built it with dapper:

dapper -f Dockerfile.microos.dapper build

Then when we try to add the policy:

$ sudo semodule -n -i ./rke2.pp
Failed to resolve typeattributeset statement at /var/lib/selinux/mcs/tmp/modules/400/rke2/cil:8
semodule:  Failed!

I suspect it's because https://github.com/flatcar-linux/Flatcar/issues/479 is missing. :wink:

tsde commented 2 years ago

Hi @tormath1

Thanks for your quick reply and for digging into this

Current mode of SE Linux is indeed set to permissive

If I'm not wrong, according to RKE code, SE linux detection is done using the docker info API. If selinux is listed in the SecurityOptions, RKE tries to use its custom SE linux labels regardless of the SE linux mode being used.
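
To illustrate the detection described above, here is a minimal shell sketch. It is not RKE's actual code: the function name docker_reports_selinux is hypothetical, and it assumes the daemon lists its security options as entries such as "name=selinux" (or plain "selinux") in the JSON array printed by docker info --format '{{json .SecurityOptions}}'.

```shell
# Sketch only, not RKE's implementation.
# $1: JSON array of Docker security options, e.g. the output of
#     docker info --format '{{json .SecurityOptions}}'
# Returns 0 if an entry is exactly "selinux" or "name=selinux" (assumed formats).
docker_reports_selinux() {
  printf '%s' "$1" | grep -Eq '"(name=)?selinux"'
}
```

On a node, this could be called as docker_reports_selinux "$(docker info --format '{{json .SecurityOptions}}')". Note that the check says nothing about the SELinux mode: a host in permissive mode still reports selinux if Docker was started with SELinux support enabled.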

tormath1 commented 2 years ago

@tsde thanks for your answer.

SE linux detection is done using the docker info API

Then it makes sense, docker is started with SELinux security options:

$ docker info
...
 Security Options:
  seccomp
   Profile: default
  selinux
  cgroupns

In that case, you might be interested in using the following docker.service systemd drop-in to override the SELinux options:

[Service]
Environment=DOCKER_SELINUX=
Environment=DOCKER_SELINUX=--selinux-enabled=false
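
As a rough sketch of how this drop-in could be put in place, the helper below writes it into a systemd drop-in directory. The function name write_docker_selinux_dropin and the file name 10-disable-selinux.conf are hypothetical choices. The default directory assumes the standard systemd location for docker.service drop-ins.

```shell
# Sketch only: write the drop-in shown above to a drop-in directory.
# $1: target directory (defaults to the usual docker.service drop-in path).
# The file name 10-disable-selinux.conf is a hypothetical choice.
write_docker_selinux_dropin() {
  dropin_dir="${1:-/etc/systemd/system/docker.service.d}"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/10-disable-selinux.conf" <<'EOF'
[Service]
Environment=DOCKER_SELINUX=
Environment=DOCKER_SELINUX=--selinux-enabled=false
EOF
}
```

After running the function as root, systemctl daemon-reload followed by systemctl restart docker would be needed for the change to take effect. docker info should then no longer list selinux under Security Options.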

tsde commented 2 years ago

Thanks for the suggestion, I'll give it a try. In the long run, I'd like to keep security-related features enabled. Switching SELinux to enforcing mode is something I have in mind for the near future (I still have to level up in this area though ;) ). So it would be great to have a way to integrate this with a fully SELinux-enabled Flatcar system.

tormath1 commented 2 years ago

@tsde last time I checked, integrating container policies was blocked by this issue: https://github.com/SELinuxProject/refpolicy/issues/397. I now see an open PR with recent activity: https://github.com/SELinuxProject/refpolicy/pull/434. We might expect to have this merged upstream soon, then.

Let me know if the drop-in solution is enough in the meantime, and let's keep this issue open to track this feature.

I'll also look into merging the Rancher policies into Gentoo upstream.

tsde commented 2 years ago

@tormath1 Thank you for your concern. Hope it'll move forward smoothly. Integrating these policies would be really nice.

In the meantime, I was able to upgrade Kubernetes to 1.22 following your recommendations. Setting the --selinux-enabled flag to false did the trick.

And thanks for the good work on the Flatcar project

bitfisher commented 2 years ago

Any progress so far?

mohsenmottaghi commented 2 years ago

We have the same issue in our production cluster, and we are stuck on v1.21.X. As @tormath1 and @tsde mentioned, we can disable SELinux by adding a systemd drop-in, but that's not a good idea for production environments. So we can't upgrade to newer versions.

bitfisher commented 2 years ago

Any updates? Disabling SELinux in production clusters isn't really an option!

bitfisher commented 1 year ago

Any updates?

tormath1 commented 1 year ago

Hi @bitfisher, we are still working on providing a fully labelled Flatcar OS. Some news was shared during the July office hours (https://github.com/flatcar/Flatcar/discussions/797). We still have a couple of open PRs, such as https://github.com/flatcar/coreos-overlay/pull/1993, and we're working on them.

Once done, we should be able to look into the rke2-selinux policy. I just have a big concern regarding the compatibility between refpolicy and the rke2-selinux policy (as stated in this issue: https://github.com/rancher/rke2-selinux/issues/25).

maese83 commented 1 year ago

I'm blocked in production from upgrading my Kubernetes cluster to >1.22 :(

xeor commented 1 year ago

Any movement on this?