coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

CoreOS 1235.9.0 update causes SELinux error with docker storage plugin #1795

Open drbolsen opened 7 years ago

drbolsen commented 7 years ago

Issue Report

Bug

Container Linux Version

NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.9.0
VERSION_ID=1235.9.0
BUILD_ID=2017-02-02-0235
PRETTY_NAME="Container Linux by CoreOS 1235.9.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

Environment

Azure Cloud, 3 machine cluster

Expected Behavior

Docker containers launch with the NFS volume driver plugin (netshare), for example:

docker run --volume-driver=nfs -v 10.0.0.4/folder:/folder ${image}:${tag}

This worked before the upgrade from 1235.6.0 to 1235.9.0.

Actual Behavior

Attempt to launch a container fails with the following error message:

/usr/bin/docker: Error response from daemon: SELinux relabeling of /var/lib/docker-volumes/netshare/nfs/10.0.0.4/folder is not allowed: "operation not supported".

Only one machine has been updated so far; all other boxes, still on 1235.6.0, behave normally.

opusmagnum commented 7 years ago

I have a similar issue on version 1235.9.0 with "docker create ...".

Feb 06 13:34:06 xxxxxx dockerd[1517]: time="2017-02-06T13:34:06.673392235Z" level=error msg="Handler for POST /v1.24/containers/create returned error: Error relabeling upper directory: SELinux relabeling

In my case -- SELinux was disabled. After I changed it to permissive mode, the error disappeared. More about SELinux on CoreOS: https://coreos.com/os/docs/latest/selinux.html

drbolsen commented 7 years ago

Hey @opusmagnum, thanks for the info - we will look at it.

I am, however, a bit confused. Does this mean that the default behaviour changed in 1235.9.0, so that SELinux is now in enforcing mode by default and must be switched to permissive? That would be unwise.

In 1235.6.0 the default SELinux config file is already set to permissive mode:

# This file controls the state of SELinux on the system on boot.

# SELINUX can take one of these three values:
#   enforcing - SELinux security policy is enforced.
#   permissive - SELinux prints warnings instead of enforcing.
#   disabled - No SELinux policy is loaded.
SELINUX=permissive

# SELINUXTYPE can take one of these four values:
#   targeted - Only targeted network daemons are protected.
#   strict   - Full SELinux protection.
#   mls      - Full SELinux protection with Multi-Level Security
#   mcs      - Full SELinux protection with Multi-Category Security
#              (mls, but only one sensitivity level)
SELINUXTYPE=mcs

In the link provided it is clearly stated that Container Linux implements SELinux, but currently does not enforce SELinux protections by default.

I also noticed a bunch of SELinux changes in the alpha and beta channels; I wonder whether those changes slipped into the stable channel somehow.
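For anyone comparing machines, here is a minimal sketch for reading the configured mode from a config file like the one quoted above, and the running mode from selinuxfs. The function names are hypothetical, and the default paths (/etc/selinux/config and /sys/fs/selinux/enforce) are assumptions based on the usual SELinux layout.

```shell
#!/bin/sh
# Hypothetical helpers; names and default paths are assumptions.

# Print the value of the uncommented SELINUX= line in a config file.
# Comment lines and SELINUXTYPE= are ignored.
selinux_config_mode() {
    config_file="${1:-/etc/selinux/config}"
    sed -n 's/^[[:space:]]*SELINUX=[[:space:]]*\([a-z]*\).*$/\1/p' "$config_file" | tail -n 1
}

# Print the running mode as reported by the kernel, or "disabled"
# when the enforce file is not readable (selinuxfs not mounted).
selinux_runtime_mode() {
    enforce_file="${1:-/sys/fs/selinux/enforce}"
    if [ ! -r "$enforce_file" ]; then
        echo disabled
    elif [ "$(cat "$enforce_file")" = "1" ]; then
        echo enforcing
    else
        echo permissive
    fi
}

# Example (not executed here):
#   echo "config:  $(selinux_config_mode)"
#   echo "runtime: $(selinux_runtime_mode)"
```

If the two values differ, the mode was probably changed at runtime or the config was edited without a reboot.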

argent-smith commented 7 years ago

I have the same issue with the convoy-nfs driver. CoreOS log here:

https://gist.github.com/argent-smith/890e21102af53bcb1bccf7686ec27d12

Hope it helps.

argent-smith commented 7 years ago

@drbolsen, @opusmagnum see this:

boinc ~ # sestatus
SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             mcs
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      30

Despite that, the problem is still there.

argent-smith commented 7 years ago

Found that this regression is present in 1235.8.0, too. So it has probably been there since then.

CyrilPeponnet commented 7 years ago

I have much the same issue, with a small variation (CoreOS alpha with docker 1.13).

With netshare and docker run it works fine.

core@localhost ~ $ docker run -i -t --volume-driver=nfs -v server/share:/test busybox ls /test | wc -l
17

It also works fine when creating a volume and then accessing it with docker run (either with or without netshare).

It also works fine with docker run for a standard volume like:

core@localhost ~ $ docker volume create  --name share
share
core@localhost ~ $ docker run -i -t -v share:/test busybox ls /test
core@localhost ~ $ docker run -i -t -v share:/test busybox touch /test/bla
core@localhost ~ $ docker run -i -t -v share:/test busybox ls /test
bla

But for a service I have the following error:

core@localhost ~ $ docker service create --replicas 1 --mount type=volume,src=foo,dst=/test --restart-condition none busybox ls /test/
co1m93dlthz2zm5r0mijq097w
core@localhost ~ $ docker service ls
ID            NAME          MODE        REPLICAS  IMAGE
co1m93dlthz2  lucid_newton  replicated  0/1       busybox:latest
core@localhost ~ $ docker service ps co
ID            NAME            IMAGE           NODE       DESIRED STATE  CURRENT STATE           ERROR                             PORTS
xaoibh3k7nem  lucid_newton.1  busybox:latest  localhost  Shutdown       Rejected 7 seconds ago  "SELinux relabeling of  is not…

The docker debug log says:

Feb 10 21:27:52 localhost dockerd[15869]: time="2017-02-10T21:27:52.700467910Z" level=debug msg="Registering new volume reference: driver \"local\", name \"share\""
Feb 10 21:27:52 localhost dockerd[15869]: time="2017-02-10T21:27:52.713069404Z" level=error msg="fatal task error" error="SELinux relabeling of  is not allowed: \"no such file or directory\"" module="node/agent/taskmanager" task.id=r3xqyr44mfhz50c9vs187mt98
Feb 10 21:27:52 localhost dockerd[15869]: time="2017-02-10T21:27:52.713445442Z" level=debug msg="state changed" module="node/agent/taskmanager" state.desired=RUNNING state.transition="PREPARING->REJECTED" task.id=r3xqyr44mfhz50c9vs187mt98

The message "SELinux relabeling of is not allowed: \"no such file or directory\"" is weird because there is a gap between "of" and "is", where the path should be.

Note: This is a docker swarm with 1 node for testing.

Works fine with stable and beta releases but I really need docker 1.13. Works fine with docker for mac release 1.13.
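To tell the empty-path variant of the error apart from the NFS "operation not supported" variant reported earlier in the thread, a small log filter may help. This is only a sketch: the function name is hypothetical, and the journalctl invocation in the comment assumes the daemon logs under the docker.service unit.

```shell
#!/bin/sh
# Hypothetical helper: read daemon log lines on stdin and print only the
# relabel failures whose path is empty, i.e. "relabeling of" followed
# directly by "is not allowed" with nothing but spaces in between.
empty_relabel_errors() {
    grep -E 'SELinux relabeling of +is not allowed'
}

# Example (not executed here; the unit name is an assumption):
#   journalctl -u docker.service --no-pager | empty_relabel_errors
```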

drbolsen commented 7 years ago

@argent-smith thanks for the info. Unfortunately I haven't had time to dig further into this issue. For now we have turned off automatic updates and locked 1235.6.0 until the CoreOS team finds time to look into it.

By the way we have a dev-op environment with a single 1235.8.0 box - surprisingly it doesn't cause us any issue with storage plugins.

argent-smith commented 7 years ago

@drbolsen I've turned off the updates as well. Also, the bug reproduces on 1235.8.0 (as far as I tested).

CyrilPeponnet commented 7 years ago

I tried with docker 1.13.1 on top of the stable CoreOS and I have the same issue with SELinux relabeling.

core@localhost ~ $ cat /etc/lsb-release
DISTRIB_ID="Container Linux by CoreOS"
DISTRIB_RELEASE=1235.9.0
DISTRIB_CODENAME="Ladybug"
DISTRIB_DESCRIPTION="Container Linux by CoreOS 1235.9.0 (Ladybug)"
core@localhost ~ $ docker info |grep -i Version
Server Version: 1.13.1
containerd version: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1
runc version: 9df8b306d01f59d3a8029be411de015b7304dd8f
init version: N/A (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Kernel Version: 4.7.3-coreos-r2
core@localhost ~ $ docker --version
Docker version 1.13.1, build 092cba3

argent-smith commented 7 years ago

As far as I know, this regression was introduced in 1235.8.0. Can we hope for some comments from the CoreOS team?

drbolsen commented 7 years ago

Please see https://github.com/coreos/bugs/issues/1757#issuecomment-272397527 - I reckon that is why there were no issues before the 1235.x release.

Now that SELinux is re-enabled in docker, I guess we need to check that the various SELinux flags are set, for example https://github.com/ContainX/docker-volume-netshare/issues/89

CyrilPeponnet commented 7 years ago

@drbolsen should I open a separate issue about my comment above, or is it tied to the same issue?

drbolsen commented 7 years ago

@CyrilPeponnet - just to be clear, I am just another fellow user of CoreOS :) so treat my comments as non-binding guidance. By all means feel free to open a new issue if you think it is more appropriate for your situation.

As I mentioned before, I feel that the root cause of my issue, yours and those of other people reporting in this thread is that SELinux settings for docker were re-enabled (https://github.com/coreos/bugs/issues/1757#issuecomment-272397527) in the most recent releases, immediately resulting in a bunch of side effects.

In regards to your specific situation, I think there were a few similar cases that resemble the symptoms of your issue, for example, https://github.com/codedellemc/rexray/issues/180 and https://github.com/docker/docker/pull/20829. A suggested workaround is to try running the docker daemon or a specific container with an additional flag that disables SELinux labeling, e.g. --security-opt label:disable. I am planning to spin up a test cluster on the weekend with the latest stable release and play with these flags. Happy to share the results.
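As a rough sketch of the per-container variant of that workaround, the wrapper below assembles a docker run command line with --security-opt label:disable and the nfs volume driver from the original report. The function name and the DRY_RUN switch are hypothetical, and the wrapper only prints the command unless DRY_RUN=0 is set. Whether this actually avoids the relabeling error on 1235.9.0 has not been confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical wrapper: build a `docker run` command line with SELinux
# labeling disabled for that one container. By default it only prints
# the command (DRY_RUN defaults to 1); set DRY_RUN=0 to execute it.
run_without_labels() {
    volume_spec="$1"   # e.g. 10.0.0.4/folder:/folder
    image="$2"         # e.g. busybox
    shift 2
    # Remaining arguments are passed through as the container command.
    set -- docker run --security-opt label:disable \
        --volume-driver=nfs -v "$volume_spec" "$image" "$@"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"      # print only; nothing is started
    else
        "$@"
    fi
}

# Example (prints the command, does not start a container):
#   run_without_labels 10.0.0.4/folder:/folder busybox ls /folder
```

Note that this turns off label confinement for the container, so it trades away some isolation in exchange for skipping the relabel step.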

Cheers

CyrilPeponnet commented 7 years ago

OK, after deploying docker 1.13 myself on top of stable (replacing the binaries with the docker ones), it now works fine. Not sure why it's not working properly with the built-in docker...

Well it works only because I forgot to set the --selinux flag... So this is definitely a selinux issue.

tomfotherby commented 7 years ago

I have the problem on alpha 1325.0.0. Error is: "SELinux relabeling of is not..."

To reproduce:

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1325.0.0
BUILD_ID=2017-02-15-2139
PRETTY_NAME="Container Linux by CoreOS 1325.0.0 (Ladybug)"

$ docker volume create --name test-vol-1
$ docker service create --name test-service-1 --mount target=/var/www,src=test-vol-1,type=volume busybox top
$ docker service ps test-service-1
ID            NAME              IMAGE           NODE                         DESIRED STATE  CURRENT STATE                    ERROR                             PORTS
nkmir0enl2tt  test-service-1.1  busybox:latest  ip-10-56-1-192.ec2.internal  Ready          Rejected less than a second ago  "SELinux relabeling of  is not…"

CyrilPeponnet commented 7 years ago

https://github.com/docker/docker/issues/31137

sepiroth887 commented 7 years ago

Looks like 1235.12.0 has the same issue still :(

argent-smith commented 7 years ago

Hi everybody. Is there any hope yet?

absudabsu commented 7 years ago

I think this is an issue with Docker rather than CoreOS, since I am getting a similar issue in Fedora (relabelling operation not allowed). My solution was to disable SELinux, and mount via sshfs rather than samba (I'm using a Linux samba server). Not ideal by any means, but based on your accounts, it seems not to be limited to CoreOS. This behavior is new... maybe with the latest linux-kernel/docker update? That seemed to break a lot of things in Fedora, maybe also in Docker?

marcinkoziej commented 7 years ago

What is the status of this issue? It was removed from Backlog - does it mean WONTFIX?

crawford commented 7 years ago

@lolownia We're experimenting with the GitHub Projects feature. We'll close issues if we decide that we can't or won't fix them.

jaytho commented 7 years ago

I have some info that might help. I got this error while trying to fix access to a host volume from within a container, when I added the :Z or :z option (-v host:container:Z). I got the "cannot relabel host volume" error as stated above. I removed the :z option and my original SELinux problem was cleared up, and I am now operational, because the earlier :z/:Z attempt had changed the SELinux label to the correct svirt_sandbox_file_t.

So again, with a new NFS mount point with open permissions on /nfs:

mount nfsserver:/nfs /nfs
docker run -v /nfs:/nfs nfstest mkdir /nfs/test

got me a regular old "permission denied". Then

docker run -v /nfs:/nfs:z nfstest mkdir /nfs/test

gets "SELinux relabeling of /nfs is not allowed operation not supported", and

docker run -v /nfs:/nfs:Z nfstest mkdir /nfs/test

gets the same "SELinux relabeling of /nfs is not allowed operation not supported". When I switched back,

docker run -v /nfs:/nfs nfstest mkdir /nfs/test

works. ls -Z on /nfs now shows svirt_sandbox_file_t from within the container. (I hope this helps.)
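Since the relabel step is what fails on NFS in the report above, one option is to check the filesystem type of a host path before deciding whether to append :z or :Z. The sketch below reads a mounts table in /proc/mounts format; the function names are hypothetical, and it only matches exact mount points, not paths underneath them.

```shell
#!/bin/sh
# Hypothetical helpers; names are assumptions, not part of docker.

# Print the filesystem type of an exact mount point, read from a mounts
# table in /proc/mounts format: device mountpoint fstype options ...
# If the mount point is listed more than once, the last entry wins.
mount_fstype() {
    mount_point="$1"
    mounts_file="${2:-/proc/mounts}"
    awk -v mp="$mount_point" \
        '$2 == mp { fstype = $3 } END { if (fstype != "") print fstype }' \
        "$mounts_file"
}

# Succeed (exit status 0) when the mount point is an NFS mount.
is_nfs_mount() {
    case "$(mount_fstype "$1" "${2:-/proc/mounts}")" in
        nfs|nfs4) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example (not executed here): skip the :z suffix on NFS mounts.
#   if is_nfs_mount /nfs; then
#       docker run -v /nfs:/nfs nfstest mkdir /nfs/test
#   fi
```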

rhoerbe commented 6 years ago

Same problem with docker-1.13.1-75.git8633870.el7.centos.x86_64. Setting SELinux to permissive does not help.