Closed: alogoc closed this issue 6 years ago
Same issue here using AWS EBS volumes.
Some volumes are not mounted
[ 793.976852] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 855.417000] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 864.572472] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 923.424917] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 933.778011] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 994.015922] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 1003.962429] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.5.0
VERSION_ID=1235.5.0
BUILD_ID=2017-01-08-0037
PRETTY_NAME="Container Linux by CoreOS 1235.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
What hardware/cloud provider/hypervisor is being used to run CoreOS?
AWS
Same issue after automatic upgrade to:
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.5.0
VERSION_ID=1235.5.0
BUILD_ID=2017-01-08-0037
PRETTY_NAME="Container Linux by CoreOS 1235.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
I confirm this bug too on KVM/libvirt
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.5.0
VERSION_ID=1235.5.0
BUILD_ID=2017-01-08-0037
PRETTY_NAME="Container Linux by CoreOS 1235.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
Seeing the same under ESXi running a clean install of version 1284.2.0. The log line shows up every time the master tries to deploy kubernetes-dashboard on one of the minions.
I am observing the same errors on CoreOS stable (1235.6.0)
Same issue here as well:
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.3.0
VERSION_ID=1298.3.0
BUILD_ID=2017-02-02-0148
PRETTY_NAME="Container Linux by CoreOS 1298.3.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
Running an ESXi VM with NFSv4 mounts and a Kubernetes cluster. It happens as soon as an auto-update occurs; the initial installation is fine.
Hi, it seems a few of us have this issue, but I don't see any possible solutions yet.
Same here CoreOS stable 1235.9.0 on vSphere 6.
Has anybody got any fixes to this?
Do we have a recommendation for what folks who are experiencing this should do?
Happy to see this is flagged as a P0. Until a patch is out, what's the best course of action to take in the interim?
Still having this issue using latest stable version of CoreOS
Experiencing the same, but only after a reboot and not on a freshly provisioned machine:
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.12.0
VERSION_ID=1235.12.0
BUILD_ID=2017-02-23-0222
PRETTY_NAME="Container Linux by CoreOS 1235.12.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
Workaround: restart the NFS server. In my case: systemctl restart nfsd.service
I just noticed that it happens on every reboot, not just when it auto-updates.
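The restart-only-when-needed version of that workaround can be sketched as a small shell helper. This is only a sketch of the comment above: the `nfsd.service` unit name comes from that commenter's setup and is commonly `nfs-server.service` elsewhere.

```shell
#!/bin/sh
# Sketch: restart the NFS server only when the "SELinux: mount invalid"
# noise actually shows up in dmesg. The nfsd.service unit name is taken
# from the workaround comment above and varies between distros.

needs_nfs_restart() {
  # Reads dmesg-style text on stdin; succeeds if the SELinux noise is present.
  grep -q "SELinux: mount invalid"
}

# Usage (as root):
#   dmesg | needs_nfs_restart && systemctl restart nfsd.service
```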
we have this issue as well
Still seeing this error on baremetal too (matchbox 0.5.0 w/ bootkube 0.3.9)
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.5.0
VERSION_ID=1298.5.0
BUILD_ID=2017-02-28-0013
PRETTY_NAME="Container Linux by CoreOS 1298.5.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: mcs
Current mode: permissive
Mode from config file: permissive
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 30
I've seen this on CoreOS Stable 1298.6.0 on vSphere 6.0.
I'm also experiencing the same error when I reboot; after the reboot it spams these errors for about a minute and then stops. I am running stable 1298.6.0 and, similar to TimJones, I am deploying to bare metal via matchbox/tectonic.
Same here, running Stable 1298.7.0 on AWS.
It took me forever to figure out the source. It came from Kubernetes trying to run Weave; there was a long trail of errors to get here.
This error shows up in dmesg on my machines (both Container Linux and Fedora) each time I run a docker container.
However, I have yet to see any adverse effect from it.
The original issue mentions it causing a "... reboot to hang while trying to umount NFS kubernetes persistent volumes", and there are a few other issues attributed to it here, but I worry that each of the mentioned issues is caused by something else and this is a red herring.
For my machines, ignoring this dmesg output hasn't caused any issues yet. This includes a Kubernetes cluster with some nfs mount churn, as well as machines just running a few once-off containers.
If anyone is confident that this is causing real impact beyond a dmesg log line, describing the impact and how the two were linked together would be helpful!
I removed --selinux-enabled from the docker engine command line, but still had many other issues. Now that they are all resolved, I will try to restore --selinux-enabled and see if there are any side effects.
@euank along with the dmesg log, services installed in our pods are not accessible either locally or from outside the pod; i.e. there are no incoming/outgoing TCP connections to/from the pods, although the interfaces are up and DNS is well configured.
Update was 4.9.16-coreos-r1 -> 4.9.24-coreos.
Restarts happened for our other machines as well; however, some of them recovered without downtime.
The difference is that the problematic machine's last dmesg message (after which it stops) is:
SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
It seems it blocks other services somehow (like docker networking in this case).
@nyilmaz I suspect the networking issue you're seeing is #1936. We're pushing an additional update to stable to address that issue. Sorry!
Assuming that's the issue, a workaround is posted on that thread and it's unrelated to this dmesg output. If you can double check that workaround works, or the update (1353.7.0) works once it rolls out, that would help clarify.
Apr 27 12:34:34 ip-10-0-2-40.us-west-2.compute.internal dockerd[2415]: time="2017-04-27T12:34:34.657727261Z" level=error msg="Create container failed with error: invalid header field value \"oci runtime error: container_linux.go:247: starting container process caused \\\"process_linux.go:359: container init caused \\\\\\\"rootfs_linux.go:53: mounting \\\\\\\\\\\\\\\"/data/k8s/kubelet/pods/5932bf47-2b3f-11e7-ae8d-023161a008ef/etc-hosts\\\\\\\\\\\\\\\" to rootfs \\\\\\\\\\\\\\\"/var/lib/docker/overlay/85e7976aff2ede0c039d033503b6dbb72154a2110a0c5678f0e569d8fc256c29/merged\\\\\\\\\\\\\\\" at \\\\\\\\\\\\\\\"/var/lib/docker/overlay/85e7976aff2ede0c039d033503b6dbb72154a2110a0c5678f0e569d8fc256c29/merged/etc/hosts\\\\\\\\\\\\\\\" caused \\\\\\\\\\\\\\\"not a directory\\\\\\\\\\\\\\\"\\\\\\\"\\\"\\n\""
Just moved the running kubelet from hyperkube to kube_wrapper. The issue showed up once again: all pods with persistent volumes fail now.
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.7.0
VERSION_ID=1353.7.0
BUILD_ID=2017-04-26-2154
PRETTY_NAME="Container Linux by CoreOS 1353.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
We're seeing this issue as well. Very similar to @nyilmaz, we are seeing blocking happen in docker and kubelet-wrapper. We're running the latest stable build from scratch:
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.7.0
VERSION_ID=1353.7.0
BUILD_ID=2017-04-26-2154
PRETTY_NAME="Container Linux by CoreOS 1353.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
# dmesg | grep -i "SELinux: mount invalid" | wc -l
11
# uptime
15:26:52 up 30 min, 1 user, load average: 0.02, 0.11, 0.27
Compared to one of our older hosts that isn't having this issue:
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.7.0
VERSION_ID=1298.7.0
BUILD_ID=2017-03-31-0215
PRETTY_NAME="Container Linux by CoreOS 1298.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
# dmesg | grep -i "SELinux: mount invalid" | wc -l
0
# uptime
15:28:26 up 2 days, 22:48, 1 user, load average: 1.20, 0.59, 0.27
@bchanan03 based on the directory in that error, it looks like you're using the --root-dir flag on the kubelet.
That flag isn't supported with the kubelet-wrapper and will break unless you make an additional effort to bind-mount the required extra directories. That error message is basically saying that the kubelet made mounts under --root-dir inside the kubelet-wrapper chroot and that docker cannot find them because the kubelet-wrapper script didn't expose that directory.
I don't think that issue is related to this one, though if you continue to run into issues after either no longer using the --root-dir flag or after adjusting the kubelet-wrapper's mount args, please open a new issue.
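For anyone who does need a non-default --root-dir with the kubelet-wrapper, the usual route is to pass extra rkt volume/mount arguments through the RKT_RUN_ARGS environment variable so that the host path is visible inside the chroot. A hedged sketch, assuming the /data/k8s/kubelet path from the error above (the volume name is illustrative, and this is the "additional effort" mentioned, not a supported configuration):

```shell
# Sketch only: expose a custom kubelet --root-dir inside the kubelet-wrapper
# chroot. The /data/k8s/kubelet path comes from the error message above;
# "kubelet-root" is a made-up volume name.
export RKT_RUN_ARGS="--volume kubelet-root,kind=host,source=/data/k8s/kubelet \
  --mount volume=kubelet-root,target=/data/k8s/kubelet"

# Then invoke the wrapper as usual, adding your remaining kubelet flags:
exec /usr/lib/coreos/kubelet-wrapper --root-dir=/data/k8s/kubelet
```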
@mikesplain I just wanted to demonstrate a host we have on same old version, with the error:
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1298.7.0
VERSION_ID=1298.7.0
BUILD_ID=2017-03-31-0215
PRETTY_NAME="Container Linux by CoreOS 1298.7.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"
# dmesg | grep -i "SELinux: mount invalid" | wc -l
12
# uptime
17:40:36 up 2 days, 1:17, 1 user, load average: 0.96, 0.67, 0.36
My guess is we're doing something incorrectly, but I figured I'd post what I have. We do also see this on 1353.7.0, but there are other issues I don't yet understand which are preventing us from running the updated version.
@mars64 Ahh fair enough. Thanks!
I'm seeing this issue on a later version as well, on an EC2 instance (m4.xlarge):
$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1353.8.0
VERSION_ID=1353.8.0
BUILD_ID=2017-05-30-2322
PRETTY_NAME="Container Linux by CoreOS 1353.8.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://issues.coreos.com"
Actual error from the logs (via the AWS console), since the instance is no longer reachable via SSH:
SSH host key: SHA256:<key> (DSA)
SSH host key: SHA256:<sha key> (ECDSA)
SSH host key: SHA256:<sha key> (ED25519)
SSH host key: SHA256:<sha key> (RSA)
[ 27.961692] IPv6: ADDRCONF(NETDEV_UP): docker0: link is not ready
eth0: 10.100.100.116 fe80::67:8aff:fee6:8e57
ip-10-100-100-116 login: [ 31.061655] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 32.412445] Netfilter messages via NETLINK v0.30.
[ 32.422056] ip_set: protocol 6
[ 32.530060] ip6_tables: (C) 2000-2006 Netfilter Core Team
[ 36.402077] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 36.417821] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 37.456837] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[ 37.476006] SELinux: mount invalid. Same superblock, different security settings for (dev mqueue, type mqueue)
[63950.093111] dockerd: page allocation failure: order:4, mode:0x14040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null)
[63950.099049] dockerd cpuset=/ mems_allowed=0
[63950.101490] CPU: 1 PID: 22782 Comm: dockerd Not tainted 4.11.6-coreos #1
[63950.105420] Hardware name: Xen HVM domU, BIOS 4.2.amazon 02/16/2017
[63950.109066] Call Trace:
[63950.110550] dump_stack+0x63/0x90
[63950.112543] warn_alloc+0x11c/0x1b0
[63950.114635] ? __alloc_pages_direct_compact+0x55/0x110
[63950.117660] __alloc_pages_slowpath+0xd6c/0xe50
[63950.120486] ? wakeup_kswapd+0xdd/0x150
[63950.122787] __alloc_pages_nodemask+0x21b/0x230
[63950.125489] alloc_pages_current+0x8c/0x110
[63950.128020] kmalloc_order+0x18/0x40
[63950.130188] kmalloc_order_trace+0x24/0xa0
[63950.132602] __kmalloc+0x1a2/0x210
[63950.134651] ? __list_lru_init+0x35/0x210
[63950.137001] __list_lru_init+0x1a8/0x210
[63950.139319] sget_userns+0x22d/0x4d0
[63950.141666] ? get_anon_bdev+0x100/0x100
[63950.144130] sget+0x7d/0xa0
[63950.145840] ? get_anon_bdev+0x100/0x100
[63950.148209] ? 0xffffffffc04e9d60
[63950.150700] mount_nodev+0x30/0xa0
[63950.152790] 0xffffffffc04e90e8
[63950.154751] mount_fs+0x38/0x170
[63950.156721] vfs_kern_mount+0x67/0x110
[63950.159003] do_mount+0x1e5/0xcb0
[63950.161094] ? _copy_from_user+0x4e/0x80
[63950.163824] SyS_mount+0x94/0xd0
[63950.165999] do_syscall_64+0x5a/0x160
[63950.168460] entry_SYSCALL64_slow_path+0x25/0x25
[63950.172058] RIP: 0033:0x654f7a
[63950.174147] RSP: 002b:000000c43b14ee30 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
[63950.179414] RAX: ffffffffffffffda RBX: 000000c42001ca0c RCX: 0000000000654f7a
[63950.184083] RDX: 000000c426adbdd8 RSI: 000000c4293cddc0 RDI: 000000c426adbdd0
[63950.189498] RBP: 000000c43b14eee0 R08: 000000c4281ae1a0 R09: 0000000000000000
[63950.194196] R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000070
[63950.199207] R13: 0000000000001dc0 R14: 0000000000000045 R15: 0000000005555555
[63950.204506] Mem-Info:
[63950.206083] active_anon:1236082 inactive_anon:3926 isolated_anon:0
[63950.206083] active_file:27545 inactive_file:17530 isolated_file:0
[63950.206083] unevictable:0 dirty:535 writeback:29 unstable:0
[63950.206083] slab_reclaimable:331728 slab_unreclaimable:1737063
[63950.206083] mapped:27337 shmem:15893 pagetables:56081 bounce:0
[63950.206083] free:35374 free_pcp:0 free_cma:0
Seems like docker0 didn't come up. The only customization: mounting an ext4 EBS volume and configuring docker to log to it.
This issue also occurred a day after the k8s nodes were provisioned.
@gdmello as stated in the comments above, the "mount invalid" entry is just unrelated noise in the log. The issue you are experiencing is completely unrelated and seems to be due to the kernel not being able to allocate a pretty big range of contiguous pages. This can be due to a kernel bug, a hypervisor bug, or some abnormal memory pressure.
Please try to reproduce it on the latest stable/beta/alpha, and also check your telemetry for the memory consumption profile in the period leading up to the issue. If it still occurs, please open a dedicated bug report with all the information.
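On the memory-pressure angle: the stack trace above failed an order-4 (64 KiB contiguous) allocation, and fragmentation at that order can be eyeballed from /proc/buddyinfo. A sketch, assuming the standard buddyinfo layout (per-zone free block counts for orders 0-10):

```shell
#!/bin/sh
# Sketch: summarize how many free blocks of order >= 4 each zone has.
# /proc/buddyinfo lines look like:
#   Node 0, zone   DMA32   10   10   10   10   5   3   0   0   0   0   0
# Fields 5..15 are the free block counts for orders 0..10, so orders >= 4
# start at field 9.
order4_blocks() {
  awk '{
    free = 0
    for (i = 9; i <= NF; i++) free += $i
    printf "%s %s: %d free blocks of order >= 4\n", $2, $4, free
  }' "${1:-/proc/buddyinfo}"
}

# Usage: order4_blocks    # reads /proc/buddyinfo by default
```

Zones showing 0 here while dockerd fails order-4 allocations would support the fragmentation theory.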
Thanks @lucab!
You are right. I do see this error even on a healthy Kubernetes node, so it's a non-issue.
The offending message isn't printed in the current alpha. It's fixed by shipping a newer version of docker (17.06.1) which doesn't have this issue. (I did verify it was the docker version change, not kernel changes, which fixes this).
We should be able to close this once we have 17.06+ on stable.
The noise is still spewing out on busy nodes; I'm now on 1520.7.0.
@euank will 17.06 be supported by Kubernetes? Correct me if I'm wrong, but I've only seen talk of 1.11.2, 1.12.6, 1.13.1, and 17.03.2 being validated so far.
@roffe The answer to that is a little complicated. It's possible Kubernetes will move to recommending certain docker API versions, regardless of the release version (https://github.com/kubernetes/kubernetes/issues/53221). If they move to recommending it in that way, 17.06/17.09 both support the API version they use and would thus implicitly be considered valid I believe (with further validation and choice up to specific K8s distribution's discretion). I don't know any more than is in that issue; for more details or if you have other questions, you'd have to ask the Kubernetes project yourself.
As an idle anecdote, I personally run my K8s cluster against 17.09. I can't recommend that generally since my personal requirements and testing are less authoritative than the Kubernetes project, but I will point out the upstream recommendations are not generally based on known problems with newer versions, but rather lack of evidence altogether.
Container Linux by CoreOS stable (1576.4.0)
Update Strategy: No Reboots
core@compute1 ~ $ docker --version
Docker version 17.09.0-ce, build afdb6d4
I'm closing this based on my past few comments in this issue; on recent versions of docker (which are shipped by default in all channels now) it shouldn't appear, and when it did appear I think it was typically benign.
If you still encounter this on a recent version of docker and it appears to cause real impact, please do open a new issue!
Issue Report
After upgrading the cluster to version 1122.2.0 stable, I started seeing this error in the logs
Bug
Unless forcefully rebooted, it causes the reboot to hang while trying to umount NFS Kubernetes persistent volumes. This has happened on every reboot since upgrading to version 1122.2.0 stable.
CoreOS Version
SELinux status
Environment
What hardware/cloud provider/hypervisor is being used to run CoreOS?
VMware