coreos / bugs

Issue tracker for CoreOS Container Linux
https://coreos.com/os/eol/

XenServer Tools is unable to report CoreOS' VM IP address to parent XenServer after 1122.2.0 update #1563

Closed simoninkin closed 7 years ago

simoninkin commented 8 years ago

Issue Report

Bug

The VM's XenServer Tools agent fails to register itself with the host XenServer. As a result, no IP address is shown in the Networking tab, so Docker management (along with any other functionality that requires a connection from XenServer to the VM) cannot be enabled, because XenServer is unable to connect to the guest VM.

CoreOS Version

NAME=CoreOS
ID=coreos
VERSION=1122.2.0
VERSION_ID=1122.2.0
BUILD_ID=2016-09-06-1449
PRETTY_NAME="CoreOS 1122.2.0 (MoreOS)"
ANSI_COLOR="1;32"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

This issue has been tested with following CoreOS versions:

1068.9.0 - no issue
1068.10.0 - no issue
1122.2.0 - issue is present

Environment

XenServer 7
Applied update: XS70E004
Supplemental Packs:

XenCenter version 7.0 (build 7.0.1.3852) 64-bit

XenOrchestra (from sources):

Expected Behavior

  1. CoreOS boots
  2. XenServer Tools agent starts - no errors
  3. XenCenter shows 'Virtualization state: Optimized (version 7.0 installed)' in VM's General tab
  4. XenCenter shows IP address(es) related to VM in Networking tab
  5. XenCenter is able to turn on Docker management via an SSH connection

Actual Behavior

  1. CoreOS boots
  2. XenServer Tools agent starts - no errors
  3. XenCenter shows 'Virtualization state: Optimized (version 6.2 installed)' in VM's General tab
  4. XenCenter missing IP address(es) related to VM in Networking tab
  5. XenCenter is not able to turn on Docker management via an SSH connection. Failing due to missing IP address.

Reproduction Steps

  1. Boot CoreOS from ISO image with a default XenOrchestra cloud-config
  2. Open XenCenter, navigate to VM's Networking tab

Other Information

Cloud config template from XenOrchestra:

#cloud-config

hostname: %VMNAMETOHOSTNAME%
ssh_authorized_keys:
  # - ssh-rsa <Your public key>
  # The following entry will automatically be replaced with a public key
  # generated by container management plugin. The key-entry must exist,
  # in order to enable container management for this VM.
  - ssh-rsa %CONTAINERRSAPUB%
coreos:
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start
    # Hypervisor Linux Guest Agent
    - name: xe-linux-distribution.service
      command: start
      content: |
        [Unit]
        Description=Hypervisor Linux Guest Agent
        After=docker.service

        [Service]
        ExecStartPre=/media/configdrive/agent/xe-linux-distribution /var/cache/xe-linux-distribution
        ExecStart=/media/configdrive/agent/xe-daemon
  etcd:
    name: %VMNAMETOHOSTNAME%
    # generate a new token for each unique cluster at https://discovery.etcd.io/new
    # discovery: https://discovery.etcd.io/<token>
write_files:
  # Enable ARP notifications for smooth network recovery after migrations
  - path: /etc/sysctl.d/10-enable-arp-notify.conf
    permissions: 0644
    owner: root
    content: |
      net.ipv4.conf.all.arp_notify = 1

simoninkin commented 8 years ago

XenServer bug tracker issue mirror: https://bugs.xenserver.org/browse/XSO-612

crawford commented 7 years ago

I'm not sure if there is anything we can do here. All of these tools are being inserted into CoreOS at runtime, rather than being shipped with CoreOS. The xe-daemon will have to be updated to work with this latest release of CoreOS.

simoninkin commented 7 years ago

Yeah, but it was working perfectly fine before 1122.2.0. Obviously something has changed in CoreOS that prevents the service from running. Citrix hasn't updated their products, and my current XenServer environment can still run the agent on CoreOS 1068.10.0. I am basically stuck with this version until this issue is resolved =(

hcoyote commented 7 years ago

I'm running into this as well. It looks like xenstore is having problems writing to /proc/xen/xenbus, but I don't know if this is a xenserver issue or a coreos issue. In the tests below, both coreos instances are running concurrently on the same xenserver.

On 1168, strace shows:

core1168 system # strace -tt -T -s 1024 -f xenstore read name
01:05:57.052703 execve("/usr/bin/xenstore", ["xenstore", "read", "name"], [/* 18 vars */]) = 0 <0.000247>
01:05:57.053441 brk(0)                  = 0x55562d387000 <0.000014>
01:05:57.053703 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa2a8524000 <0.000018>
01:05:57.053956 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000103>
01:05:57.054231 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000019>
01:05:57.054362 fstat(3, {st_mode=S_IFREG|0644, st_size=25683, ...}) = 0 <0.000012>
01:05:57.054511 mmap(NULL, 25683, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa2a851d000 <0.000016>
01:05:57.054645 close(3)                = 0 <0.000034>
01:05:57.054754 open("/lib64/libxenstore.so.3.0", O_RDONLY|O_CLOEXEC) = 3 <0.000020>
01:05:57.054876 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`!\0\0\0\0\0\0@\0\0\0\0\0\0\0\370p\0\0\0\0\0\0\0\0\0\0@\0008\0\7\0@\0\31\0\30\0\1\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$d\0\0\0\0\0\0$d\0\0\0\0\0\0\0\0 \0\0\0\0\0\1\0\0\0\6\0\0\0\230j\0\0\0\0\0\0\230j \0\0\0\0\0\230j \0\0\0\0\0p\5\0\0\0\0\0\0\3605\0\0\0\0\0\0\0\0 \0\0\0\0\0\2\0\0\0\6\0\0\0\260k\0\0\0\0\0\0\260k \0\0\0\0\0\260k \0\0\0\0\0\300\1\0\0\0\0\0\0\300\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0xX\0\0\0\0\0\0xX\0\0\0\0\0\0xX\0\0\0\0\0\0\344\1\0\0\0\0\0\0\344\1\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\230j\0\0\0\0\0\0\230j \0\0\0\0\0\230j \0\0\0\0\0h\5\0\0\0\0\0\0h\5\0\0\0\0\0\0\1\0\0\0\0\0\0\0\200\25\4e\0(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0%\0\0\0007\0\0\0\10\0\0\0\t\0\0\0\303\24\204\20 \2J\1\2\fP\0\0\22\200\23\0\0 \0\0\0\0\1\322\0\1\200-\30\10\0\210\fLL\2 \10\200K\0\204\0\4B \0D\1\0\22@ \3\211 8@\1\21\6\22\t7\0\0\0\0\0\0\0008\0\0\0:\0\0\0<\0\0\0=\0\0\0?\0\0\0\0\0\0\0\0\0\0\0@\0\0\0\0\0\0\0A\0\0\0B\0\0\0D\0\0\0\0\0\0\0G\0\0\0I\0\0\0\0\0\0\0J\0\0\0L\0\0\0N\0\0\0\0\0\0\0O\0\0\0P\0\0\0Q\0\0\0T\0\0\0\0\0\0\0V\0\0\0W\0\0\0Y\0\0\0[\0\0\0]\0\0\0\0\0\0\0`\0\0\0a\0\0\0d\0\0\0g\0\0\0\r\342w4\314A\254\26\33'<\265\300\16${\305X-US\323\376<F\0013\265\353\323\357\16\23\200:\364\3\217\307\3433\322\207\327 =\373ZK\330\332\354|\224\367\332\300\300|\226\307D\203\264@\224\353\2077\256\224YAko\5\n\177S\33\305IF~\342\346\6AwX\234\207\271\252k5\7\230-\354\2353F\306\26\25\333b\304\f\205\351\331qX\34\0008C,\273\343\222|\v\360\201\273$\f\317\263\353Xw[\344X\32\362\365\231\3479BE\325\354", 832) = 832 <0.000013>
01:05:57.055196 fstat(3, {st_mode=S_IFREG|0755, st_size=30520, ...}) = 0 <0.000012>
01:05:57.055351 mmap(NULL, 2138248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa2a80f8000 <0.000016>
01:05:57.055486 mprotect(0x7fa2a80ff000, 2093056, PROT_NONE) = 0 <0.000019>
01:05:57.055618 mmap(0x7fa2a82fe000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7fa2a82fe000 <0.000022>
01:05:57.055769 mmap(0x7fa2a8300000, 8328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa2a8300000 <0.000019>
01:05:57.055922 close(3)                = 0 <0.000012>
01:05:57.056110 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000020>
01:05:57.056261 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\v\2\0\0\0\0\0@\0\0\0\0\0\0\0\350\265\32\0\0\0\0\0\0\0\0\0@\0008\0\v\0@\0D\0C\0\6\0\0\0\5\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0h\2\0\0\0\0\0\0h\2\0\0\0\0\0\0\10\0\0\0\0\0\0\0\3\0\0\0\4\0\0\0\0$\30\0\0\0\0\0\0$\30\0\0\0\0\0\0$\30\0\0\0\0\0\34\0\0\0\0\0\0\0\34\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\1\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30Q\32\0\0\0\0\0\30Q\32\0\0\0\0\0\0\0 \0\0\0\0\0\1\0\0\0\6\0\0\0\240U\32\0\0\0\0\0\240U:\0\0\0\0\0\240U:\0\0\0\0\0 R\0\0\0\0\0\0\340\236\0\0\0\0\0\0\0\0 \0\0\0\0\0\2\0\0\0\6\0\0\0 \213\32\0\0\0\0\0 \213:\0\0\0\0\0 \213:\0\0\0\0\0\340\1\0\0\0\0\0\0\340\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\7\0\0\0\4\0\0\0\240U\32\0\0\0\0\0\240U:\0\0\0\0\0\240U:\0\0\0\0\0\20\0\0\0\0\0\0\0\200\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0\34$\30\0\0\0\0\0\34$\30\0\0\0\0\0\34$\30\0\0\0\0\0\364W\0\0\0\0\0\0\364W\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\240U\32\0\0\0\0\0\240U:\0\0\0\0\0\240U:\0\0\0\0\0`:\0\0\0\0\0\0`:\0\0\0\0\0\0\1\0\0\0\0\0\0\0\200\25\4e\0(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0 \0\0\0\363\3\0\0\n\0\0\0\0\1\0\0\16\0\0\0\0000\20D\240 \2\1\210\3\346\220\305E\214\0\300\0\10\0\5\200\0`\300\200\0\r\212\f\0\4\20\0\210D2\10.@\210P<, \0162H&\204\300\214\4\10\0\2\2\16\241\254\32\4f\300\0\3002\0\300\0P\1 \201\10\204\v  ($\0\4 P\0\20X\200\312DB(\0\6\200\20\30B\0 @\200\0", 832) = 832 <0.000013>
01:05:57.056516 fstat(3, {st_mode=S_IFREG|0755, st_size=1754856, ...}) = 0 <0.000011>
01:05:57.056662 mmap(NULL, 3863680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa2a7d48000 <0.000016>
01:05:57.056794 mprotect(0x7fa2a7eee000, 2093056, PROT_NONE) = 0 <0.000020>
01:05:57.056927 mmap(0x7fa2a80ed000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a5000) = 0x7fa2a80ed000 <0.000022>
01:05:57.057124 mmap(0x7fa2a80f3000, 17536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa2a80f3000 <0.000020>
01:05:57.057282 close(3)                = 0 <0.000011>
01:05:57.057423 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa2a851c000 <0.000015>
01:05:57.057567 open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 <0.000020>
01:05:57.057708 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 a\0\0\0\0\0\0@\0\0\0\0\0\0\0\360\202\1\0\0\0\0\0\0\0\0\0@\0008\0\10\0@\0!\0 \0\1\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\202r\1\0\0\0\0\0\202r\1\0\0\0\0\0\0\0 \0\0\0\0\0\1\0\0\0\6\0\0\0\10y\1\0\0\0\0\0\10y!\0\0\0\0\0\10y!\0\0\0\0\0H\7\0\0\0\0\0\0\210I\0\0\0\0\0\0\0\0 \0\0\0\0\0\2\0\0\0\6\0\0\0\360z\1\0\0\0\0\0\360z!\0\0\0\0\0\360z!\0\0\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\4\0\0\0\0\0\0\0P\345td\4\0\0\0\310;\1\0\0\0\0\0\310;\1\0\0\0\0\0\310;\1\0\0\0\0\0\244\10\0\0\0\0\0\0\244\10\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\10y\1\0\0\0\0\0\10y!\0\0\0\0\0\10y!\0\0\0\0\0\370\6\0\0\0\0\0\0\370\6\0\0\0\0\0\0\1\0\0\0\0\0\0\0\200\25\4e\0(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0 \0\0\0\345\1\0\0X\0\0\0 \0\0\0\v\0\0\0\31#\2\261\1\10\20\2@@a\370\3\10\10\25\200 \0\0\0\0\200\300\321Q\0\0\0\22\353\3020D\0\10\20A\0\2\0\2\f\1\200\v\221\1\330\240\r\240@\230 \244\200\21\n\202-l@g\214V\24\0\224 \200$H\200P(\1\22\f\311B\240\220\22\10\f \2ZdA\245c\4@\n\n\n\0\2009\1(\314D\204\201\300\22\10(\fD\0\0\0\200Q\10\200\35\4B\320\2608A\0\1\0\0\265\0300\0\200`\2\20\"\0\tA\20\1\5\0P(\251\22G(\0\0\202\4\230@\4\0\20\340T\0\2@\2\2\20\3010D\26\200\0\0\0$\4\24\2\0\34\200\243\220\6\0\30\0\10\20 \1\200\0(\6D%\210*\10 \0\20`\220\200\260\0\0\0\1\0\20\0*\f\20\242\201\233\1\1\203\0\2\0\200\20\0\220\4\0\0\21\220$.%(\t@!\0\20\0\202\2D\241\250\10\2X\0\0\0\0\0\0\0[\0\0\0]\0\0\0", 832) = 832 <0.000013>
01:05:57.057962 fstat(3, {st_mode=S_IFREG|0755, st_size=101168, ...}) = 0 <0.000044>
01:05:57.058148 mmap(NULL, 2212496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fa2a7b2b000 <0.000021>
01:05:57.058289 mprotect(0x7fa2a7b43000, 2093056, PROT_NONE) = 0 <0.000025>
01:05:57.058427 mmap(0x7fa2a7d42000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7fa2a7d42000 <0.000023>
01:05:57.058577 mmap(0x7fa2a7d44000, 12944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fa2a7d44000 <0.000020>
01:05:57.058730 close(3)                = 0 <0.000011>
01:05:57.058883 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa2a851b000 <0.000017>
01:05:57.059094 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa2a851a000 <0.000016>
01:05:57.059241 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa2a8519000 <0.000014>
01:05:57.059377 arch_prctl(ARCH_SET_FS, 0x7fa2a851a700) = 0 <0.000012>
01:05:57.059608 mprotect(0x7fa2a80ed000, 16384, PROT_READ) = 0 <0.000021>
01:05:57.059827 mprotect(0x7fa2a7d42000, 4096, PROT_READ) = 0 <0.000018>
01:05:57.059994 mprotect(0x7fa2a82fe000, 4096, PROT_READ) = 0 <0.000018>
01:05:57.060194 mprotect(0x55562ca65000, 4096, PROT_READ) = 0 <0.000016>
01:05:57.060313 mprotect(0x7fa2a8525000, 4096, PROT_READ) = 0 <0.000016>
01:05:57.060422 munmap(0x7fa2a851d000, 25683) = 0 <0.000023>
01:05:57.060540 set_tid_address(0x7fa2a851a9d0) = 5059 <0.000011>
01:05:57.060640 set_robust_list(0x7fa2a851a9e0, 24) = 0 <0.000012>
01:05:57.060749 rt_sigaction(SIGRTMIN, {0x7fa2a7b30b90, [], SA_RESTORER|SA_SIGINFO, 0x7fa2a7b3c0f0}, NULL, 8) = 0 <0.000012>
01:05:57.060894 rt_sigaction(SIGRT_1, {0x7fa2a7b30c30, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7fa2a7b3c0f0}, NULL, 8) = 0 <0.000011>
01:05:57.061074 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 <0.000038>
01:05:57.061195 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 <0.000035>
01:05:57.061362 stat("/var/run/xenstored/socket", 0x7fffe1b0ddc0) = -1 ENOENT (No such file or directory) <0.000025>
01:05:57.061523 stat("/proc/xen/xenbus", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 <0.000042>
01:05:57.061679 open("/proc/xen/xenbus", O_RDWR) = 3 <0.000046>
01:05:57.061856 brk(0)                  = 0x55562d387000 <0.000037>
01:05:57.061962 brk(0x55562d3a8000)     = 0x55562d3a8000 <0.000078>
01:05:57.062141 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fa2a7d7c7c0}, {SIG_DFL, [], 0}, 8) = 0 <0.000012>
01:05:57.062301 write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16) = 16 <0.000013>
01:05:57.062429 write(3, "name\0", 5)   = 5 <0.000264>
01:05:57.062824 read(3, "\2\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0", 16) = 16 <0.000036>
01:05:57.062940 read(3, "Core1168", 8)  = 8 <0.000034>
01:05:57.063109 rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7fa2a7d7c7c0}, NULL, 8) = 0 <0.000012>
01:05:57.063271 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 <0.000038>
01:05:57.063419 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa2a8523000 <0.000040>
01:05:57.063541 write(1, "Core1168\n", 9Core1168
) = 9 <0.000063>
01:05:57.063690 exit_group(0)           = ?
01:05:57.063851 +++ exited with 0 +++

But on 1122, the write fails with ESRCH:

system # strace -tt -T -s 1024 -f xenstore read name
01:06:26.752812 execve("/usr/bin/xenstore", ["xenstore", "read", "name"], [/* 19 vars */]) = 0 <0.000438>
01:06:26.753565 brk(0)                  = 0x559ea8f17000 <0.000055>
01:06:26.753742 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc91b45000 <0.000085>
01:06:26.753956 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000070>
01:06:26.754153 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000056>
01:06:26.754306 fstat(3, {st_mode=S_IFREG|0644, st_size=25683, ...}) = 0 <0.000043>
01:06:26.754454 mmap(NULL, 25683, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fdc91b3e000 <0.000045>
01:06:26.754578 close(3)                = 0 <0.000041>
01:06:26.754701 open("/lib64/libxenstore.so.3.0", O_RDONLY|O_CLOEXEC) = 3 <0.000050>
01:06:26.754848 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0`!\0\0\0\0\0\0@\0\0\0\0\0\0\0\370p\0\0\0\0\0\0\0\0\0\0@\0008\0\7\0@\0\31\0\30\0\1\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0$d\0\0\0\0\0\0$d\0\0\0\0\0\0\0\0 \0\0\0\0\0\1\0\0\0\6\0\0\0\230j\0\0\0\0\0\0\230j \0\0\0\0\0\230j \0\0\0\0\0p\5\0\0\0\0\0\0\3605\0\0\0\0\0\0\0\0 \0\0\0\0\0\2\0\0\0\6\0\0\0\260k\0\0\0\0\0\0\260k \0\0\0\0\0\260k \0\0\0\0\0\300\1\0\0\0\0\0\0\300\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0xX\0\0\0\0\0\0xX\0\0\0\0\0\0xX\0\0\0\0\0\0\344\1\0\0\0\0\0\0\344\1\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\230j\0\0\0\0\0\0\230j \0\0\0\0\0\230j \0\0\0\0\0h\5\0\0\0\0\0\0h\5\0\0\0\0\0\0\1\0\0\0\0\0\0\0\200\25\4e\0(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0%\0\0\0007\0\0\0\10\0\0\0\t\0\0\0\303\24\204\20 \2J\1\2\fP\0\0\22\200\23\0\0 \0\0\0\0\1\322\0\1\200-\30\10\0\210\fLL\2 \10\200K\0\204\0\4B \0D\1\0\22@ \3\211 8@\1\21\6\22\t7\0\0\0\0\0\0\0008\0\0\0:\0\0\0<\0\0\0=\0\0\0?\0\0\0\0\0\0\0\0\0\0\0@\0\0\0\0\0\0\0A\0\0\0B\0\0\0D\0\0\0\0\0\0\0G\0\0\0I\0\0\0\0\0\0\0J\0\0\0L\0\0\0N\0\0\0\0\0\0\0O\0\0\0P\0\0\0Q\0\0\0T\0\0\0\0\0\0\0V\0\0\0W\0\0\0Y\0\0\0[\0\0\0]\0\0\0\0\0\0\0`\0\0\0a\0\0\0d\0\0\0g\0\0\0\r\342w4\314A\254\26\33'<\265\300\16${\305X-US\323\376<F\0013\265\353\323\357\16\23\200:\364\3\217\307\3433\322\207\327 =\373ZK\330\332\354|\224\367\332\300\300|\226\307D\203\264@\224\353\2077\256\224YAko\5\n\177S\33\305IF~\342\346\6AwX\234\207\271\252k5\7\230-\354\2353F\306\26\25\333b\304\f\205\351\331qX\34\0008C,\273\343\222|\v\360\201\273$\f\317\263\353Xw[\344X\32\362\365\231\3479BE\325\354", 832) = 832 <0.000046>
01:06:26.755106 fstat(3, {st_mode=S_IFREG|0755, st_size=30520, ...}) = 0 <0.000043>
01:06:26.755249 mmap(NULL, 2138248, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fdc91719000 <0.000045>
01:06:26.755372 mprotect(0x7fdc91720000, 2093056, PROT_NONE) = 0 <0.000049>
01:06:26.755495 mmap(0x7fdc9191f000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7fdc9191f000 <0.000050>
01:06:26.755630 mmap(0x7fdc91921000, 8328, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc91921000 <0.000046>
01:06:26.755762 close(3)                = 0 <0.000056>
01:06:26.755924 open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000055>
01:06:26.756076 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\360\v\2\0\0\0\0\0@\0\0\0\0\0\0\0\350\265\32\0\0\0\0\0\0\0\0\0@\0008\0\v\0@\0D\0C\0\6\0\0\0\5\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0@\0\0\0\0\0\0\0h\2\0\0\0\0\0\0h\2\0\0\0\0\0\0\10\0\0\0\0\0\0\0\3\0\0\0\4\0\0\0\0$\30\0\0\0\0\0\0$\30\0\0\0\0\0\0$\30\0\0\0\0\0\34\0\0\0\0\0\0\0\34\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0\1\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\30Q\32\0\0\0\0\0\30Q\32\0\0\0\0\0\0\0 \0\0\0\0\0\1\0\0\0\6\0\0\0\240U\32\0\0\0\0\0\240U:\0\0\0\0\0\240U:\0\0\0\0\0 R\0\0\0\0\0\0\340\236\0\0\0\0\0\0\0\0 \0\0\0\0\0\2\0\0\0\6\0\0\0 \213\32\0\0\0\0\0 \213:\0\0\0\0\0 \213:\0\0\0\0\0\340\1\0\0\0\0\0\0\340\1\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0\250\2\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\7\0\0\0\4\0\0\0\240U\32\0\0\0\0\0\240U:\0\0\0\0\0\240U:\0\0\0\0\0\20\0\0\0\0\0\0\0\200\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0P\345td\4\0\0\0\34$\30\0\0\0\0\0\34$\30\0\0\0\0\0\34$\30\0\0\0\0\0\364W\0\0\0\0\0\0\364W\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\240U\32\0\0\0\0\0\240U:\0\0\0\0\0\240U:\0\0\0\0\0`:\0\0\0\0\0\0`:\0\0\0\0\0\0\1\0\0\0\0\0\0\0\200\25\4e\0(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0 \0\0\0\363\3\0\0\n\0\0\0\0\1\0\0\16\0\0\0\0000\20D\240 \2\1\210\3\346\220\305E\214\0\300\0\10\0\5\200\0`\300\200\0\r\212\f\0\4\20\0\210D2\10.@\210P<, \0162H&\204\300\214\4\10\0\2\2\16\241\254\32\4f\300\0\3002\0\300\0P\1 \201\10\204\v  ($\0\4 P\0\20X\200\312DB(\0\6\200\20\30B\0 @\200\0", 832) = 832 <0.000043>
01:06:26.756319 fstat(3, {st_mode=S_IFREG|0755, st_size=1754856, ...}) = 0 <0.000043>
01:06:26.756469 mmap(NULL, 3863680, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fdc91369000 <0.000046>
01:06:26.756592 mprotect(0x7fdc9150f000, 2093056, PROT_NONE) = 0 <0.000049>
01:06:26.756714 mmap(0x7fdc9170e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1a5000) = 0x7fdc9170e000 <0.000051>
01:06:26.756879 mmap(0x7fdc91714000, 17536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc91714000 <0.000049>
01:06:26.757026 close(3)                = 0 <0.000043>
01:06:26.757155 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc91b3d000 <0.000044>
01:06:26.757280 open("/lib64/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 3 <0.000054>
01:06:26.757414 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 a\0\0\0\0\0\0@\0\0\0\0\0\0\0\360\202\1\0\0\0\0\0\0\0\0\0@\0008\0\10\0@\0!\0 \0\1\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\202r\1\0\0\0\0\0\202r\1\0\0\0\0\0\0\0 \0\0\0\0\0\1\0\0\0\6\0\0\0\10y\1\0\0\0\0\0\10y!\0\0\0\0\0\10y!\0\0\0\0\0H\7\0\0\0\0\0\0\210I\0\0\0\0\0\0\0\0 \0\0\0\0\0\2\0\0\0\6\0\0\0\360z\1\0\0\0\0\0\360z!\0\0\0\0\0\360z!\0\0\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\4\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\2\0\0\0\0\0\0 \0\0\0\0\0\0\0 \0\0\0\0\0\0\0\4\0\0\0\0\0\0\0P\345td\4\0\0\0\310;\1\0\0\0\0\0\310;\1\0\0\0\0\0\310;\1\0\0\0\0\0\244\10\0\0\0\0\0\0\244\10\0\0\0\0\0\0\4\0\0\0\0\0\0\0Q\345td\6\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\20\0\0\0\0\0\0\0R\345td\4\0\0\0\10y\1\0\0\0\0\0\10y!\0\0\0\0\0\10y!\0\0\0\0\0\370\6\0\0\0\0\0\0\370\6\0\0\0\0\0\0\1\0\0\0\0\0\0\0\200\25\4e\0(\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\10\0\0\0\0\0\0\0\4\0\0\0\20\0\0\0\1\0\0\0GNU\0\0\0\0\0\2\0\0\0\6\0\0\0 \0\0\0\345\1\0\0X\0\0\0 \0\0\0\v\0\0\0\31#\2\261\1\10\20\2@@a\370\3\10\10\25\200 \0\0\0\0\200\300\321Q\0\0\0\22\353\3020D\0\10\20A\0\2\0\2\f\1\200\v\221\1\330\240\r\240@\230 \244\200\21\n\202-l@g\214V\24\0\224 \200$H\200P(\1\22\f\311B\240\220\22\10\f \2ZdA\245c\4@\n\n\n\0\2009\1(\314D\204\201\300\22\10(\fD\0\0\0\200Q\10\200\35\4B\320\2608A\0\1\0\0\265\0300\0\200`\2\20\"\0\tA\20\1\5\0P(\251\22G(\0\0\202\4\230@\4\0\20\340T\0\2@\2\2\20\3010D\26\200\0\0\0$\4\24\2\0\34\200\243\220\6\0\30\0\10\20 \1\200\0(\6D%\210*\10 \0\20`\220\200\260\0\0\0\1\0\20\0*\f\20\242\201\233\1\1\203\0\2\0\200\20\0\220\4\0\0\21\220$.%(\t@!\0\20\0\202\2D\241\250\10\2X\0\0\0\0\0\0\0[\0\0\0]\0\0\0", 832) = 832 <0.000043>
01:06:26.757657 fstat(3, {st_mode=S_IFREG|0755, st_size=101168, ...}) = 0 <0.000052>
01:06:26.757851 mmap(NULL, 2212496, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fdc9114c000 <0.000048>
01:06:26.757985 mprotect(0x7fdc91164000, 2093056, PROT_NONE) = 0 <0.000050>
01:06:26.758110 mmap(0x7fdc91363000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7fdc91363000 <0.000050>
01:06:26.758244 mmap(0x7fdc91365000, 12944, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fdc91365000 <0.000047>
01:06:26.758376 close(3)                = 0 <0.000042>
01:06:26.758520 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc91b3c000 <0.000045>
01:06:26.758644 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc91b3b000 <0.000043>
01:06:26.758766 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fdc91b3a000 <0.000056>
01:06:26.758903 arch_prctl(ARCH_SET_FS, 0x7fdc91b3b700) = 0 <0.000043>
01:06:26.759120 mprotect(0x7fdc9170e000, 16384, PROT_READ) = 0 <0.000051>
01:06:26.759297 mprotect(0x7fdc91363000, 4096, PROT_READ) = 0 <0.000047>
01:06:26.759457 mprotect(0x7fdc9191f000, 4096, PROT_READ) = 0 <0.000046>
01:06:26.759596 mprotect(0x559ea84c2000, 4096, PROT_READ) = 0 <0.000046>
01:06:26.759718 mprotect(0x7fdc91b46000, 4096, PROT_READ) = 0 <0.000046>
01:06:26.759860 munmap(0x7fdc91b3e000, 25683) = 0 <0.000064>
01:06:26.760009 set_tid_address(0x7fdc91b3b9d0) = 1571 <0.000042>
01:06:26.760125 set_robust_list(0x7fdc91b3b9e0, 24) = 0 <0.000042>
01:06:26.760247 rt_sigaction(SIGRTMIN, {0x7fdc91151b90, [], SA_RESTORER|SA_SIGINFO, 0x7fdc9115d0f0}, NULL, 8) = 0 <0.000042>
01:06:26.760382 rt_sigaction(SIGRT_1, {0x7fdc91151c30, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7fdc9115d0f0}, NULL, 8) = 0 <0.000041>
01:06:26.760512 rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0 <0.000041>
01:06:26.760639 getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM64_INFINITY}) = 0 <0.000043>
01:06:26.760833 stat("/var/run/xenstored/socket", 0x7ffe7e148890) = -1 ENOENT (No such file or directory) <0.000061>
01:06:26.760984 stat("/proc/xen/xenbus", {st_mode=S_IFREG|0600, st_size=0, ...}) = 0 <0.000050>
01:06:26.761137 open("/proc/xen/xenbus", O_RDWR) = 3 <0.000053>
01:06:26.761323 brk(0)                  = 0x559ea8f17000 <0.000042>
01:06:26.761439 brk(0x559ea8f38000)     = 0x559ea8f38000 <0.000045>
01:06:26.761569 rt_sigaction(SIGPIPE, {SIG_IGN, [], SA_RESTORER, 0x7fdc9139d7c0}, {SIG_DFL, [], 0}, 8) = 0 <0.000042>
01:06:26.761712 write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16) = 16 <0.000045>
01:06:26.761849 write(3, "name\0", 5)   = -1 ESRCH (No such process) <0.000043>
01:06:26.761974 rt_sigaction(SIGPIPE, {SIG_DFL, [], SA_RESTORER, 0x7fdc9139d7c0}, NULL, 8) = 0 <0.000041>
01:06:26.762103 close(3)                = 0 <0.000044>
01:06:26.762233 write(2, "xenstore: ", 10xenstore: ) = 10 <0.000052>
01:06:26.762365 write(2, "couldn't read path name", 23couldn't read path name) = 23 <0.000051>
01:06:26.762495 write(2, "\n", 1
)       = 1 <0.000050>
01:06:26.762635 exit_group(1)           = ?
01:06:26.762795 +++ exited with 1 +++

hcoyote commented 7 years ago

note: I do wonder if it's related to this change in the kernel between 4.6 and 4.7 (if I'm reading this diff and versions correctly) ... this is the only thing I could find that changed on the xen/xenbus side of things.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/diff/drivers/xen/xenbus/?h=linux-4.6.y&id=v4.7.4&id2=v4.7.3&context=3&ignorews=0&dt=0

Zhengchai commented 7 years ago

Looks like CoreOS needs kernel 4.7.4 to fix this?

crawford commented 7 years ago

@Zhengchai do you know which commit fixes the issue?

hcoyote commented 7 years ago

This one looks like it might be it?

This should really only be done for XS_TRANSACTION_END messages, or else at least some of the xenstore-* tools don't work anymore.

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=b035f15daf750e1594964d68889e0de5c3014922
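
For anyone following along, the 16-byte buffer written just before "name\0" in both traces can be decoded to see what the tool is actually sending. This is only a sketch: it assumes the header is four little-endian 32-bit fields (type, req_id, tx_id, len) as in Xen's xs_wire.h, and the numeric type values are quoted from memory of that header, so treat them as assumptions. The helper name decode_header is made up for illustration.

```python
import struct

# Assumed subset of message types from Xen's public io/xs_wire.h.
XS_TYPES = {2: "XS_READ", 6: "XS_TRANSACTION_START", 7: "XS_TRANSACTION_END"}


def decode_header(raw: bytes) -> dict:
    """Decode a 16-byte xenstore message header (hypothetical helper).

    Assumed layout: type, req_id, tx_id, len, each a little-endian uint32.
    """
    msg_type, req_id, tx_id, length = struct.unpack("<IIII", raw)
    return {
        "type": XS_TYPES.get(msg_type, str(msg_type)),
        "req_id": req_id,
        "tx_id": tx_id,
        "len": length,
    }


# Bytes copied from the strace output above (strace prints octal escapes,
# so "\2" is 2, "\5" is 5 and "\10" is 8).
request = bytes([2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5, 0, 0, 0])
reply = bytes([2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, 0, 0])

print(decode_header(request))
print(decode_header(reply))
```

Under those assumptions, the request is a plain read with tx_id 0 and a 5-byte payload ("name\0"), i.e. it is not part of any transaction. That seems consistent with the commit message's point that the transaction lookup should only apply to XS_TRANSACTION_END messages.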

porlock commented 7 years ago

This issue is also present in beta 1185, so there is little chance of it getting better in the next release :/ CoreOS devs, come on, why is nobody taking care of this issue?

crawford commented 7 years ago

@hcoyote Okay, I see that that was included in the 4.7.4 kernel. We'll go ahead and bump the kernel before the next Alpha. Thanks for digging into that.
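
A quick way to check whether a given machine is already on a kernel at or past 4.7.4 is to compare the release string that `uname -r` prints. The sketch below is illustrative only: the function names are made up, and it says nothing about kernels that predate the regression (the 1068 releases reportedly worked, for example), only whether a release is at least the version named above.

```python
import re


def kernel_tuple(release: str) -> tuple:
    """Parse the leading 'major.minor[.patch]' of a uname -r style string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def at_least(release: str, minimum: tuple) -> bool:
    """Return True if the release is the given version or newer."""
    return kernel_tuple(release) >= minimum


# Version named in the comment above as containing the xenbus fix.
FIX_VERSION = (4, 7, 4)

print(at_least("4.7.3-coreos-r2", FIX_VERSION))
print(at_least("4.8.2", FIX_VERSION))
```

On a live system the release string could come from `platform.release()` instead of a hard-coded value.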

crawford commented 7 years ago

Fixed by https://github.com/coreos/coreos-overlay/pull/2237 (updating to 4.8.2).

mliradelc commented 7 years ago

How can I apply this patch to the stable version?

crawford commented 7 years ago

@maxtrix It would be much easier to just pick up the Alpha or Beta channel for the time being. Using this newer kernel with the older Docker, for example, may introduce problems since this combination hasn't been tested. You are definitely welcome to try though. You'll need to build an SDK following these instructions, apply this patch to the overlay, and then build the OS. Keep in mind that if you do this, you'll no longer get official updates since this image will be unofficial.

rjt commented 7 years ago

$ sudo strace -tt -T -s 1024 -f xenstore read name 2>&1 | egrep '(EN|ESRCH|DENIED|ERR|WARN|FAIL)' | egrep -v DENYWRITE
02:06:45.103267 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000016>
02:06:45.107512 stat("/var/run/xenstored/socket", 0x7fffc1fd3bf0) = -1 ENOENT (No such file or directory) <0.000020>
02:06:45.108122 write(3, "name\0", 5) = -1 ESRCH (No such process) <0.000012>

So is this still expected behavior since 4.8.2 does not seem available to me?

$ cat /etc/os-release
NAME="Container Linux by CoreOS"
ID=coreos
VERSION=1235.6.0
VERSION_ID=1235.6.0
BUILD_ID=2017-01-10-0545
PRETTY_NAME="Container Linux by CoreOS 1235.6.0 (Ladybug)"
ANSI_COLOR="38;5;75"
HOME_URL="https://coreos.com/"
BUG_REPORT_URL="https://github.com/coreos/bugs/issues"

# cat /proc/version && uname -a
Linux version 4.7.3-coreos-r2 (jenkins@localhost) (gcc version 4.9.3 (Gentoo Hardened 4.9.3 p1.5, pie-0.6.4) ) #1 SMP Sun Jan 8 00:32:25 UTC 2017
Linux core01 4.7.3-coreos-r2 #1 SMP Sun Jan 8 00:32:25 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5540 @ 2.53GHz GenuineIntel GNU/Linux
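
The egrep filter above matches on substrings, so it can also catch lines that merely contain "EN" or "ERR". As a rough alternative, the sketch below picks out only syscalls that returned -1, based on the "= -1 ERRNO (description)" shape visible in the traces in this thread. The function name and the regular expression are my own and have only been checked against lines like the ones quoted here.

```python
import re

# Matches "call(args) = -1 ERRNO (description)" as printed by strace.
FAILED = re.compile(r"(\w+)\((.*)\)\s+= -1 ([A-Z0-9]+) \(([^)]*)\)")


def failed_calls(trace: str) -> list:
    """Return (syscall, errno) pairs for every failed call in a strace dump."""
    results = []
    for line in trace.splitlines():
        match = FAILED.search(line)
        if match:
            results.append((match.group(1), match.group(3)))
    return results


# Lines copied from the traces in this thread: two failures and one success.
sample = r"""
02:06:45.103267 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000016>
01:06:26.761712 write(3, "\2\0\0\0\0\0\0\0\0\0\0\0\5\0\0\0", 16) = 16 <0.000045>
02:06:45.108122 write(3, "name\0", 5) = -1 ESRCH (No such process) <0.000012>
"""

print(failed_calls(sample))
```

The ENOENT results for /etc/ld.so.preload and /var/run/xenstored/socket also show up in the working trace earlier in the thread, so the ESRCH on the write is the one that stands out.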

crawford commented 7 years ago

@rjt can you file a new issue so we can track this?