cloudfoundry / bosh-aws-cpi-release

BOSH AWS CPI
Apache License 2.0
29 stars 62 forks source link

`Failed to get device real path by disk ID` when in registry-less mode #95

Closed voelzmo closed 5 years ago

voelzmo commented 5 years ago

The agent on a VM errors out with this

2019-01-31_13:26:54.82285 [ComplexSettingsSource] 2019/01/31 13:26:54 DEBUG - Got settings from metadata service, not contacting registry.
2019-01-31_13:26:54.82286 [settingsService] 2019/01/31 13:26:54 DEBUG - Successfully received settings from fetcher
2019-01-31_13:26:54.82286 [File System] 2019/01/31 13:26:54 DEBUG - Making dir /var/vcap/bosh with perm 0777
2019-01-31_13:26:54.82291 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Successful: true (0)
2019-01-31_13:26:54.82300 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Running command 'route -n'
2019-01-31_13:26:54.82322 [DelayedAuditLogger] 2019/01/31 13:26:54 DEBUG - Logging CEF:0|CloudFoundry|BOSH|1|agent_api|get_task|1|duser=director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.2afc591d-1280-4d58-8300-275f2cfb461d src=10.0.3.12 spt=4222 shost=9e125403-3496-42f6-9939-6fb72a4e66ac  to syslog
2019-01-31_13:26:54.82399 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Successful: true (0)
2019-01-31_13:26:54.82401 [File System] 2019/01/31 13:26:54 DEBUG - Checking if file exists /var/vcap/bosh/persistent_disk_hints.json
2019-01-31_13:26:54.82401 [File System] 2019/01/31 13:26:54 DEBUG - Stat '/var/vcap/bosh/persistent_disk_hints.json'
2019-01-31_13:26:54.82401 [linuxPlatform] 2019/01/31 13:26:54 DEBUG - Checking whether persistent disk {ID:vol-021c750b792d97875 DeviceID: VolumeID:/dev/sdf Lun: HostDeviceID: Path:/dev/sdf ISCSISettings:{InitiatorName: Username: Target: Password:} FileSystemType: MountOptions:[] Partitioner:} is mounted
2019-01-31_13:26:54.82402 [ConcreteUdevDevice] 2019/01/31 13:26:54 DEBUG - Triggering UdevDevice
2019-01-31_13:26:54.82420 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Running command 'udevadm trigger'
2019-01-31_13:26:54.88282 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Stdout:
2019-01-31_13:26:54.88284 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Stderr:
2019-01-31_13:26:54.88284 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Successful: true (0)
2019-01-31_13:26:54.88285 [ConcreteUdevDevice] 2019/01/31 13:26:54 DEBUG - Settling UdevDevice
2019-01-31_13:26:54.88304 [Cmd Runner] 2019/01/31 13:26:54 DEBUG - Running command 'udevadm settle'
2019-01-31_13:26:55.82367 [Action Dispatcher] 2019/01/31 13:26:55 INFO - Received request with action get_task
2019-01-31_13:26:55.82369 [Action Dispatcher] 2019/01/31 13:26:55 DEBUG - Payload
2019-01-31_13:26:55.82369 ********************
2019-01-31_13:26:55.82369 {"protocol":3,"method":"get_task","arguments":["77640839-24f3-4221-67af-f9fb97d4df7a"],"reply_to":"director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.3d5ce266-7af5-4b82-90ff-b2e1bd060006"}
2019-01-31_13:26:55.82370 ********************
2019-01-31_13:26:55.82370 [Action Dispatcher] 2019/01/31 13:26:55 INFO - Running sync action get_task
2019-01-31_13:26:55.82370 [MBus Handler] 2019/01/31 13:26:55 INFO - Responding
2019-01-31_13:26:55.82371 [MBus Handler] 2019/01/31 13:26:55 DEBUG - Payload
2019-01-31_13:26:55.82371 ********************
2019-01-31_13:26:55.82371 {"value":{"agent_task_id":"77640839-24f3-4221-67af-f9fb97d4df7a","state":"running"}}
2019-01-31_13:26:55.82371 ********************
2019-01-31_13:26:55.82412 [Cmd Runner] 2019/01/31 13:26:55 DEBUG - Running command 'route -n'
2019-01-31_13:26:55.82457 [Cmd Runner] 2019/01/31 13:26:55 DEBUG - Successful: true (0)
2019-01-31_13:26:55.82470 [DelayedAuditLogger] 2019/01/31 13:26:55 DEBUG - Logging CEF:0|CloudFoundry|BOSH|1|agent_api|get_task|1|duser=director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.3d5ce266-7af5-4b82-90ff-b2e1bd060006 src=10.0.3.12 spt=4222 shost=9e125403-3496-42f6-9939-6fb72a4e66ac  to syslog
2019-01-31_13:26:56.82539 [Action Dispatcher] 2019/01/31 13:26:56 INFO - Received request with action get_task
2019-01-31_13:26:56.82541 [Action Dispatcher] 2019/01/31 13:26:56 DEBUG - Payload
2019-01-31_13:26:56.82541 ********************
2019-01-31_13:26:56.82542 {"protocol":3,"method":"get_task","arguments":["77640839-24f3-4221-67af-f9fb97d4df7a"],"reply_to":"director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.d1d7dc05-722a-4939-92fb-87b3be8bad51"}
2019-01-31_13:26:56.82542 ********************
2019-01-31_13:26:56.82542 [Action Dispatcher] 2019/01/31 13:26:56 INFO - Running sync action get_task
2019-01-31_13:26:56.82543 [MBus Handler] 2019/01/31 13:26:56 INFO - Responding
2019-01-31_13:26:56.82543 [MBus Handler] 2019/01/31 13:26:56 DEBUG - Payload
2019-01-31_13:26:56.82543 ********************
2019-01-31_13:26:56.82544 {"value":{"agent_task_id":"77640839-24f3-4221-67af-f9fb97d4df7a","state":"running"}}
2019-01-31_13:26:56.82544 ********************
2019-01-31_13:26:56.82581 [Cmd Runner] 2019/01/31 13:26:56 DEBUG - Running command 'route -n'
2019-01-31_13:26:56.82630 [Cmd Runner] 2019/01/31 13:26:56 DEBUG - Successful: true (0)
2019-01-31_13:26:56.82643 [DelayedAuditLogger] 2019/01/31 13:26:56 DEBUG - Logging CEF:0|CloudFoundry|BOSH|1|agent_api|get_task|1|duser=director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.d1d7dc05-722a-4939-92fb-87b3be8bad51 src=10.0.3.12 spt=4222 shost=9e125403-3496-42f6-9939-6fb72a4e66ac  to syslog
2019-01-31_13:26:57.82686 [Action Dispatcher] 2019/01/31 13:26:57 INFO - Received request with action get_task
2019-01-31_13:26:57.82687 [Action Dispatcher] 2019/01/31 13:26:57 DEBUG - Payload
2019-01-31_13:26:57.82688 ********************
2019-01-31_13:26:57.82688 {"protocol":3,"method":"get_task","arguments":["77640839-24f3-4221-67af-f9fb97d4df7a"],"reply_to":"director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.e37f357f-373b-460f-b281-522c9148c792"}
2019-01-31_13:26:57.82688 ********************
2019-01-31_13:26:57.82689 [Action Dispatcher] 2019/01/31 13:26:57 INFO - Running sync action get_task
2019-01-31_13:26:57.82689 [MBus Handler] 2019/01/31 13:26:57 INFO - Responding
2019-01-31_13:26:57.82689 [MBus Handler] 2019/01/31 13:26:57 DEBUG - Payload
2019-01-31_13:26:57.82690 ********************
2019-01-31_13:26:57.82690 {"value":{"agent_task_id":"77640839-24f3-4221-67af-f9fb97d4df7a","state":"running"}}
2019-01-31_13:26:57.82690 ********************
2019-01-31_13:26:57.82728 [Cmd Runner] 2019/01/31 13:26:57 DEBUG - Running command 'route -n'
2019-01-31_13:26:57.82775 [Cmd Runner] 2019/01/31 13:26:57 DEBUG - Successful: true (0)
2019-01-31_13:26:57.82787 [DelayedAuditLogger] 2019/01/31 13:26:57 DEBUG - Logging CEF:0|CloudFoundry|BOSH|1|agent_api|get_task|1|duser=director.1965ebbd-fb72-40f3-9f4d-8c2be6440248.9e125403-3496-42f6-9939-6fb72a4e66ac.e37f357f-373b-460f-b281-522c9148c792 src=10.0.3.12 spt=4222 shost=9e125403-3496-42f6-9939-6fb72a4e66ac  to syslog
2019-01-31_13:26:58.05381 [Cmd Runner] 2019/01/31 13:26:58 DEBUG - Stdout:
2019-01-31_13:26:58.05383 [Cmd Runner] 2019/01/31 13:26:58 DEBUG - Stderr:
2019-01-31_13:26:58.05383 [Cmd Runner] 2019/01/31 13:26:58 DEBUG - Successful: true (0)
2019-01-31_13:26:58.15408 [File System] 2019/01/31 13:26:58 DEBUG - Glob '/dev/disk/by-id/*vol-021c750b792d9787'
2019-01-31_13:26:58.25428 [File System] 2019/01/31 13:26:58 DEBUG - Glob '/dev/disk/by-id/*vol-021c750b792d9787'
2019-01-31_13:26:58.35459 [File System] 2019/01/31 13:26:58 DEBUG - Glob '/dev/disk/by-id/*vol-021c750b792d9787'
2019-01-31_13:26:58.45481 [File System] 2019/01/31 13:26:58 DEBUG - Glob '/dev/disk/by-id/*vol-021c750b792d9787'
2019-01-31_13:26:58.55530 [File System] 2019/01/31 13:26:58 DEBUG - Glob '/dev/disk/by-id/*vol-021c750b792d9787'
2019-01-31_13:26:58.55531 [virtioDevicePathResolver] 2019/01/31 13:26:58 DEBUG - Failed to get device real path by disk ID: 'vol-021c750b792d97875'. Error: 'Timed out getting real device path for 'vol-021c750b792d9787'', timeout: 'true'
2019-01-31_13:26:58.55531 [virtioDevicePathResolver] 2019/01/31 13:26:58 DEBUG - Using mapped resolver to get device real path
2019-01-31_13:26:58.55532 [File System] 2019/01/31 13:26:58 DEBUG - Checking if file exists /dev/xvdf
2019-01-31_13:26:58.55532 [File System] 2019/01/31 13:26:58 DEBUG - Stat '/dev/xvdf'
2019-01-31_13:26:58.55532 [File System] 2019/01/31 13:26:58 DEBUG - Reading file /proc/mounts
2019-01-31_13:26:58.55533 [File System] 2019/01/31 13:26:58 DEBUG - Read content
2019-01-31_13:26:58.55533 ********************
2019-01-31_13:26:58.55533 sysfs /sys sysfs rw,nosuid,nodev,noexec,relatime 0 0
2019-01-31_13:26:58.55533 proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
2019-01-31_13:26:58.55534 devtmpfs /dev devtmpfs rw,nosuid,size=62892300k,nr_inodes=15723075,mode=755 0 0
2019-01-31_13:26:58.55534 securityfs /sys/kernel/security securityfs rw,nosuid,nodev,noexec,relatime 0 0
2019-01-31_13:26:58.55534 tmpfs /dev/shm tmpfs rw,nosuid,nodev,noexec 0 0
2019-01-31_13:26:58.55535 devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
2019-01-31_13:26:58.55536 tmpfs /run tmpfs rw,nosuid,nodev,mode=755 0 0
2019-01-31_13:26:58.55536 tmpfs /run/lock tmpfs rw,nosuid,nodev,noexec,relatime,size=5120k 0 0
2019-01-31_13:26:58.55537 tmpfs /sys/fs/cgroup tmpfs ro,nosuid,nodev,noexec,mode=755 0 0
2019-01-31_13:26:58.55537 cgroup /sys/fs/cgroup/systemd cgroup rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd 0 0
2019-01-31_13:26:58.55538 pstore /sys/fs/pstore pstore rw,nosuid,nodev,noexec,relatime 0 0
2019-01-31_13:26:58.55538 cgroup /sys/fs/cgroup/rdma cgroup rw,nosuid,nodev,noexec,relatime,rdma 0 0
2019-01-31_13:26:58.55538 cgroup /sys/fs/cgroup/pids cgroup rw,nosuid,nodev,noexec,relatime,pids 0 0
2019-01-31_13:26:58.55538 cgroup /sys/fs/cgroup/net_cls,net_prio cgroup rw,nosuid,nodev,noexec,relatime,net_cls,net_prio 0 0
2019-01-31_13:26:58.55539 cgroup /sys/fs/cgroup/cpu,cpuacct cgroup rw,nosuid,nodev,noexec,relatime,cpu,cpuacct 0 0
2019-01-31_13:26:58.55539 cgroup /sys/fs/cgroup/perf_event cgroup rw,nosuid,nodev,noexec,relatime,perf_event 0 0
2019-01-31_13:26:58.55539 cgroup /sys/fs/cgroup/devices cgroup rw,nosuid,nodev,noexec,relatime,devices 0 0
2019-01-31_13:26:58.55540 cgroup /sys/fs/cgroup/blkio cgroup rw,nosuid,nodev,noexec,relatime,blkio 0 0
2019-01-31_13:26:58.55540 cgroup /sys/fs/cgroup/memory cgroup rw,nosuid,nodev,noexec,relatime,memory 0 0
2019-01-31_13:26:58.55541 cgroup /sys/fs/cgroup/freezer cgroup rw,nosuid,nodev,noexec,relatime,freezer 0 0
2019-01-31_13:26:58.55541 cgroup /sys/fs/cgroup/hugetlb cgroup rw,nosuid,nodev,noexec,relatime,hugetlb 0 0
2019-01-31_13:26:58.55542 cgroup /sys/fs/cgroup/cpuset cgroup rw,nosuid,nodev,noexec,relatime,cpuset 0 0
2019-01-31_13:26:58.55542 configfs /sys/kernel/config configfs rw,relatime 0 0
2019-01-31_13:26:58.55542 /dev/xvda1 / ext4 rw,relatime,data=ordered 0 0
2019-01-31_13:26:58.55542 systemd-1 /proc/sys/fs/binfmt_misc autofs rw,relatime,fd=35,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=24296 0 0
2019-01-31_13:26:58.55543 mqueue /dev/mqueue mqueue rw,relatime 0 0
2019-01-31_13:26:58.55543 debugfs /sys/kernel/debug debugfs rw,relatime 0 0
2019-01-31_13:26:58.55543 fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
2019-01-31_13:26:58.55544 hugetlbfs /dev/hugepages hugetlbfs rw,relatime,pagesize=2M 0 0
2019-01-31_13:26:58.55544 /dev/xvdb2 /var/vcap/data ext4 rw,relatime,data=ordered 0 0
2019-01-31_13:26:58.55544 /dev/xvdb2 /var/log ext4 rw,nosuid,nodev,noexec,relatime,data=ordered 0 0
2019-01-31_13:26:58.55544 tmpfs /var/vcap/data/sys/run tmpfs rw,relatime,size=1024k 0 0
2019-01-31_13:26:58.55545 /dev/xvdb2 /tmp ext4 rw,nosuid,nodev,noexec,relatime,data=ordered 0 0
2019-01-31_13:26:58.55545 /dev/xvdb2 /var/tmp ext4 rw,nosuid,nodev,noexec,relatime,data=ordered 0 0
2019-01-31_13:26:58.55546 /dev/xvda1 /home ext4 rw,nodev,relatime,data=ordered 0 0
2019-01-31_13:26:58.55546 /dev/xvdf1 /var/vcap/store ext4 rw,relatime,data=ordered 0 0
2019-01-31_13:26:58.55547
2019-01-31_13:26:58.55547 ********************

It seems that /dev/disk/by-id/ doesn't exist:

# ls -la /dev/disk/by-
by-partlabel/ by-partuuid/  by-uuid/

The persistent disk seems already to be mounted at /var/vcap/store, though, see agent output above. The VM was resurrected at some point and is exposing this behavior since. Initial deploy worked fine and did not show any issues.

Instance type is r4.4xlarge on AWS, CPI is v73, stemcell is v170.24

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/163622067

The labels on this github issue will be updated when the story is started.

voelzmo commented 5 years ago

seems like this message is a red herring: looks like an error, but everything is actually fine. Agent unresponsiveness was caused by something else in our system. Closing.