cloudfoundry / bosh-linux-stemcell-builder

BOSH Ubuntu Linux stemcells
Apache License 2.0
41 stars 103 forks source link

Failed to run 'sfdisk -d /dev/xvdc' when bosh-agent partitions ephemeral disk #28

Closed edwardstudy closed 3 years ago

edwardstudy commented 6 years ago

Hi, we were trying to build SoftLayer xenial stemcell with this based OS image: https://github.com/cloudfoundry/bosh-linux-stemcell-builder/blob/master/bosh-stemcell/os_image_versions.json#L4. It failed during compiler vm creation:

02:11:30 | Preparing deployment: Preparing deployment (00:00:00)
02:11:30 | Preparing package compilation: Finding packages to compile (00:00:00)
02:11:30 | Compiling packages: batlight/dd894d36ab4d50ec504c09be6715c04f4077afee (0)
02:26:50 | Creating missing vms: mutable/ad72b6a1-a1bd-4546-9ba3-1781595aedef (0) (00:15:20)
            L Error: Timed out pinging to e4526a3f-9639-4b9d-a24c-51fdb45c01d2 after 600 seconds

02:26:51 | Error: Timed out pinging to e4526a3f-9639-4b9d-a24c-51fdb45c01d2 after 600 seconds

This error was caused due to the bosh-agent(v2.65.0) failed to partition ephemeral disk in the compilation VM: [main] 2018/02/08 02:43:01 ERROR - Agent exited with error: Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk '/dev/xvdc': Getting partitions for /dev/xvdc: Shelling out to sfdisk when getting partitions: Running command: 'sfdisk -d /dev/xvdc', stdout: '', stderr: 'sfdisk: failed to dump partition table: Success

And 'sfdisk -l' result seems normal:

/:/var/vcap/bosh# sfdisk -l /dev/xvdc
Disk /dev/xvdc: 100 GiB, 107374182400 bytes, 209715200 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Hope detailed logs help you:

2018-02-08_02:42:49.10993 ********************
2018-02-08_02:42:49.11021 [settingsService] 2018/02/08 02:42:49 DEBUG - Loading settings from fetcher
2018-02-08_02:42:49.11023 [File System] 2018/02/08 02:42:49 DEBUG - Reading file
2018-02-08_02:42:49.11023 [registryProvider] 2018/02/08 02:42:49 DEBUG - Using file registry at /var/vcap/bosh/user_data.json
2018-02-08_02:42:49.11023 [File System] 2018/02/08 02:42:49 DEBUG - Reading file /var/vcap/bosh/user_data.json
2018-02-08_02:42:49.11024 [File System] 2018/02/08 02:42:49 DEBUG - Read content
2018-02-08_02:42:49.11024 ********************
2018-02-08_02:42:49.11024 {"agent_id":"91787558-2d2b-4d64-a99e-7b1736f75cf9","vm":{"name":"vm-91787558-2d2b-4d64-a99e-7b1736f75cf9","id":"vm-91787558-2d2b-4d64-a99e-7b1736f75cf9"},"mbus":"nats://nats:mer0itet8msechlek4sl@10.112.116.16:4222","ntp":["time1.google.com","time2.google.com","time3.google.com","time4.google.com"],"blobstore":{"provider":"dav","options":{"endpoint":"http://10.112.116.16:25250","password":"sp4uvvhgop3nrv9387nc","user":"agent"}},"networks":{"default":{"type":"dynamic","dns":["10.112.166.140"],"default":["dns","gateway"],"preconfigured":true,"cloud_properties":{"PrimaryBackendNetworkComponent":{"NetworkVlan":{"Id":1.292651e+06}},"PrimaryNetworkComponent":{"NetworkVlan":{"Id":1.292653e+06}}}}},"disks":{"ephemeral":"/dev/xvdc","persistent":null},"env":{"bosh":{"group":"bats-director-bat-compilation-36bfae6b-a4fd-4445-a51a-9e384c850217","groups":["bats-director","bat","compilation-36bfae6b-a4fd-4445-a51a-9e384c850217","bats-director-bat","bat-compilation-36bfae6b-a4fd-4445-a51a-9e384c850217","bats-director-bat-compilation-36bfae6b-a4fd-4445-a51a-9e384c850217"],"keep_root_password":true,"password":"$6$4cbef42d659bd6ea$6EvArRU9G9haaLC.1UTbNMyP7JNWxoddj6jVgGsF8fglPPUWGv1jJ2lteG3mmdoe6EA6W5m0lfagvICYn0zF8/"}}}
2018-02-08_02:42:49.11026 ********************
2018-02-08_02:42:49.11066 [settingsService] 2018/02/08 02:42:49 DEBUG - Successfully received settings from fetcher
2018-02-08_02:42:49.11099 [File System] 2018/02/08 02:42:49 DEBUG - Making dir /var/vcap/bosh with perm 0777
2018-02-08_02:42:49.11100 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'route -n'
2018-02-08_02:42:49.11194 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Successful: true (0)
2018-02-08_02:42:49.11250 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'usermod -p $6$4cbef42d659bd6ea$6EvArRU9G9haaLC.1UTbNMyP7JNWxoddj6jVgGsF8fglPPUWGv1jJ2lteG3mmdoe6EA6W5m0lfagvICYn0zF8/ vcap'
2018-02-08_02:42:49.13426 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stdout:
2018-02-08_02:42:49.13428 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stderr:
2018-02-08_02:42:49.13428 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Successful: true (0)
2018-02-08_02:42:49.13453 [File System] 2018/02/08 02:42:49 DEBUG - Writing /etc/resolvconf/resolv.conf.d/base
2018-02-08_02:42:49.13454 [File System] 2018/02/08 02:42:49 DEBUG - Making dir /etc/resolvconf/resolv.conf.d with perm 0777
2018-02-08_02:42:49.13454 [File System] 2018/02/08 02:42:49 DEBUG - Write content
2018-02-08_02:42:49.13455 ********************
2018-02-08_02:42:49.13455 # Generated by bosh-agent
2018-02-08_02:42:49.13455 nameserver 10.112.166.140
2018-02-08_02:42:49.13456
2018-02-08_02:42:49.13456 ********************
2018-02-08_02:42:49.13456 [File System] 2018/02/08 02:42:49 DEBUG - Symlinking oldPath /run/resolvconf/resolv.conf with newPath /etc/resolv.conf
2018-02-08_02:42:49.13456 [File System] 2018/02/08 02:42:49 DEBUG - Lstat '/etc/resolv.conf'
2018-02-08_02:42:49.13500 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'resolvconf -u'
2018-02-08_02:42:49.14989 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stdout:
2018-02-08_02:42:49.14991 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Stderr:
2018-02-08_02:42:49.14991 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Successful: true (0)
2018-02-08_02:42:49.14991 [File System] 2018/02/08 02:42:49 DEBUG - Writing /var/vcap/bosh/etc/ntpserver
2018-02-08_02:42:49.15016 [File System] 2018/02/08 02:42:49 DEBUG - Making dir /var/vcap/bosh/etc with perm 0777
2018-02-08_02:42:49.15017 [File System] 2018/02/08 02:42:49 DEBUG - Write content
2018-02-08_02:42:49.15018 ********************
2018-02-08_02:42:49.15019 time1.google.com time2.google.com time3.google.com time4.google.com
2018-02-08_02:42:49.15019 ********************
2018-02-08_02:42:49.15019 [Cmd Runner] 2018/02/08 02:42:49 DEBUG - Running command 'sync-time'
2018-02-08_02:43:01.21297 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stdout:
2018-02-08_02:43:01.21300 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stderr:
2018-02-08_02:43:01.21300 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Successful: true (0)
2018-02-08_02:43:01.21300 [linuxPlatform] 2018/02/08 02:43:01 INFO - Setting up raw ephemeral disks
2018-02-08_02:43:01.21301 [linuxPlatform] 2018/02/08 02:43:01 INFO - Setting up ephemeral disk...
2018-02-08_02:43:01.21301 [File System] 2018/02/08 02:43:01 DEBUG - Glob '/var/vcap/data/*'
2018-02-08_02:43:01.21301 [File System] 2018/02/08 02:43:01 DEBUG - Making dir /var/vcap/data with perm 0750
2018-02-08_02:43:01.21302 [linuxPlatform] 2018/02/08 02:43:01 INFO - Creating swap & ephemeral partitions on ephemeral disk...
2018-02-08_02:43:01.21302 [linuxPlatform] 2018/02/08 02:43:01 DEBUG - Getting device size of `/dev/xvdc'
2018-02-08_02:43:01.21303 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Running command 'sfdisk -s /dev/xvdc'
2018-02-08_02:43:01.21453 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stdout: 104857600
2018-02-08_02:43:01.21455 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stderr:
2018-02-08_02:43:01.21456 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Successful: true (0)
2018-02-08_02:43:01.21456 [linuxPlatform] 2018/02/08 02:43:01 DEBUG - Calculating partition sizes of `/dev/xvdc', with available size 107374182400B
2018-02-08_02:43:01.21558 [linuxPlatform] 2018/02/08 02:43:01 INFO - Partitioning `/dev/xvdc' with [[Type: swap, SizeInBytes: 8368222208] [Type: linux, SizeInBytes: 99005960192]]
2018-02-08_02:43:01.21559 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Running command 'sfdisk -d /dev/xvdc'
2018-02-08_02:43:01.22030 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stdout:
2018-02-08_02:43:01.22031 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Stderr: sfdisk: failed to dump partition table: Success
2018-02-08_02:43:01.22032 [Cmd Runner] 2018/02/08 02:43:01 DEBUG - Successful: false (1)
2018-02-08_02:43:01.22033 [main] 2018/02/08 02:43:01 ERROR - App setup Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk `/dev/xvdc': Getting partitions for /dev/xvdc: Shelling out to sfdisk when getting partitions: Running command: 'sfdisk -d /dev/xvdc', stdout: '', stderr: 'sfdisk: failed to dump partition table: Success
2018-02-08_02:43:01.22034 ': exit status 1
2018-02-08_02:43:01.22035 [main] 2018/02/08 02:43:01 ERROR - Agent exited with error: Running bootstrap: Setting up ephemeral disk: Partitioning ephemeral disk: Partitioning ephemeral disk `/dev/xvdc': Getting partitions for /dev/xvdc: Shelling out to sfdisk when getting partitions: Running command: 'sfdisk -d /dev/xvdc', stdout: '', stderr: 'sfdisk: failed to dump partition table: Success
2018-02-08_02:43:01.22037 ': exit status 1

Thank you so much. :)

tjvman commented 6 years ago

@EdwardStudy Hm, I'm surprised that you're seeing sfdisk being used - these two files should combine to make the partitioner type be parted. Can you confirm that /var/vcap/bosh/agent.json has Platform.Linux.PartitionerType: parted set?

We've seen the above issue before; it's caused by an incompatability between the bosh agent and newer versions of sfdisk. Using parted instead fixes the problem, hence my previous question

edwardstudy commented 6 years ago

@tjvman Your approach worked. But there still had a problem about open-iscsi service which used by softlayer builder stage.

On trusty, open-iscsi is 'Starts and stops the iSCSI initiator services and logs into default targets'. But open-iscsi is 'Login to default iSCSI targets at boot and log out of all iSCSI targets at shutdown' on xenial. However, iscsid is 'The iSCSI initiator daemon takes care of monitoring iSCSI connections to targets. It is also the daemon providing the interface for the iscisadm tool to talk to when administering iSCSI connections.' on xenial.

I just send a bug report: https://bugs.launchpad.net/ubuntu/+source/open-iscsi/+bug/1753353

Could you have any idea about this issue?

ramonskie commented 3 years ago

Closing this issue as there is no activity for a very long time you can reopen this issue if the issue still persists