cloudfoundry / bosh

Cloud Foundry BOSH is an open source tool chain for release engineering, deployment and lifecycle management of large scale distributed services.
https://bosh.io
Apache License 2.0

mounting persistent disk fails on aws r5 instance type #2128

Closed langered closed 5 years ago

langered commented 5 years ago

Describe the bug

When changing the VM type of the bosh VM from c4.4xlarge to r5.4xlarge (or r5.2xlarge) on AWS, bosh is unable to mount the persistent disk. Consequently, the deploy fails and we run into this issue: https://github.com/cloudfoundry/bosh/issues/1869

To Reproduce

Steps to reproduce the behavior (example):

  1. Deploy a bosh director on AWS with c4 vm_type as director vm.
  2. Change the type to r5.4xlarge or r5.2xlarge.
  3. Re-deploy the bosh director.

Expected behavior

The deployment should be successful.

Logs

Task 2622 | 15:29:59 | Preparing deployment: Preparing deployment (00:00:01)
Task 2622 | 15:30:00 | Preparing deployment: Rendering templates (00:00:01)
Task 2622 | 15:30:01 | Preparing package compilation: Finding packages to compile (00:00:00)
Task 2622 | 15:30:01 | Updating instance bosh: bosh/3276dfbe-4b81-4976-ad62-9baf13719d4c (0) (canary) (00:03:05)
                     L Error: Action Failed get_task: Task d529fda0-7cd6-4113-7fae-920cc9ae8c99 result: Mounting persistent disk: Getting real device path: Resolving mapped device path: Timed out getting real device path for /dev/sdf
Task 2622 | 15:33:06 | Error: Action Failed get_task: Task d529fda0-7cd6-4113-7fae-920cc9ae8c99 result: Mounting persistent disk: Getting real device path: Resolving mapped device path: Timed out getting real device path for /dev/sdf
E, [2019-01-30T15:33:06.394734 #1025] [] ERROR -- DirectorJobRunner: Worker thread raised exception: Action Failed get_task: Task d529fda0-7cd6-4113-7fae-920cc9ae8c99 result: Mounting persistent disk: Getting real device path: Resolving mapped device path: Timed out getting real device path for /dev/sdf - /var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:278:in `handle_method'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:333:in `handle_message_with_retry'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:51:in `method_missing'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:401:in `get_task_status'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:199:in `wait_for_task'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:353:in `send_message'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/agent_client.rb:95:in `mount_disk'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/steps/mount_disk_step.rb:18:in `perform'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/steps/mount_instance_disks_step.rb:12:in `block in perform'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/steps/mount_instance_disks_step.rb:10:in `each'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/deployment_plan/steps/mount_instance_disks_step.rb:10:in `perform'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/step_executor.rb:39:in `block in run_agenda'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/step_executor.rb:36:in `each'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/step_executor.rb:36:in `run_agenda'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/step_executor.rb:24:in `block (4 levels) in run'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh_common-0.0.0/lib/common/thread_formatter.rb:52:in `with_thread_name'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh-director-0.0.0/lib/bosh/director/step_executor.rb:18:in `block (3 levels) in run'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh_common-0.0.0/lib/common/thread_pool.rb:77:in `block (2 levels) in create_thread'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh_common-0.0.0/lib/common/thread_pool.rb:63:in `loop'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/bosh_common-0.0.0/lib/common/thread_pool.rb:63:in `block in create_thread'
/var/vcap/data/packages/director/1ae1e56702dd1624c8cf3c0a5d1a9a414c5a89af/gem_home/ruby/2.4.0/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

Versions (please complete the following information):

Additional context on the director vm:

$ lsblk
NAME        MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1     259:0    0  10G  0 disk
├─nvme0n1p1 259:5    0   5G  0 part [SWAP]
└─nvme0n1p2 259:6    0   5G  0 part /var/vcap/data
nvme2n1     259:3    0  69G  0 disk
└─nvme2n1p1 259:4    0  69G  0 part
nvme1n1     259:1    0   3G  0 disk
└─nvme1n1p1 259:2    0   3G  0 part /

It seems that NVMe is used as the storage type, so this could be related to this issue: https://github.com/cloudfoundry/bosh-aws-cpi-release/issues/91
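
For context on the error above: on NVMe-based instance types such as r5, EBS volumes show up as /dev/nvmeXn1 rather than under the requested name (/dev/sdf), which would explain the timeout. The following is a minimal sketch, not the bosh-agent implementation, of one way to find the real device for a volume. It assumes the VM's udev rules create the usual `/dev/disk/by-id/nvme-Amazon_Elastic_Block_Store_<volume id without hyphen>` symlinks. The function name `resolve_ebs_nvme_device` and the volume ID in the usage note are hypothetical.

```python
import os
from typing import Optional

# Symlink prefix udev typically uses for EBS volumes attached over NVMe.
# The volume ID follows the prefix with its hyphen removed (assumption:
# the stemcell ships the standard udev rules that create these links).
EBS_BY_ID_PREFIX = "nvme-Amazon_Elastic_Block_Store_"


def resolve_ebs_nvme_device(volume_id: str,
                            by_id_dir: str = "/dev/disk/by-id") -> Optional[str]:
    """Return the real device path for an EBS volume, or None if no link exists.

    Hypothetical helper for illustration only.
    """
    link_name = EBS_BY_ID_PREFIX + volume_id.replace("-", "")
    link_path = os.path.join(by_id_dir, link_name)
    if not os.path.islink(link_path):
        # The volume is not attached, or udev did not create the symlink.
        return None
    # Follow the symlink to the actual block device node.
    return os.path.realpath(link_path)
```

Calling `resolve_ebs_nvme_device("vol-0123456789abcdef0")` on the director VM would return the NVMe device node for that volume if the symlink exists, and `None` otherwise. The `by_id_dir` parameter exists only so the lookup can be pointed at a different directory when trying it out.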

cf-gitbot commented 5 years ago

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/163595513

The labels on this github issue will be updated when the story is started.

dpb587-pivotal commented 5 years ago

Also possibly related to https://github.com/cloudfoundry/bosh-agent/issues/210.

degaurab commented 5 years ago

/cc @charleshansen

Hi @langered, it seems the issue might be fixed on bosh-agent (master), and the fix might be in the latest 250.x line of stemcells.

Reference bosh-agent commit: https://github.com/cloudfoundry/bosh-agent/commit/bc536e3ee481281a992942eb4f24ca173691f0bb

Can you try the 250.x line and let us know if the issue persists?

Thanks

belinda-liu commented 5 years ago

Hi @langered,

Did you get a chance to try out that fix? If there are no other problems, we'd like to close this issue.

Thanks, @belinda-liu && @xtreme-behrouz-soroushian, CF BOSH Team

jdesulme commented 5 years ago

Hey @belinda-liu - I think this is still an issue. However, in my case, instead of using the bosh director, I tried to deploy this instance type using a deployment manifest and it failed with a similar error. My original goal was a T3 instance, but I noticed this request as well. So I opened #2135 to investigate, since it's a similar type of error.

langered commented 5 years ago

Hey @belinda-liu && @xtreme-behrouz-soroushian,

We have not tried the fix yet, but we will soon. We'll provide feedback here as soon as we have tried it out.

Thanks,

@langered

NautiluX commented 5 years ago

Hi @belinda-liu, it seems the issue still exists on https://bosh.io/d/stemcells/bosh-aws-xen-hvm-ubuntu-xenial-go_agent?v=250.9. We reproduced it on AWS using a dummy deployment and migrating it from r4.xlarge to r5.large. cc @KaiHofstetter

mfine30 commented 5 years ago

Just tested this out and it worked successfully using the latest master of the AWS CPI release repo. We'll need to cut a new AWS CPI to make this more easily usable/available, but I'm going to close this and cut a new release shortly.