cloudfoundry-attic / bosh-vcloud-cpi-release

BOSH vCloud CPI
Apache License 2.0

Bosh init VM create error #12

Open Prospecta opened 7 years ago

Prospecta commented 7 years ago

Hi,

I am encountering an error when attempting to run the bosh-init deploy script on vCloud (not vCloud Air).

The job fails when attempting to create the vm:


Command 'deploy' failed: Deploying: Creating instance 'bosh/0': Creating VM: Creating vm with stemcell cid 'urn:vcloud:catalogitem:decd3c65-5dde-47ae-836e-100199cdae0d': CPI 'create_vm' method responded with error: CmdError{"type":"Unknown","message":"Task urn:vcloud:task:14f18678-7d9a-4046-a1aa-d708c8058be0 Updated Virtual Machine 3b6b2f67-572c-44c4-74d4-a80b28887836(be1c153c-6e01-46b3-81c5-018ec2c26c10) completed unsuccessfully, Details: [ 01b88b27-f995-41dd-b39e-c2d498b808fb-3776 ] Unable to perform this action. Contact your cloud admi...","ok_to_retry":false}


and vCloud returns the error:


com.vmware.ssdc.library.exceptions.DatastoreNotAvailableException: null
    at com.vmware.vcloud.fabric.storage.placement.sdrs.impl.SdrsPlacementManagerImpl.processSdrsResults(SdrsPlacementManagerImpl.java:1040)
    at com.vmware.vcloud.fabric.storage.placement.sdrs.impl.SdrsPlacementManagerImpl.selectDatastoreInStoragePod(SdrsPlacementManagerImpl.java:213)
    at com.vmware.vcloud.fabric.storage.placement.impl.VirtualMachineDiskLevelStorageSelectorImpl.selectDatastore(VirtualMachineDiskLevelStorageSelectorImpl.java:369)
    at com.vmware.vcloud.fabric.storage.storedVm.impl.RelocateStoredVmByStorageClassActivity$CalculateTargetDatastorePhase.invoke(RelocateStoredVmByStorageClassActivity.java:138)
    at com.vmware.vcloud.activity.executors.ActivityRunner.runPhase(ActivityRunner.java:156)
    at com.vmware.vcloud.activity.executors.ActivityRunner.run(ActivityRunner.java:118)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


The vCloud error reports that the datastore is not available, when in fact a valid datastore is available for the deployment: creating a VM on it manually via vCloud Director works without issues. The stemcell is also uploaded to the datastore with no issues.

We had previously deployed Cloud Foundry v182 including Bosh with no trouble on vCloud but now struggle with the latest releases (bosh v257.9 & vCloud CPI v24). We have attempted to use both the latest vCloud and vSphere stemcells (v3262.12) with no luck.

For reference, below is the vcd section of the bosh.yml:


vcd: &vcd # <--- Replace values below
  url: https://***
  user: **
  password: ****
  entities:
    organization: Cloud-Foundry
    virtual_datacenter: Cloud-Foundry-VDC
    vapp_catalog: cf-catalog-bosh
    media_catalog: cf-catalog-bosh
    media_storage_profile: '*'
    vm_metadata_key: bosh-meta
  control: {wait_max: 900}


Any ideas what could be causing the issue?

Thanks

cunnie commented 7 years ago

@Prospecta afraid we haven't seen that error before. You might try running BOSH_INIT_LOG_LEVEL=debug BOSH_INIT_LOG_PATH=/tmp/bosh-debug.log bosh-init deploy your-manifest.yml and attach the debug logs to this issue. Please scrub any passwords from these logs first as they may contain sensitive info.
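
A rough sketch of how the scrubbing step might be scripted before attaching the log. The `scrub_log` helper and the output file name are hypothetical, and the pattern assumes secrets appear as `password: value`, `"password":"value"` or `password=value` pairs. It will not catch credentials embedded in URLs (such as an mbus address) or stored under other key names, so the result still deserves a manual read-through.

```shell
#!/bin/sh
# Hypothetical helper: print a log file with the value that follows any
# key ending in "password" replaced by the word REDACTED.
# Assumes key/value pairs separated by ":" or "=", optionally quoted.
scrub_log() {
  sed -E 's/([Pp]assword[A-Za-z_]*"?[[:space:]]*[:=][[:space:]]*"?)[^"[:space:],}]+/\1REDACTED/g' "$1"
}

# Possible usage, assuming the log path from the command above:
#   BOSH_INIT_LOG_LEVEL=debug BOSH_INIT_LOG_PATH=/tmp/bosh-debug.log \
#     bosh-init deploy your-manifest.yml
#   scrub_log /tmp/bosh-debug.log > /tmp/bosh-debug.scrubbed.log
```

Writing to a separate file keeps the original log intact, so the pattern can be adjusted and rerun if something was missed.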

Prospecta commented 7 years ago

Thanks Cunnie,

Please check the attached logs bosh.log.zip

cunnie commented 7 years ago

Thanks @Prospecta,

Some things to try:

To speed up your debugging, you can deploy a VM with no jobs rather than an entire CF. Here's part of a manifest to accomplish this (we haven't tried this ourselves, but it should work):

---
name: empty

releases:
- name: bosh-vcloud-cpi
  url: https://bosh.io/d/github.com/cloudfoundry-incubator/bosh-vcloud-cpi-release?v=24
  sha1: 6b223f73f3818363b6af15a7326d3894ea0c56c6

resource_pools:
- name: vms
  network: private
  stemcell:
    url: https://bosh.io/d/stemcells/bosh-vcloud-esxi-ubuntu-trusty-go_agent?v=3262.12
    sha1: 333187bc7f7e35cd714c0aa8c4f699393cdcb0c2
  cloud_properties:
    cpu: 2
    ram: 4_096
    disk: 20_000

disk_pools:
- name: disks
  disk_size: 20_000

networks:
- name: private
  type: manual
  subnets:
  - range: 10.85.57.0/24
    gateway: 10.85.57.1
    dns: [8.8.8.8]
    cloud_properties: {name: VM Network} # <--- Replace with Network name

instance_groups:
- name: empty_vm
  instances: 1
  jobs: []
  resource_pool: vms
  persistent_disk_pool: disks
  networks:
  - {name: private, static_ips: [10.0.0.6]}

cloud_provider:
  template: {name: vcloud_cpi, release: bosh-vcloud-cpi}
  mbus: "https://mbus:mbus-password@10.0.0.6:6868"
  properties:
    vcd:
      url: VCLOUD-URL
      user: VCLOUD-USER
      password: VCLOUD-PASSWORD
      entities:
        organization: VDC-ORGANIZATION
        virtual_datacenter: VDC-NAME
        vapp_catalog: bosh-catalog
        media_catalog: bosh-catalog
        media_storage_profile: '*'
        vm_metadata_key: bosh-meta
      control: {wait_max: 900}
    agent: {mbus: "https://mbus:mbus-password@0.0.0.0:6868"}
    blobstore: {provider: local, path: /var/vcap/micro_bosh/data/cache}
    ntp: [0.pool.ntp.org, 1.pool.ntp.org]
...
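
One thing worth checking before running a manifest like this one: the `static_ips` entry and the `mbus` address would normally be expected to fall inside the subnet `range`, yet as posted `10.0.0.6` lies outside `10.85.57.0/24`. The sketch below shows one way to check that from a shell. The `ip_to_int` and `ip_in_cidr` helpers are hypothetical names, not part of bosh-init, and they handle plain IPv4 dotted-quad addresses only.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) when address $1 lies inside CIDR block $2.
ip_in_cidr() {
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")     # part before the slash
  bits=${2#*/}                   # prefix length after the slash
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Values taken from the manifest above.
ip_in_cidr 10.0.0.6 10.85.57.0/24 || echo "10.0.0.6 is outside 10.85.57.0/24"
```

With the values as posted, the last line prints the warning. Either the subnet `range` and `gateway` or the static IP and `mbus` address would need adjusting to match the target network.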

Some notes:

— Lyle Franklin & Brian Cunnie

Prospecta commented 7 years ago

Hi,

Thanks for the detailed response. To answer your questions: we are able to manually create VMs and attach disks successfully via the vCloud Director UI. However, a fresh bosh-init deploy with a new vApp name and no existing VMs fails with the same error, even when using the YAML that you posted.

I've had a discussion with our infrastructure team and they mentioned that vCloud Director was recently updated to v8.0.1. Has the CPI been tested against this yet?

Either way we will also be raising this with VMware support to see what they come back with.

Thanks

ljfranklin commented 7 years ago

@Prospecta

> updated to v8.0.1. Has the CPI been tested against this yet?

Honestly the vCloud CPI has been mostly static for quite a while. Our testing environments are vCloud Air 5.5 and 5.6. We'd be happy to accept PRs to make the CPI compatible with newer environments, but I'm afraid the CPI team doesn't have the bandwidth to get new environments and fix this ourselves.

Prospecta commented 7 years ago

Hi,

So I've been engaging with VMware support this week and they suggested we switch off SDRS for the datastore cluster that Bosh is being deployed to. Apparently there is an SDRS placement issue which affects version 8.0.1 of vCloud.

I have attempted the deployment of Bosh again (with SDRS turned off) and it has run successfully.

I have just asked VMware what the impact of turning off this feature will be when deploying additional VMs to the datastore cluster. I'll let you know what they come back with.

Cheers

tlawrence commented 7 years ago

The question of VCD versioning will need to be addressed soon. There are some more significant API changes coming up in VCD 8.2. We would be happy to help create some PRs for this nearer the time. @ljfranklin are you able to share some info on your test routines? We could probably allocate some compute resource for testing. Is there a Travis job or similar?

ljfranklin commented 7 years ago

cc'ing @zaksoup & @cppforlife as I'm not sure what our current vCloud support plan looks like.