cloudfoundry / bosh

Cloud Foundry BOSH is an open source tool chain for release engineering, deployment and lifecycle management of large scale distributed services.
https://bosh.io
Apache License 2.0

create_disk fails on cloud_config directed deployment - Datastore matching? #1921

Closed jhohiii closed 6 years ago

jhohiii commented 6 years ago

I have a standalone (non-CF/PCF) BOSH deployment to handle various projects (concourse/prometheus). In deploying prometheus, I've found that the deployment only wants to use the vcenter_ds value instead of the cloud-config AND it won't honor the regex matching. This is during creation of the persistent disks.

Error: CPI error 'Bosh::Clouds::CloudError' with message 'No valid placement found for disks:
- Size: 1024, Target DS Pattern:VxRail-Virtual-SAN-Datastore-.*, Current Location: N/A

Possible placement options:
Datastores:

' in 'create_disk' CPI method

Attached are my cloud-config and bootstrap. As you see, the cloud-config specifically references EXACT valid datastore names for the persistent disks, but the deploy is only using the one specified in the create-env (which should also work!).

If it looks like I'm over describing the disks, it is because I have tried literally 20-40 times to get a version that works. The only way I have to get it to work is to deploy to the same vSphere cluster as the bosh director (AZ2 in the cloud config).

cloud-config.yml.txt prometheus-deployed.yml.txt

schmidtsv commented 6 years ago

What confuses me: according to the docs and my experience, you most likely have an error in your cloud-config or director config.

In the AZ config, you set a datastore pattern and a persistent_datastore pattern, although neither is in the documentation. datastore_pattern and persistent_datastore_pattern both go into the CPI config. You add the datastores to disks but not to VMs, which can lead to problems.

Additionally, you added the general CPI config to the cloud config, where it has no effect, because it belongs to the director that deploys.
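
As a rough sketch of what that split could look like, the AZ entries in the cloud-config would only name the datacenter and cluster, with the datastore patterns left to the director's vcenter properties. The AZ-to-cluster mapping, the second cluster name, and the resource pool below are assumptions for illustration, not values taken from the attached files:

```yaml
# cloud-config excerpt (sketch; names are illustrative assumptions)
azs:
- name: z1
  cloud_properties:
    datacenters:
    - name: cvg_iaas
      clusters:
      - cvg_iaas_internal_1:          # assumed name of the second cluster
          resource_pool: bosh-utility-1
- name: z2
  cloud_properties:
    datacenters:
    - name: cvg_iaas
      clusters:
      - cvg_iaas_internal:            # assumed to be the director's own cluster
          resource_pool: bosh-utility-1
```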

Your networks seem weird: the distributed port group in z1 is AZ1-vlan-17-IAAS-PCF-Infra and in z2 it is vlan-17-IAAS-PCF-Infra. Are you sure that is right? In all the setups I have seen, the port groups are usually enumerated. (If you get some weird port errors, it would be a port group misnaming.)

My first guess would be that it is either unable to place a VM into a datastore because your VMs have no ephemeral disk configuration (unless your CPI config is also in the director, in which case datastore_pattern is a string there and it expects a regex matching a storage that is reachable from the cluster; check whether the datastore is attached to the cluster), or the datastores are not reachable from the cluster. Can you check if you can manually start a VM in the desired cluster using one of the datastores you linked?

jhohiii commented 6 years ago

Apparently, I don't have a cpi-config:

[ jhoh@cvglppcfutl01 ~ ] $ bosh cpi-config
Using environment '10.51.17.15' as client 'admin'

No CPI config

Exit code 1

Where do I configure that?

All of these VMs get successfully created in either AZ (Cluster) on their corresponding local only storage (one VxRail-.* datastore per cluster). The issue comes when persistent disk is being created (before it is attached). The networking is correct. We have a single VLAN across both clusters, but with different portgroup names. They are just labels. I will change the name in the second AZ - prepending 'AZ2-' to match the prepend of 'AZ1-'.

I'm really interested in how I update my CPI config. @schmidtsv

schmidtsv commented 6 years ago

You do not need the CPI config unless you are using it. I was referring to the one in the director manifest (the one on the BOSH VM, not in the cloud_provider section). Can you either link me your create-env call with the variables in it (minus the password), or take your create-env call, replace create-env with interpolate, add the option --path /instance_groups/name=bosh/properties/vcenter/datacenters/0/clusters, and paste me the clusters that come out?

jhohiii commented 6 years ago

@schmidtsv I could have sworn that I added the bootstrap script...oh well, here it is...

bosh create-env bosh-deployment/bosh.yml \
    --state=state.json \
    --vars-store bosh-utility-1/creds.yml \
    -o bosh-deployment/vsphere/cpi.yml \
    -o bosh-deployment/vsphere/resource-pool.yml \
    -o bosh-deployment/misc/dns.yml \
    -v director_name=utility-bosh-1 \
    -v internal_ip=class-b.17.15 \
    -v internal_gw=class-b.16.1 \
    -v internal_dns=[class-b2.144.177,class-b3.144.17] \
    -v internal_cidr=class-b.16.0/22 \
    -v network_name="vlan-17-IAAS-PCF-Infra" \
    -v vcenter_dc=cvg_iaas \
    -v vcenter_ds="VxRail-Virtual-SAN-Datastore-.*" \
    -v vcenter_ip=class-b.4.22 \
    -v vcenter_user=<redacted> \
    -v vcenter_password=<redacted> \
    -v vcenter_templates=utility-1-templates \
    -v vcenter_vms=utility-1-vms \
    -v vcenter_disks=utility-1-disks \
    -v vcenter_rp=bosh-utility-1 \
    -v vcenter_cluster=cvg_iaas_internal

jhohiii commented 6 years ago

@schmidtsv

- cvg_iaas_internal:
    resource_pool: bosh-utility-1

Succeeded

At the --path /instance_groups/name=bosh/properties/vcenter level:

datacenters:
- clusters:
  - cvg_iaas_internal:
      resource_pool: bosh-utility-1
  datastore_pattern: VxRail-Virtual-SAN-Datastore-.*
  disk_path: utility-1-disks
  name: cvg_iaas
  persistent_datastore_pattern: VxRail-Virtual-SAN-Datastore-.*
  template_folder: utility-1-templates
  vm_folder: utility-1-vms

schmidtsv commented 6 years ago

OK, your problem is simple. In the director manifest you only mention one cluster. For technical reasons, BOSH wants you to list every cluster it can deploy to there, and you are trying to deploy to a different cluster. One way to fix that is to create an ops file with the following snippet and add it as an ops file at the end of your command:

- type: replace
  path: /instance_groups/name=bosh/properties/vcenter/datacenters/0/clusters
  value:
  - cvg_iaas_internal:
    resource_pool: bosh-utility-1
  - cvg_iaas_internal_1:
    resource_pool: bosh-utility-1

Then deploy the director, and you should be able to deploy on both clusters without all the datastore clutter in the cloud-config.

jhohiii commented 6 years ago

That works and makes sense when you describe it that way. I'd much rather deploy bosh with a complete manifest instead of all those "-v" command line arguments.

Anyway, now it is failing:

Deployment state: 'state.json'

Started validating
  Downloading release 'bosh'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh'... Finished (00:00:00)
  Downloading release 'bosh-vsphere-cpi'... Skipped [Found in local cache] (00:00:00)
  Validating release 'bosh-vsphere-cpi'... Finished (00:00:00)
  Validating cpi release... Finished (00:00:00)
  Validating deployment manifest... Finished (00:00:00)
  Downloading stemcell... Skipped [Found in local cache] (00:00:00)
  Validating stemcell... Finished (00:00:02)
Finished validating (00:00:04)

Started installing CPI
  Compiling package 'ruby-2.4-r3/8471dec5da9ecc321686b8990a5ad2cc84529254'... Finished (00:00:00)
  Compiling package 'vsphere_cpi/3049e51ead9d72268c1f6dfb5b471cbc7e2d6816'... Finished (00:00:00)
  Compiling package 'iso9660wrap/82cd03afdce1985db8c9d7dba5e5200bcc6b5aa8'... Finished (00:00:00)
  Installing packages... Finished (00:00:00)
  Rendering job templates... Finished (00:00:00)
  Installing job 'vsphere_cpi'... Finished (00:00:00)
Finished installing CPI (00:00:01)

Starting registry... Finished (00:00:00)
Uploading stemcell 'bosh-vsphere-esxi-ubuntu-trusty-go_agent/3468.21'... Skipped [Stemcell already uploaded] (00:00:00)

Started deploying
  Waiting for the agent on VM 'vm-8baa895f-1c30-4b93-a368-da1dca45ca97'... Finished (00:00:00)
  Stopping jobs on instance 'unknown/0'... Finished (00:00:00)
  Unmounting disk 'disk-b15c8465-ea45-4562-8822-ec5a14d86317'... Finished (00:00:01)
  Deleting VM 'vm-8baa895f-1c30-4b93-a368-da1dca45ca97'... Finished (00:00:10)
  Creating VM for instance 'bosh/0' from stemcell 'sc-98d32778-2acb-4899-af74-dc27c94c3047'... Finished (00:00:12)
  Waiting for the agent on VM 'vm-45f8a435-c10a-4544-a31e-665770120752' to be ready... Finished (00:00:27)
  Attaching disk 'disk-b15c8465-ea45-4562-8822-ec5a14d86317' to VM 'vm-45f8a435-c10a-4544-a31e-665770120752'... Finished (00:00:15)
  Rendering job templates... Failed (00:00:03)
Failed deploying (00:01:22)

Stopping registry... Finished (00:00:00)
Cleaning up rendered CPI jobs... Finished (00:00:00)

Deploying:
  Building state for instance 'bosh/0':
    Rendering job templates for instance 'bosh/0':
      Rendering templates for job 'vsphere_cpi/7ef406ee1d230b5322d8dc8ccce5da6bbbdabe7c':
        Rendering template src: cpi.json.erb, dst: config/cpi.json:
          Rendering template src: /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/bosh-release-job368514904/templates/cpi.json.erb, dst: /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/rendered-jobs340009586/config/cpi.json:
            Running ruby to render templates:
              Running command: 'ruby /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/erb-renderer041303363/erb-render.rb /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/erb-renderer041303363/erb-context.json /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/bosh-release-job368514904/templates/cpi.json.erb /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/rendered-jobs340009586/config/cpi.json', stdout: '', stderr: '/home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/erb-renderer041303363/erb-render.rb:189:in `rescue in render': Error filling in template '/home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/bosh-release-job368514904/templates/cpi.json.erb' for vsphere_cpi/0 (line 44: #<NoMethodError: undefined method `inject' for nil:NilClass>) (RuntimeError)
        from /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/erb-renderer041303363/erb-render.rb:175:in `render'
        from /home/jhoh/.bosh/installations/bf01fd76-af1d-46a1-587a-6b0d93900a5d/tmp/erb-renderer041303363/erb-render.rb:200:in `<main>'
':
                exit status 1

Exit code 1

bosh create-env bosh-deployment/bosh.yml \
    --state=state.json \
    --vars-store bosh-utility-1/creds.yml \
    -o bosh-deployment/vsphere/cpi.yml \
    -o bosh-deployment/vsphere/resource-pool.yml \
    -o bosh-deployment/misc/dns.yml \
    -o cluster-2.yml \
<same -v options as above>

cluster-2.yml

  path: /instance_groups/name=bosh/properties/vcenter/datacenters/0/clusters
  value:
  - cvg_iaas_internal:
    resource_pool: bosh-utility-1
  - cvg_iaas_internal_1:
    resource_pool: bosh-utility-1

schmidtsv commented 6 years ago

I think I forgot an indentation level in front of resource_pool (two spaces), and it complains that you defined nil where it expected an array. You could also write it like this
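
The corrected ops file is not shown in the thread. A minimal sketch with the two extra spaces in front of resource_pool would look like the following. It reuses the cluster and resource pool names from the earlier snippet and assumes that a resource pool named bosh-utility-1 also exists in the second cluster:

```yaml
# cluster-2.yml (sketch): resource_pool is nested under each cluster key,
# matching the shape of the interpolated output shown earlier in the thread.
- type: replace
  path: /instance_groups/name=bosh/properties/vcenter/datacenters/0/clusters
  value:
  - cvg_iaas_internal:
      resource_pool: bosh-utility-1
  - cvg_iaas_internal_1:
      resource_pool: bosh-utility-1   # assumed to exist in this cluster too
```

With resource_pool at the same indentation as the cluster name, YAML parses each cluster key with a nil value, which is consistent with the nil error in the template rendering above.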

For vars, we use a so-called iaas_config file. Just make a single-level YAML file that looks like this and use it with -l:

director_name: overbosh

Or generally: <-v variable>: value
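
Applied to the create-env call posted earlier in this thread, such a vars file might look like the sketch below. The file name iaas-config.yml is hypothetical, the values are copied from the -v flags above, and the redacted or masked values (credentials, IP addresses) are left out rather than guessed:

```yaml
# iaas-config.yml (hypothetical file name): one key per former -v flag
director_name: utility-bosh-1
network_name: vlan-17-IAAS-PCF-Infra
vcenter_dc: cvg_iaas
vcenter_ds: "VxRail-Virtual-SAN-Datastore-.*"   # quoted so the regex stays a plain string
vcenter_templates: utility-1-templates
vcenter_vms: utility-1-vms
vcenter_disks: utility-1-disks
vcenter_rp: bosh-utility-1
vcenter_cluster: cvg_iaas_internal
```

The file would then be passed with -l iaas-config.yml in place of the matching -v flags. Any values not in the file still have to be supplied some other way.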

jhohiii commented 6 years ago

@schmidtsv

That completely solved my issue.

Thank you for the education. Is there a more complete resource for documentation? It seems like some of the more common enterprise deployment options are not documented, but are communicated through Q&A.

cppforlife commented 6 years ago

btw later versions of vsphere cpi should eliminate this problem.