CenturyLinkCloud / chef-provisioning-vsphere

A chef-provisioning provisioner for VMware vSphere
MIT License

Machine resource doesn't seem to work as expected in machine_batch #23

Open johnsmyth opened 9 years ago

johnsmyth commented 9 years ago

I tried converging multiple machine resources in a machine_batch, but it did not work as expected. I don't know whether this is an issue with this driver or with the machine_batch resource, but it created 3 VMs and then converged the same VM 3 times as 3 different nodes. Likewise, when destroying in a machine_batch, all 3 nodes were unregistered from the Chef server, but only one of the VMs was actually destroyed.

I was running from Chef Zero using chef-client 12.3.0 on Windows 10.

mwrock commented 9 years ago

Could you include the recipe or a gist? Thanks!

johnsmyth commented 9 years ago

This is the (de-identified) recipe. It's just a simple POC, so excuse the mess and comments...

FYI: when I pull out the machine_batch, the machines run sequentially and the nodes seem to converge correctly.

# Cookbook Name:: bsk-vmware-provisioner-poc
# Recipe:: default
#
# Copyright (c) 2015 The Authors, All Rights Reserved.

chef_gem 'chef-provisioning-vsphere' do
    action :install
    compile_time true
end

require 'chef/provisioning/vsphere_driver'

with_vsphere_driver host: '10.10.10.10',
        insecure: true,
        user:     'mycompany\vm_admin_username',
        password: 'vmware_password'

with_machine_options({ 
  :bootstrap_options => {
    use_linked_clone: false,
    num_cpus: 2,
    memory_mb: 4096,
    network_name: ["VMNET.DMZ"],
    datacenter: 'HQ-DC',
    resource_pool: 'LAB',
    template_name: 'lab1temp2012R2',
    template_folder: 'LAB/Templates',
    vm_folder: 'LAB',
    datastore: 'RAID5-LUN20 (LAB)',
    customization_spec: {
      ipsettings: {
        dnsServerList: ['10.10.10.1'] # DNS list is required even when using DHCP. The only error is a nil-for-NilClass error; had to grep the code to figure it out.
      },
      domainAdmin: 'admin_username@lab.mycompany.com',
      domainAdminPassword: 'admin_password',
      org_name: 'MyCompany, inc',
      product_id: 'AAAAA-AAAAA-AAAAA-AAAAA-AAAAA', #If this is wrong it fails creating the unattend file
      win_time_zone: 4, # this is required. I don't know what 4 is, so we'll need to change this
      domain: 'lab.mycompany.com'
    },
    ssh: {
      #user: 'admin_username@lab.mycompany.com',
      user: 'mycompany\admin_username', # this MUST be in NetBIOS name format, apparently
      password: 'admin_password',
      paranoid: false,
    },
  },
  convergence_options: { 
    ssl_verify_mode: :verify_none #required bc of the self-signed cert
  }
})

chef_server_settings =  {
  chef_server_url: 'https://chefserver.lab.mycompany.com/organizations/mycompany',
  options: {
    client_name: 'jsmyth',
    signing_key_filename: 'C:/Users/jsmyt_000/src/mycompany/.chef/jsmyth.pem',
  }
}

ivr_attributes = {
  'ivr_artifact_url' => 'http://build.hq.mycompany.com/view/All/job/Brainshark.Services.IVR/15/artifact/BuildArtifact/Brainshark.Services.IVR.msdeploy.zip'
}

num_ivr_servers = 3
machine_batch do
  retries 5
  retry_delay 30
  1.upto(num_ivr_servers) do |i|

    machine "QA1JSMYTH#{i.to_s.rjust(2,"0")}" do
      action :converge
      run_list [ 'role[ivr-application]' ]
      chef_server chef_server_settings
      chef_environment 'qa'
      attributes ivr_attributes
      retries 5
      retry_delay 30
    end
  end
end
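The comments in the recipe above record three things found by trial and error: `ipsettings.dnsServerList` is required even with DHCP, `win_time_zone` is required, and a wrong `product_id` breaks unattend-file generation, with the missing-key case surfacing only as a NilClass error. A minimal pre-flight check in plain Ruby could catch these before any VM is cloned. `check_customization_spec` is a hypothetical helper, not part of chef-provisioning-vsphere; the required keys come from the recipe comments rather than from the driver's source, and the product-key test only checks the five-groups-of-five format.

```ruby
# Hypothetical pre-flight check; not part of chef-provisioning-vsphere.
# The required keys are taken from the comments in the recipe, not the driver source.
PRODUCT_KEY_FORMAT = /\A[A-Z0-9]{5}(-[A-Z0-9]{5}){4}\z/

def check_customization_spec(spec)
  problems = []
  dns = spec.dig(:ipsettings, :dnsServerList)
  if dns.nil? || dns.empty?
    problems << 'ipsettings.dnsServerList is missing or empty (required even with DHCP)'
  end
  problems << 'win_time_zone is missing' if spec[:win_time_zone].nil?
  unless spec[:product_id].to_s.match?(PRODUCT_KEY_FORMAT)
    problems << 'product_id is not in XXXXX-XXXXX-XXXXX-XXXXX-XXXXX format'
  end
  problems
end

spec = {
  ipsettings: { dnsServerList: ['10.10.10.1'] },
  product_id: 'AAAAA-AAAAA-AAAAA-AAAAA-AAAAA',
  win_time_zone: 4,
  domain: 'lab.mycompany.com'
}

problems = check_customization_spec(spec)
raise "customization_spec problems: #{problems.join('; ')}" unless problems.empty?
```

To use it in the recipe, the `customization_spec` hash would first need to be pulled into a local variable so it can be checked and then passed to `with_machine_options`.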
mwrock commented 9 years ago

I have not had any luck reproducing this, and I have a recipe almost identical to yours. I'm using Chef Zero with Windows nodes pointing to a real Chef server, inside a machine_batch. I'm also on Chef 12.3.0.

Other key gems: cheffish 1.3.0 and chef-provisioning 1.2.0 (then also tried 1.3.0).

A full log of the run may help troubleshoot further.

johnsmyth commented 9 years ago

This is my first experience with Chef Provisioning, so I apologize if I'm totally off base here, but it looks like the machine_spec being passed to allocate_machine is the same for all 3 machines. The server_id is the same for all of them, and it is the one set on QA1JSMYTH02 in Chef. Relevant log snippet:

 * machine_batch[mybatch] action converge
[2015-08-21T15:36:21-05:00] WARN: Checking to see if {"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.7.1", "server_id"=>"50120ead-c79b-6ea3-e57f-69ba300a8d2e", "is_windows"=>true, "allocated_at"=>"2015-08-21 18:21:00 UTC", "ipaddress"=>"10.10.10.217"} has been created...
[2015-08-21T15:36:21-05:00] WARN: Checking to see if {"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.7.1", "server_id"=>"50120ead-c79b-6ea3-e57f-69ba300a8d2e", "is_windows"=>true, "allocated_at"=>"2015-08-21 18:21:00 UTC", "ipaddress"=>"10.10.10.217"} has been created...
[2015-08-21T15:36:21-05:00] WARN: Checking to see if {"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.7.1", "server_id"=>"50120ead-c79b-6ea3-e57f-69ba300a8d2e", "is_windows"=>true, "allocated_at"=>"2015-08-21 18:21:00 UTC", "ipaddress"=>"10.10.10.217"} has been created...
establishing connection to 10.30.45.170
establishing connection to 10.30.45.170
establishing connection to 10.30.45.170
[2015-08-21T15:36:21-05:00] WARN: returning existing machine
[2015-08-21T15:36:21-05:00] WARN: returning existing machine
[2015-08-21T15:36:21-05:00] WARN: returning existing machine

    - [QA1JSMYTH01] Power on VM [IVR/QA1JSMYTH02]
    - [QA1JSMYTH02] Power on VM [IVR/QA1JSMYTH02]
    - [QA1JSMYTH03] Power on VM [IVR/QA1JSMYTH02]
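The reporter's observation, that the same machine_spec reaches allocate_machine for all three machines, would produce exactly this shape of output: three distinct node names all powering on one VM. The plain-Ruby model below illustrates that class of bug. It assumes a simplified two-phase batch (allocate every machine, then power on every machine); `allocate`, `power_on`, and `run_batch` are hypothetical and are not the driver's or chef-provisioning's actual code.

```ruby
# Simplified, hypothetical model of a batch run: every machine is allocated
# first, then every machine is powered on. Not the driver's actual code.

def allocate(spec, name)
  spec[:server_id] = "vm-for-#{name}" # record the VM created for this machine
end

def power_on(spec, name)
  "[#{name}] Power on VM [#{spec[:server_id]}]"
end

NAMES = %w[QA1JSMYTH01 QA1JSMYTH02 QA1JSMYTH03].freeze

def run_batch(specs)
  NAMES.each { |n| allocate(specs[n], n) }  # phase 1: allocate all machines
  NAMES.map { |n| power_on(specs[n], n) }   # phase 2: power on all machines
end

shared = {}
shared_log = run_batch(NAMES.map { |n| [n, shared] }.to_h) # one spec object for all
own_log = run_batch(NAMES.map { |n| [n, {}] }.to_h)        # one spec object each

puts shared_log # all three lines name vm-for-QA1JSMYTH03, the last one allocated
puts own_log    # each machine powers on its own VM
```

The shared case mirrors the three "Power on VM [IVR/QA1JSMYTH02]" lines; whether the driver or chef-provisioning actually shares state this way is not established in this thread.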
mwrock commented 9 years ago

That does indeed look like it's the case, and it's very wrong. If you have not already, I'd delete not only the VMs but also the nodes from your Chef server before trying again. That may or may not make a difference. Also, what version of chef-provisioning are you using?

johnsmyth commented 9 years ago

I was running chef-provisioning 1.1.1 and cheffish 1.2. I tried chef-provisioning 1.3.0 and cheffish 1.3.0 and got the same results, though I did not fully delete the objects before running that test. I'll clean that up and retry from scratch with the new versions.

johnsmyth commented 9 years ago

I cleaned up all the objects and re-ran using cheffish 1.3.0 and chef-provisioning 1.3.0 and got the same results. The VMs get created, each with a unique server_id, and each Chef node gets created with its VM's ID:


    - [QA1JSMYTH03] Machine - created - QA1JSMYTH03 (5012b72c-b74c-a3f8-f66b-8e123075ca6d on vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true)
[2015-08-24T08:56:52-05:00] WARN: Class Chef::Provider::ChefNode does not declare 'resource_name :chef_node'.
[2015-08-24T08:56:52-05:00] WARN: This will no longer work in Chef 13: you must use 'resource_name' to provide DSL.

    - [QA1JSMYTH01] Machine - created - QA1JSMYTH01 (50129758-47d9-2de3-66ed-59374056e976 on vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true)
[2015-08-24T08:56:52-05:00] WARN: Class Chef::Provider::ChefNode does not declare 'resource_name :chef_node'.
[2015-08-24T08:56:52-05:00] WARN: This will no longer work in Chef 13: you must use 'resource_name' to provide DSL.

    - [QA1JSMYTH02] Machine - created - QA1JSMYTH02 (50123c10-c3e8-1ed1-4f1e-899a55ba1b7b on vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true)
[2015-08-24T08:56:52-05:00] WARN: Class Chef::Provider::ChefNode does not declare 'resource_name :chef_node'.
[2015-08-24T08:56:52-05:00] WARN: This will no longer work in Chef 13: you must use 'resource_name' to provide DSL.

But then, when Chef Provisioning references the servers, it uses the same server_id for all 3 VMs:

    - [QA1JSMYTH03]   add normal.chef_provisioning = {"reference"=>{"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.8.0", "server_id"=>"50123c10-c3e8-1ed1-4f1e-899a55ba1b7b", "is_windows"=>true, "allocated_at"=>"2015-08-24 13:56:52 UTC", "ipaddress"=>nil}}
    - [QA1JSMYTH01]   add normal.chef_provisioning = {"reference"=>{"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.8.0", "server_id"=>"50123c10-c3e8-1ed1-4f1e-899a55ba1b7b", "is_windows"=>true, "allocated_at"=>"2015-08-24 13:56:52 UTC", "ipaddress"=>nil}}
    - [QA1JSMYTH02]   add normal.chef_provisioning = {"reference"=>{"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.8.0", "server_id"=>"50123c10-c3e8-1ed1-4f1e-899a55ba1b7b", "is_windows"=>true, "allocated_at"=>"2015-08-24 13:56:52 UTC", "ipaddress"=>nil}}
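The two log excerpts above can be cross-checked mechanically: for each node, compare the server_id printed when its VM was created with the server_id later stored in its `chef_provisioning` reference. The sketch below does that with two regular expressions over lines copied from the 2015-08-24 log; `mismatched_nodes` is a hypothetical helper, and the patterns assume the exact line formats shown above.

```ruby
# Cross-check the 2015-08-24 log: for each node, compare the server_id it was
# created with against the server_id later stored in its reference.
CREATED = /\[(\w+)\] Machine - created - \1 \(([0-9a-f-]+) on /
STORED = /\[(\w+)\]\s+add normal\.chef_provisioning = .*"server_id"=>"([0-9a-f-]+)"/

def mismatched_nodes(log)
  created = log.scan(CREATED).to_h # node name => server_id at creation
  stored = log.scan(STORED).to_h   # node name => server_id in the reference
  created.reject { |node, id| stored[node] == id }.keys.sort
end

log = <<~'LOG'
  [QA1JSMYTH03] Machine - created - QA1JSMYTH03 (5012b72c-b74c-a3f8-f66b-8e123075ca6d on vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true)
  - [QA1JSMYTH01] Machine - created - QA1JSMYTH01 (50129758-47d9-2de3-66ed-59374056e976 on vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true)
  - [QA1JSMYTH02] Machine - created - QA1JSMYTH02 (50123c10-c3e8-1ed1-4f1e-899a55ba1b7b on vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true)
  - [QA1JSMYTH03]   add normal.chef_provisioning = {"reference"=>{"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.8.0", "server_id"=>"50123c10-c3e8-1ed1-4f1e-899a55ba1b7b", "is_windows"=>true, "allocated_at"=>"2015-08-24 13:56:52 UTC", "ipaddress"=>nil}}
  - [QA1JSMYTH01]   add normal.chef_provisioning = {"reference"=>{"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.8.0", "server_id"=>"50123c10-c3e8-1ed1-4f1e-899a55ba1b7b", "is_windows"=>true, "allocated_at"=>"2015-08-24 13:56:52 UTC", "ipaddress"=>nil}}
  - [QA1JSMYTH02]   add normal.chef_provisioning = {"reference"=>{"driver_url"=>"vsphere://10.30.45.170/sdk?use_ssl=true&insecure=true", "driver_version"=>"0.8.0", "server_id"=>"50123c10-c3e8-1ed1-4f1e-899a55ba1b7b", "is_windows"=>true, "allocated_at"=>"2015-08-24 13:56:52 UTC", "ipaddress"=>nil}}
LOG

puts "created with one VM but referencing another: #{mismatched_nodes(log).join(', ')}"
```

On these lines, only QA1JSMYTH02's reference matches the VM it was created with, and that is the VM all three references point at.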
johnsmyth commented 9 years ago

Do you think this might be a Chef Provisioning issue rather than a driver issue?

mwrock commented 9 years ago

Possibly. It's hard to say. I've tried a multitude of ways to reproduce this without any luck, but I'm reluctant to "pass the buck". One thing that would help is a complete list of your gems, to ensure we are testing the same bits. If you are not already using bundle exec, could you run bundle install and include the resulting Gemfile.lock file in this issue or a gist? I'll also do a bit of digging to see what might cause this scenario.
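Once both sides have a Gemfile.lock, a short Ruby script could list the gems whose locked versions differ. `locked_versions` and `version_differences` are hypothetical helpers; they read only the top-level entries under `specs:` (indented four spaces, as Bundler writes them) and ignore dependency lines and other sections. The lockfile excerpts are abbreviated and illustrative, built from versions mentioned in this thread rather than taken from either machine.

```ruby
# Hypothetical helpers: report gems whose locked versions differ between two
# Gemfile.lock files. Top-level entries under "specs:" are indented four
# spaces, e.g. "    chef (12.3.0)"; dependency lines are indented six spaces
# and do not match the pattern below.
SPEC_LINE = /\A {4}([^\s(]+) \(([^)]+)\)\s*\z/

def locked_versions(lockfile_text)
  lockfile_text.each_line.each_with_object({}) do |line, versions|
    match = SPEC_LINE.match(line)
    versions[match[1]] = match[2] if match
  end
end

def version_differences(mine, theirs)
  a = locked_versions(mine)
  b = locked_versions(theirs)
  (a.keys | b.keys).sort.each_with_object({}) do |name, diff|
    diff[name] = [a[name], b[name]] unless a[name] == b[name] # nil = not locked on that side
  end
end

# Abbreviated, illustrative excerpts; not real lockfiles from either machine.
reporter = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      chef (12.3.0)
      chef-provisioning (1.1.1)
      cheffish (1.2.0)
LOCK

maintainer = <<~LOCK
  GEM
    remote: https://rubygems.org/
    specs:
      chef (12.3.0)
      chef-provisioning (1.3.0)
      cheffish (1.3.0)
LOCK

version_differences(reporter, maintainer).each do |name, (ours, theirs)|
  puts "#{name}: #{ours.inspect} vs #{theirs.inspect}"
end
```

With real files, the same call would be `version_differences(File.read('Gemfile.lock'), File.read('other/Gemfile.lock'))`.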

rperez31 commented 7 years ago

Thank you guys so much! I was missing a lot in my recipe, since I just copied the basics. I'll give this a go. 👍
