CenturyLinkCloud / chef-provisioning-vsphere

A chef-provisioning provisioner for VMware vSphere
MIT License

Cheffish goes crazy on machine_options #51

Open nwesoccer opened 8 years ago

nwesoccer commented 8 years ago

Situation: We have made our provisioning very dynamic so that we can define our multiple environments in the JSON environment files. We have an array of servers, and the loop in the code below goes through it to call the machine resource. The snippet is just the pertinent part of our provisioning recipe; there is not much else other than the variable definitions. The issue is that machine :converge works for 6 Linux boxes, and then on the first Windows box the log spits out several thousand lines of:

creating machine wnsql1 on vsphere://???.net/sdk?use_ssl=true&insecure=true
    -   use_linked_clone: false
    -   customization_spec: #<Cheffish::MergedConfig:0x000000085c0f28 @configs=[{:org_name=>
*** THOUSANDS OF LINES OF REPEAT CODE, THE MATRIX ***

As the code shows, we are just passing in a hash. Where it gets converted into a Cheffish::MergedConfig I don't know, but it looks like the same config merged hundreds of times. I'm not sure if it's just the fact that it's number 7, or that it's Windows, or what. Are we just doing something wrong and not understanding Ruby/chef-provisioning?

Provisioning Server Chef Versions

chef (12.5.1)
chef-config (12.5.1)
chef-dk (0.10.0)
chef-provisioning (1.5.1, 1.5.0)
chef-provisioning-aws (1.6.1)
chef-provisioning-azure (0.4.0)
chef-provisioning-fog (0.15.0)
chef-provisioning-vagrant (0.10.0)
chef-provisioning-vsphere (0.8.3)
chef-vault (2.6.1)
chef-zero (4.3.2, 1.5.6)
cheffish (1.6.0)

Provisioning Recipe

    cluster[:servers].each do |server|
        bootstrap_options = {
            use_linked_clone: false,
            customization_spec: {
                :org_name => '???',
                :time_zone => 'America/New_York',
                :win_time_zone => 35,
                :product_id => windows_product[:id],
                :ipsettings => {}
            },
            :ssh => {
                :paranoid => false
            }
        }

        bootstrap_options[:vm_folder] = cluster[:vm_folder] if cluster[:vm_folder]
        bootstrap_options[:datacenter] = cluster[:datacenter] if cluster[:datacenter]
        bootstrap_options[:datastore] = cluster[:datastore] if cluster[:datastore]
        bootstrap_options[:host] = cluster[:host] if cluster[:host]
        bootstrap_options[:resource_pool] = cluster[:resource_pool] if cluster[:resource_pool]
        bootstrap_options[:template_folder] = cluster[:template_folder] if cluster[:template_folder]

        bootstrap_options[:template_name] = server[:template] if server[:template]
        bootstrap_options[:network_name] = server[:networks] if server[:networks]
        bootstrap_options[:num_cpus] = server[:cpu] if server[:cpu]
        bootstrap_options[:memory_mb] = server[:ram] if server[:ram]

        bootstrap_options[:customization_spec][:domain] = cluster[:domain] || 'local'
        bootstrap_options[:customization_spec][:domainAdmin] = domain_secrets[:join_username] if domain_secrets[:join_username]
        bootstrap_options[:customization_spec][:domainAdminPassword] = domain_secrets[:join_password] if domain_secrets[:join_password]
        bootstrap_options[:customization_spec][:hostname] = server[:hostname] if server[:hostname]
        bootstrap_options[:customization_spec][:ipsettings][:ip] = server[:ip] if server[:ip]
        bootstrap_options[:customization_spec][:ipsettings][:subnetMask] = cluster[:subnet_mask] if cluster[:subnet_mask]
        bootstrap_options[:customization_spec][:ipsettings][:gateway] = server[:gateways] if server[:gateways]
        bootstrap_options[:customization_spec][:ipsettings][:dnsServerList] = cluster[:dns_servers] if cluster[:dns_servers]

        bootstrap_options[:ssh][:user] = server[:transport_mode] == 'ssh' ? ssh_secrets[:username] : winrm_secrets[:username]
        bootstrap_options[:ssh][:password] = server[:transport_mode] == 'ssh' ? ssh_secrets[:password] : winrm_secrets[:password]

        data_bag_secret_location = ::File.join('/etc', 'chef', 'encrypted_data_bag_secret') if server[:transport_mode] == 'ssh'
        data_bag_secret_location = ::File.join('C:', 'chef', 'encrypted_data_bag_secret') if server[:transport_mode] == 'winrm'

        machine server[:hostname] do
            run_list server[:run_list]
            chef_environment node.chef_environment
            files data_bag_secret_location => '/etc/chef/secret'
            machine_options :bootstrap_options => bootstrap_options,
                :convergence_options => {
                    ssl_verify_mode: :verify_none
                },
                :start_timeout => 2000,
                :create_timeout => 2000,
                :ready_timeout => 2000
        end
    end

mwrock commented 8 years ago

Hard to say from this cursory glimpse but can you try to edit your cluster to one windows node to see if that replicates the problem?

nwesoccer commented 8 years ago

Yeah, I'll give that a try. In the meantime, I did find this line of code. Is it possible that each machine resource is using the same instance of the driver, and therefore the @config instance variable just keeps merging every new set of machine options? I've also found that it only happens when the server in question doesn't already exist, and since all the Linux ones do exist, it being a Windows machine might not have anything to do with it. I will test that as well and see if it just grows as I stand up more and more during an initial run.

mwrock commented 8 years ago

You are correct that the configs are repeatedly merged into the same MergedConfig; however, that should not be inflating the MergedConfig. Rather, new configs are overlaid on top of the older ones. I could see where that could cause issues with configs of different shapes, but it looks like all of your machines share the same property keys.

nwesoccer commented 8 years ago

Ah, OK. I dug into the MergedConfig code a little. It seems they actually maintain a list of all the configs and a calculated merge of them all. So I think the larger and more numerous the configs I have merged, the worse the calculations get. Also, if I understand the code correctly, it only does the calculation and caches it per key when the key is first requested. In the driver's 'full_description' method there is a call to to_hash, which would cause the calculation to happen for every key, and my guess is that this causes a ton of calculations as it loops through every config for every key.
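
To make the cost argument concrete, here is a minimal Ruby sketch of a layered config that resolves each key lazily and caches it. `ToyMergedConfig`, `nest`, and `lookups_for` are hypothetical names invented for this illustration; this is not the Cheffish implementation, only a model of the behavior described above, and it does not handle keys whose values mix scalars and hashes across layers.

```ruby
# Toy stand-in for a layered config. NOT the real Cheffish::MergedConfig.
class ToyMergedConfig
  class << self
    attr_accessor :lookups # counts how many layers were consulted
  end
  self.lookups = 0

  attr_reader :configs

  def initialize(*configs)
    @configs = configs
    @cache = {}
  end

  def keys
    configs.flat_map(&:keys).uniq
  end

  # Resolve a key the first time it is requested, then serve it from the cache.
  # Hash-like values from several layers are wrapped in a new layered config;
  # for scalars the topmost non-nil value wins.
  def [](key)
    @cache.fetch(key) do
      values = configs.map { |c| self.class.lookups += 1; c[key] }.compact
      hashes = values.select { |v| v.respond_to?(:keys) }
      @cache[key] = hashes.size > 1 ? ToyMergedConfig.new(*hashes) : values.first
    end
  end

  # Forces resolution of every key, at every nesting level.
  def to_hash
    keys.each_with_object({}) do |key, result|
      value = self[key]
      result[key] = value.is_a?(ToyMergedConfig) ? value.to_hash : value
    end
  end
end

# Wrap a base hash in `depth` layers, the way repeated merging would.
def nest(depth)
  config = { customization_spec: { hostname: 'base', ip: '10.0.0.1' } }
  depth.times do |i|
    layer = { customization_spec: { hostname: "node#{i}", ip: "10.0.0.#{i + 2}" } }
    config = ToyMergedConfig.new(layer, config)
  end
  config
end

# Number of layer lookups needed for one to_hash call at a given depth.
def lookups_for(depth)
  ToyMergedConfig.lookups = 0
  nest(depth).to_hash
  ToyMergedConfig.lookups
end

[1, 4, 16].each { |d| puts "depth #{d}: #{lookups_for(d)} layer lookups" }
```

In this toy the count grows with the nesting depth even though every layer has the same keys, which matches the suspicion that the to_hash call is where the work piles up.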

At this point, I think my only solution would be to call with_vsphere_driver inside my loop, so that it just replaces the driver each time, or instantiate my own ChefProvisioningVsphere::VsphereDriver.canonicalize_url within the loop to pass to the machine resource.

mwrock commented 8 years ago

That sounds reasonable. Ideally the merging here should work differently to avoid this altogether.

nwesoccer commented 8 years ago

Seems my solution didn't work either. The with_vsphere_driver call seems to happen at compile time, so all the machines end up with the last one at runtime and I have the same issue. I believe my issue is at

    def merge_options!(machine_options)
      @config = Cheffish::MergedConfig.new(
        { machine_options: machine_options },
        @config
      )
    end

Is anyone else having this issue? Seems unlikely that we are the only ones trying to stand up 10-15 servers in the same provisioning script and changing machine_options for each.
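
A small Ruby sketch of why a shared driver instance accumulates layers with this pattern. `Layered` and `ToyDriver` are hypothetical stand-ins that only copy the shape of the merge_options! method quoted above; they are not the real driver or Cheffish classes.

```ruby
# Hypothetical two-slot layered config: the newest options on top,
# everything merged before it underneath.
Layered = Struct.new(:top, :rest) do
  def depth
    1 + (rest.respond_to?(:depth) ? rest.depth : 0)
  end
end

# Hypothetical driver that wraps its previous config on every call.
class ToyDriver
  attr_reader :config

  def initialize
    @config = {}
  end

  def merge_options!(machine_options)
    @config = Layered.new({ machine_options: machine_options }, @config)
  end
end

driver = ToyDriver.new
%w[lnlb1 lnredis1 lnredis2 wnsql1].each do |hostname|
  driver.merge_options!(bootstrap_options: { hostname: hostname })
  puts "#{hostname}: config is #{driver.config.depth} layers deep"
end
```

If one driver instance really is reused for every machine resource, each call adds a layer, and a second call per machine (for another action, say) would add more, which is consistent with the dump growing as more servers are converged.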

nwesoccer commented 8 years ago

We replaced the above with

    def merge_options!(machine_options)
      @config = { machine_options: machine_options }
    end

The output below (from the converge of the 7th server) shows that the machine_options are getting merged somewhere else as well. Changing the above code slowed the madness down to 19 merges on server 7, from the previous hundreds of merges that froze up chef-client.

It seems that not only the merge upon merge per machine, but also merging multiple times within a machine, is causing exponential growth in the merge count. Do we really need these merged options here, since they are already being merged elsewhere? For now we will use our modified gem. If we find that the above merge is not needed, can we open a pull request to get that change merged in?

customization_spec: #<Cheffish::MergedConfig:0x00000009f0b2f8 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.21", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wnapi1"}, #<Cheffish::MergedConfig:0x00000009f0b370 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.55", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wntasks1"}, #<Cheffish::MergedConfig:0x00000009f0b3e8 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.55", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wntasks1"}, #<Cheffish::MergedConfig:0x00000009f0b460 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.42", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wnhirus1"}, #<Cheffish::MergedConfig:0x00000009f0b4d8 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.42", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wnhirus1"}, #<Cheffish::MergedConfig:0x00000009f0b550 @configs=[{:org_name=>"???", 
:time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.74", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.65"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wnsql1"}, #<Cheffish::MergedConfig:0x00000009f0b5c8 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.74", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.65"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"wnsql1"}, #<Cheffish::MergedConfig:0x00000009f0b640 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.49", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnes3"}, #<Cheffish::MergedConfig:0x00000009f0b6b8 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.49", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnes3"}, #<Cheffish::MergedConfig:0x00000009f0b730 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.48", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnes2"}, #<Cheffish::MergedConfig:0x00000009f0b7a8 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.48", 
:subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnes2"}, #<Cheffish::MergedConfig:0x00000009f0b820 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.47", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnes1"}, #<Cheffish::MergedConfig:0x00000009f0b898 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.47", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.33"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnes1"}, #<Cheffish::MergedConfig:0x00000009f0b910 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.27", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnredis2"}, #<Cheffish::MergedConfig:0x00000009f0b988 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.27", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnredis2"}, #<Cheffish::MergedConfig:0x00000009f0ba00 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.26", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, 
:domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnredis1"}, #<Cheffish::MergedConfig:0x00000009f0ba78 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.26", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnredis1"}, #<Cheffish::MergedConfig:0x00000009f0baf0 @configs=[{:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.2", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnlb1"}, {:org_name=>"???", :time_zone=>"America/New_York", :win_time_zone=>35, :product_id=>"???", :ipsettings=>{:ip=>"192.168.10.2", :subnetMask=>"255.255.255.224", :gateway=>["192.168.10.1"], :dnsServerList=>["192.168.1.2", "192.168.1.3"]}, :domain=>"???", :domainAdmin=>"???", :domainAdminPassword=>"???", :hostname=>"lnlb1"}], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>], @merge_arrays={}>

brandocorp commented 7 years ago

I think this is a combination of the machine provider behavior with the driver behavior. The driver is recursively merging the configuration in multiple places, and each time the machine provider performs an action, it gets another recursive merge. It looks like the merging of configs that takes place in the driver is already handled at the machine level, so perhaps it can be removed?
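
One possible shape for a non-accumulating version, sketched in plain Ruby. `ReplacingDriver` is a hypothetical class for illustration, not a patch against the actual driver, and it assumes nothing else in the driver depends on options carried over from previously converged machines.

```ruby
# Hypothetical driver that keeps a fixed base config and overlays only the
# current machine's options, so the nesting depth never grows.
class ReplacingDriver
  attr_reader :config

  def initialize(base_config = {})
    @base_config = base_config.freeze
    @config = @base_config
  end

  # Rebuild from the base on every call; options from earlier machines
  # are dropped instead of being wrapped in another layer.
  def merge_options!(machine_options)
    @config = @base_config.merge(machine_options: machine_options)
  end
end

driver = ReplacingDriver.new(driver_options: { insecure: true })
driver.merge_options!(bootstrap_options: { hostname: 'lnlb1' })
driver.merge_options!(bootstrap_options: { hostname: 'wnsql1' })
puts driver.config.inspect
```

The trade-off is that any setting meant to persist across machines would have to live in the base config rather than be picked up implicitly from an earlier machine's options.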