chef-boneyard / chef-provisioning

A library for creating machines and infrastructures idempotently in Chef.
Apache License 2.0
523 stars 163 forks source link

Attributes are erased at re-converge when set in 'machine' resource #137

Closed hulu1522 closed 9 years ago

hulu1522 commented 10 years ago

We are using chef-metal to converge machines but each machine needs some different attributes to be set or overridden. The problem is that any attribute that is set during the converge inside recipes will be deleted on the next converge or at the end of a batch converge (issue #116).

This is our code for the machine resource:

machine "db.#{$domain}" do
  recipe 'base'
  recipe 'base::cloud_setup'
  recipe 'disk'
  recipe 'db'
  tag 'db'
  attributes(
    postgresql: {
      dir: '/opt/postresql/data'
    },
    volumes: [
      {
        name: "db_data",

        physical_disks: [
          {name: "disk01-db.#{$domain}", size: 20},
          {name: "disk02-db.#{$domain}", size: 20}
        ],
        mount_point: '/opt/postresql/data',
        lvm_attributes: {
          size: '100%VG',
          filesystem: 'ext4'
        }
      }
    ]
  )
end

This is what happens at the beginning of the second converge or the end of a machine_batch run:

Recipe: internal_services::is_setup
  * machine[db.is.sgu] action converge
    - update node db.is.sgu at https://h.tcncloud.net/
    -   add normal.postgresql = {:dir=>"/opt/postresql/data"}
    -   add normal.volumes = [{:name=>"db_data", :physical_disks=>[{:name=>"disk01-db.is.sgu", :size=>20}, {:name=>"disk02-db.is.sgu", :size=>20}], :mount_point=>"/opt/postresql/data", :lvm_attributes=>{:size=>"100%VG", :filesystem=>"ext4"}}]
    -   remove normal.postgresql
    -   remove normal.volumes
    -   remove normal.authorization
    -   remove normal._fog_client
    -   remove normal.fog_cloud
    -   remove normal.build-essential
    - update node db.is.sgu at https://h.tcncloud.net/
    -   add normal.postgresql = {:dir=>"/opt/postresql/data"}
    -   add normal.volumes = [{:name=>"db_data", :physical_disks=>[{:name=>"disk01-db.is.sgu", :size=>20}, {:name=>"disk02-db.is.sgu", :size=>20}], :mount_point=>"/opt/postresql/data", :lvm_attributes=>{:size=>"100%VG", :filesystem=>"ext4"}}]
    -   remove normal.postgresql
    -   remove normal.volumes
    - update node db.is.sgu at https://h.tcncloud.net/
    -   add normal.postgresql = {:dir=>"/opt/postresql/data"}
    -   add normal.volumes = [{:name=>"db_data", :physical_disks=>[{:name=>"disk01-db.is.sgu", :size=>20}, {:name=>"disk02-db.is.sgu", :size=>20}], :mount_point=>"/opt/postresql/data", :lvm_attributes=>{:size=>"100%VG", :filesystem=>"ext4"}}]
    -   remove normal.postgresql
    -   remove normal.volumes
    - update node db.is.sgu at https://h.tcncloud.net/
    -   add normal.postgresql = {:dir=>"/opt/postresql/data"}
    -   add normal.volumes = [{:name=>"db_data", :physical_disks=>[{:name=>"disk01-db.is.sgu", :size=>20}, {:name=>"disk02-db.is.sgu", :size=>20}], :mount_point=>"/opt/postresql/data", :lvm_attributes=>{:size=>"100%VG", :filesystem=>"ext4"}}]
    -   remove normal.postgresql
    -   remove normal.volumes

As you can see the first section that updated the attributes deleted all of the node attributes that were set during the run. This makes it impossible for later recipes that depend on these attributes to function when re-converging.

mwrock commented 10 years ago

We have just noticed this same thing. It had not been an issue before but as we are reconverging and have used knife to edit attributes, we were surprised to see attributes wiped out. I'm gonna look closer tomorrow but it seems like the issue is somewhere in load_current_resource of the machine provider:

  def load_current_resource
    node_driver = Chef::Provider::ChefNode.new(new_resource, run_context)
    node_driver.load_current_resource
    json = node_driver.new_json
    json['normal']['metal'] = node_driver.current_json['normal']['metal']
    @machine_spec = ChefMetal::ChefMachineSpec.new(json, new_resource.chef_server)
  end

Haven't looked at the core chef code yet but likely here: json = node_driver.new_json if new_json just returns an empty json block. Seems like it should hit up the chef server to grab the existing node state.

jkeiser commented 10 years ago

chef_node (the provider) is supposed to hit up the chef server for the state when it gets new_json (new_json's job is to grab the current state and merge it with the desired state). This is passing strange!

mwrock commented 10 years ago

ahh ok. Well I'll do more debugging today and get you more data. thanks.

mwrock commented 10 years ago

I just submitted a PR to Cheffish https://github.com/opscode/cheffish/pull/20 that appears to fix this. It adds a deep merge to the current/new json merge. That PR comment includes another example of where an attribute is getting dropped.

johnewart commented 10 years ago

I think that the issue here is that attributes is effectively a "complete" operation - it says "here are the attributes that I want to be set on this node and all others be damned!". I think that there is an attribute attribute (a little too meta there maybe) that you can set multiple times which will append / set the specific attribute you are setting to a given value.

I think that the attributes attribute should behave the way you are expecting it to but that's not the current behavior.

hulu1522 commented 10 years ago

Okay, so the attribute attribute should fix a single machine converge. Is there any idea why using machine_batch would delete all the attributes on machines after the converge?

jkeiser commented 10 years ago

As far as the machine_batch, I think I need to see the code--it certainly shouldn't delete the attributes you specified on the machine resource inside the machine_batch.

jkeiser commented 9 years ago

This problem was fixed in machine_batch today, here: https://github.com/chef/chef-provisioning/pull/294

Please do reopen if things are not fixed, but I'm pretty sure this is gone as of the latest version released today!