bbaugher / apache_zookeeper

Chef cookbook for Apache Zookeeper
MIT License

No way to disable server verification, and bootstrapping. #49

Closed by wolftrouble 7 years ago

wolftrouble commented 7 years ago

I'm having problems getting terraform, chef and this cookbook to play nice. Specifically, this cookbook (for reasons I can't quite understand) attempts to run validity checks on the serverlist, with no way to disable that behavior. The checks seem to happen at compile time, and thus before ohai can populate any of the automatic attributes, assuming the node is being bootstrapped and chef is running for the first time.

The error I'm seeing is pretty straightforward:

(chef): ================================================================================
(chef): Recipe Compile Error in /var/chef/cache/cookbooks/fooco-zookeeper/recipes/default.rb
(chef): ================================================================================

(chef): RuntimeError
(chef): ------------
(chef): Unable to find server [ip-10-15-40-135.ec2.internal in zoo.cfg attributes {"server.1"=>"int-kafka-bthomas01.foodomain.com:2888:3888", "server.2"=>"int-kafka-bthomas02.foodomain.com:2888:3888", "server.3"=>"int-kafka-bthomas03.foodomain.com:2888:3888"}

(chef): Cookbook Trace:
(chef): ---------------
(chef):   /var/chef/cache/cookbooks/apache_zookeeper/libraries/zookeeper_helper.rb:24:in `setup_helper'
(chef):   /var/chef/cache/cookbooks/apache_zookeeper/recipes/configure.rb:61:in `from_file'
(chef):   /var/chef/cache/cookbooks/apache_zookeeper/recipes/default.rb:7:in `from_file'
(chef):   /var/chef/cache/cookbooks/fooco-zookeeper/recipes/default.rb:9:in `from_file'

(chef): Relevant File Content:
(chef): ----------------------
(chef): /var/chef/cache/cookbooks/apache_zookeeper/libraries/zookeeper_helper.rb:
(chef):
(chef):  17:        node['apache_zookeeper']["zoo.cfg"].select{ |key, value| key.to_s.match(/\Aserver.\d+\z/)}.each do |key, value|
(chef):  18:          if does_server_match_node? value
(chef):  19:            @zookeeper_myid = key["server.".size, key.size]
(chef):  20:            break
(chef):  21:          end
(chef):  22:        end
(chef):  23:
(chef):  24>>       raise "Unable to find server [#{node["fqdn"]} in zoo.cfg attributes #{node['apache_zookeeper']["zoo.cfg"].select{ |key, value| key.to_s.match(/\Aserver.\d+\z/)}}" i>
(chef):  25:
(chef):  26:      elsif node['apache_zookeeper']["servers"].empty?
(chef):  27:        log "Configuring standalone zookeeper cluster"
(chef):  28:      else
(chef):  29:        log "Configuring mult-server zookeeper cluster"
(chef):  30:
(chef):  31:        id = 1
(chef):  32:        node['apache_zookeeper']["servers"].each do |server|
(chef):  33:          if server.include? ":"

My serverlist I'm setting is:

"servers": ["int-kafka-bthomas01.foodomain.com", "int-kafka-bthomas02.foodomain.com", "int-kafka-bthomas03.foodomain.com"]

These are valid hostnames, and when chef runs one of those will be the FQDN, but at the point where this bombs out, my node attributes (to pick one of those servers) look like:

{
  "name": "int-kafka-bthomas02.foodomain.com",
  "chef_environment": "Int",
  "run_list": [],
  "normal": {},
  "default": {},
  "override": {},
  "automatic": {}
}

That is likely because the recipe refuses to even compile when it can't look up those particular attributes, so the compile phase fails. No automatic attributes means this lookup will always fail.

Hacking around this by, for example, setting these attributes manually is not really feasible because of the way the chef provisioner works inside terraform. I suspect this would work fine if I didn't try to have the cookbook in the initial runlist, but that's not going to work either (and not reasonable).

Finally, the error the cookbook emits is a bit misleading. The README (and helper library) specify a number of different attributes that the cookbook can match against, but the error just tells you it failed matching node[fqdn]. While technically true, it implies that's the only type of value (node[fqdn]) allowed in the serverlist, which is not true.

I feel like the easiest workaround here would be to offer a node attribute that disables serverlist verification. So assuming I'm not doing something blindly incorrect here that's causing this, maybe that should be a feature request? Thanks.
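To make the request concrete, here is a minimal plain-Ruby sketch of what an explicit override could look like. It is not code from the cookbook: the `myid` attribute and the `resolve_myid` helper are hypothetical names, and the host matching is a simplified stand-in for whatever the helper library really compares against.

```ruby
# Hypothetical sketch: 'myid' is NOT an attribute the cookbook defines.
# attrs      - Hash standing in for node['apache_zookeeper']
# identities - Array of names/addresses the node is known by (assumed input)
def resolve_myid(attrs, identities)
  # An explicitly configured ID wins and skips the matching entirely.
  explicit = attrs['myid']
  return explicit.to_s unless explicit.nil?

  # Otherwise fall back to matching the host part of each "server.N" entry.
  attrs.fetch('zoo.cfg', {}).each do |key, value|
    next unless key.to_s.match(/\Aserver\.\d+\z/)
    host = value.split(':').first
    return key.to_s.sub(/\Aserver\./, '') if identities.include?(host)
  end

  raise "Unable to match any of #{identities.inspect} against the zoo.cfg " \
        "server entries; set 'myid' explicitly"
end

zoo_cfg = {
  'server.1' => 'int-kafka-bthomas01.foodomain.com:2888:3888',
  'server.2' => 'int-kafka-bthomas02.foodomain.com:2888:3888',
  'server.3' => 'int-kafka-bthomas03.foodomain.com:2888:3888'
}

# With the override set, the EC2-internal FQDN no longer matters.
puts resolve_myid({ 'myid' => 2, 'zoo.cfg' => zoo_cfg },
                  ['ip-10-15-40-135.ec2.internal'])
```

With something like this, each node could be given its ID directly (for example from the provisioner), and the automatic matching would only be a fallback.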

bbaugher commented 7 years ago

Ohai and its automatic attributes are provided before compile time; otherwise it wouldn't print your FQDN in the error message.

It's not a validation; it's trying to determine the zookeeper ID of the node, which is needed for installation. It tries to be smart by matching your machine's host or other identifiers to your configuration so you don't have to configure each server separately with its ID. You just provide the same configuration to all of them and the cookbook figures it out.

Looks like your hostname isn't known to ohai or at least not in the identities we use. Can you find that FQDN value you list (int-kafka-bthomas02.foodomain.com) on the node for some attribute? Are you changing the hostname when you do the chef-client run? Either way the identity logic is hardcoded at this point so I don't see a way to get this working without using a different value for your zookeeper attributes or getting the chef node to have that value for one of the ids.
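The matching described above can be sketched in plain Ruby, outside of Chef. This is an illustration only: `find_myid` and `server_matches_node?` are hypothetical names, and the list of identities is an assumption, since the thread does not show which node attributes the cookbook's `does_server_match_node?` actually checks.

```ruby
# Hypothetical stand-in for the cookbook's identity check. The real set of
# identities it compares against is not shown in this thread.
def server_matches_node?(server_value, identities)
  host = server_value.split(':').first # strip the ":2888:3888" port suffix
  identities.include?(host)
end

# Returns the ID (as a String) of the first "server.N" entry whose host
# matches one of the node's identities, or nil when nothing matches.
def find_myid(zoo_cfg, identities)
  zoo_cfg.each do |key, value|
    next unless key.to_s.match(/\Aserver\.\d+\z/)
    return key.to_s.sub(/\Aserver\./, '') if server_matches_node?(value, identities)
  end
  nil
end

zoo_cfg = {
  'server.1' => 'int-kafka-bthomas01.foodomain.com:2888:3888',
  'server.2' => 'int-kafka-bthomas02.foodomain.com:2888:3888',
  'server.3' => 'int-kafka-bthomas03.foodomain.com:2888:3888'
}

# The node only knows itself by its EC2-internal name: no entry matches.
p find_myid(zoo_cfg, ['ip-10-15-40-135.ec2.internal'])      # => nil
# If the node also reported the configured hostname, the ID would resolve.
p find_myid(zoo_cfg, ['int-kafka-bthomas02.foodomain.com']) # => "2"
```

The `nil` case corresponds to the "Unable to find server" error in the report: none of the names the node reports for itself appear in the configured server list.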

wolftrouble commented 7 years ago

Ah, I see what happened, and apologies for the confusion - I didn't realize terraform seems to kind of "give up" when it can't configure the first node, so only the first one actually bootstrapped and had all the node attributes. You're right, all the automatic attrs are there.

Unfortunately, due to the cleverness of the cookbook, I'm pretty much deadlocked: because it tries to do that matching rather than allowing those things to be set explicitly, I'd have to do a bunch of ugliness to set the "right" hostname on boot, which I'm not even sure is possible with terraform. We'll find another way to do zk installs. Thanks, and again, sorry for the confusion.

bbaugher commented 7 years ago

Sorry I couldn't help. If you think of anything that would help, feel free to log it and I can take a look.