basho-labs / riak-cs-chef-cookbook

Basho Riak CS Chef Cookbook
Apache License 2.0

Stanchion startup order appears nondeterministic wrt Riak [JIRA: TOOLS-76] #58

Open slfritchie opened 9 years ago

slfritchie commented 9 years ago

Howdy. Stanchion won't start if it can't make a PB (protocol buffers) connection to Riak. It looks as if the startup order is nondeterministic: I can clearly see that many times (but not always) Stanchion's init.d service start happens before Riak has started.

If I remove the :start action from the end of recipes/stanchion.rb and also remove every notifies :restart, "service[stanchion]" in the same file, then a problem is less likely. (How much less isn't clear to me, sorry, I'm a Chef newbie.)
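Because Stanchion refuses to start without a PB connection, one way to take the nondeterminism out of the ordering would be to block until Riak's PB port accepts TCP connections before starting Stanchion. The Ruby below is a minimal sketch of such a wait, not code from this cookbook; `wait_for_tcp_port` and its `timeout`/`interval` parameters are hypothetical names. The host and port in the usage note come from the `riak_ip` and `riak_pb_port` values in the rendered app.config shown in the log below.

```ruby
require 'socket'

# Hypothetical helper (not part of the riak-cs cookbook): poll host:port until
# a TCP connection succeeds or `timeout` seconds have elapsed.
# Returns true as soon as a connection is accepted, false on timeout.
def wait_for_tcp_port(host, port, timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      # Connect and immediately close; we only care that something is listening.
      Socket.tcp(host, port, connect_timeout: interval).close
      return true
    rescue SystemCallError, SocketError, IOError
      # Refused, unreachable or timed out: nothing is listening yet, so retry.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
end
```

For example, `wait_for_tcp_port("127.0.0.1", 8087)` could be called from a `ruby_block` that runs before Stanchion is started, assuming Riak's own service resource has already been started earlier in the run. An open port only shows that Riak is listening, not that it is fully ready, so this is a heuristic rather than a guarantee.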

If I make the changes mentioned above in forked & edited repos under my own GitHub account, then I get yet another fun thing: Chef fails because the checksum of /etc/stanchion/app.config, which Chef has of course just edited, doesn't match some magic-unknown-to-me value. It would be super-cool to avoid this checksum nonsense ... though at least the Admin

==> riak1: [2015-02-17T01:42:25+00:00] INFO: service[riak-cs-control] restarted
==> riak1: [2015-02-17T01:42:25+00:00] INFO: Retrying execution of ruby_block[create-admin-user], 4 attempt(s) left
==> riak1: [2015-02-17T01:42:28+00:00] INFO: Riak CS Key: EZG0LOA__DV51ZCAYFTQ
==> riak1: [2015-02-17T01:42:28+00:00] INFO: Riak CS Secret: TKVFA1m2XWM0tPHX40dhkqZ74G3Jg0-k_JDfBA==
==> riak1: [2015-02-17T01:42:28+00:00] INFO: ruby_block[create-admin-user] called
==> riak1: [2015-02-17T01:42:28+00:00] INFO: ruby_block[create-admin-user] sending create action to file[/etc/stanchion/app.config] (immediate)
==> riak1: 
==> riak1: ================================================================================
==> riak1: Error executing action `create` on resource 'file[/etc/stanchion/app.config]'
==> riak1: ================================================================================
==> riak1: 
==> riak1: 
==> riak1: Chef::Exceptions::ChecksumMismatch
==> riak1: ----------------------------------
==> riak1: Checksum on resource (1118a6) does not match checksum on content (8c2707)
==> riak1: 
==> riak1: 
==> riak1: Resource Declaration:
==> riak1: ---------------------
==> riak1: # In /tmp/vagrant-chef/f14c076f28361e3af5488af2a2d0affd/cookbooks/riak-cs/recipes/stanchion.rb
==> riak1: 
==> riak1:  77: file "#{node['stanchion']['package']['config_dir']}/app.config" do
==> riak1:  78:   content Eth::Config.new(node['stanchion']['config'].to_hash).pp
==> riak1:  79:   owner "root"
==> riak1:  80:   mode 0644
==> riak1:  81:   notifies :restart, "service[stanchion]"
==> riak1:  82: end
==> riak1:  83: 
==> riak1: 
==> riak1: Compiled Resource:
==> riak1: ------------------
==> riak1: # Declared in /tmp/vagrant-chef/f14c076f28361e3af5488af2a2d0affd/cookbooks/riak-cs/recipes/stanchion.rb:77:in `from_file'
==> riak1: 
==> riak1: file("/etc/stanchion/app.config") do
==> riak1:   action "create"
==> riak1:   updated true
==> riak1:   retries 0
==> riak1:   retry_delay 2
==> riak1:   default_guard_interpreter :default
==> riak1:   path "/etc/stanchion/app.config"
==> riak1:   backup 5
==> riak1:   atomic_update true
==> riak1:   diff "... huge blob omitted -SLF..."
==> riak1:   declared_type :file
==> riak1:   cookbook_name :"riak-cs"
==> riak1:   recipe_name "stanchion"
==> riak1:   content "[\n\t{lager, [\n\t\t{crash_log, \"/var/log/stanchion/crash.log\"},\n\t\t{crash_log_count, 5},\n\t\t{crash_log_date, \"$D0\"},\n\t\t{crash_log_msg_size, 65536},\n\t\t{crash_log_size, 10485760},\n\t\t{error_logger_redirect, true},\n\t\t{handlers, [\n\t\t\t{lager_file_backend, [\n\t\t\t\t{\"/var/log/stanchion/error.log\", error, 10485760, \"$D0\", 5},\n\t\t\t\t{\"/var/log/stanchion/console.log\", info, 10485760, \"$D0\", 5}\n\t\t\t]}\n\t\t]}\n\t]},\n\t{sasl, [\n\t\t{sasl_error_logger, false},\n\t\t{utc_log, true}\n\t]},\n\t{stanchion, [\n\t\t{admin_key, \"EZG0LOA__DV51ZCAYFTQ\"},\n\t\t{admin_secret, \"TKVFA1m2XWM0tPHX40dhkqZ74G3Jg0-k_JDfBA==\"},\n\t\t{auth_bypass, false},\n\t\t{riak_ip, \"127.0.0.1\"},\n\t\t{riak_pb_port, 8087},\n\t\t{stanchion_ip, \"33.33.33.10\"},\n\t\t{stanchion_port, 8085}\n\t]}\n]."
==> riak1:   owner "root"
==> riak1:   mode "0644"
==> riak1:   group "riak"
==> riak1:   checksum "1118a643b968c2693258e86018a205beb37a6ccc0cf5e8633b84021d197ac9d7"
==> riak1: end
==> riak1: 
hectcastro commented 9 years ago

Hey @slfritchie, I had my hand in a lot of this not too long ago. Going to make an effort to get this setup running locally so that I can try to reproduce what you're seeing. Hoping to get to it sometime this week, but it most likely won't be until the weekend.

cheeseplus commented 9 years ago

I've confirmed this behaviour on your branch as well as on develop. Chef 12 introduced some new behaviour in how the contents of some resources, notably the file resource, are validated. That said, I'm not sure what the easy fix is just yet, but I've got some ideas.
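The "magic" value in the error is the file resource's `checksum` attribute (a SHA-256 digest), which the compiled resource in the log shows as set even though the recipe at stanchion.rb:77 never declares one. The toy Ruby model below uses hypothetical names and is not Chef's implementation; it shows how a checksum remembered from a first write turns into a ChecksumMismatch when ruby_block[create-admin-user] changes the content and re-sends :create to the same resource. That the checksum gets recorded after the first write is an assumption inferred from the log, not confirmed from Chef's source.

```ruby
require 'digest'

# Toy model (hypothetical, not Chef's source) of the validation that raises
# Chef::Exceptions::ChecksumMismatch for a file resource.
class ChecksumMismatch < StandardError; end

# Stand-in for a file resource: `checksum` is the expected SHA-256 of
# `content`, or nil when none has been recorded yet.
FileResource = Struct.new(:path, :content, :checksum)

def sha256(text)
  Digest::SHA256.hexdigest(text)
end

# Mimics a :create action: validate content against any recorded checksum,
# then remember the checksum (assumption: this is what leaves it set).
def converge_create(resource)
  actual = sha256(resource.content)
  if resource.checksum && resource.checksum != actual
    # Like Chef's message, show only the first 6 hex digits of each checksum.
    raise ChecksumMismatch,
          "Checksum on resource (#{resource.checksum[0, 6]}) " \
          "does not match checksum on content (#{actual[0, 6]})"
  end
  resource.checksum = actual
  :updated
end

# Placeholder contents, not the real app.config.
res = FileResource.new('/etc/stanchion/app.config', '{admin_key, "placeholder-before"}')
converge_create(res)                             # first run: no checksum yet, succeeds
res.content = '{admin_key, "placeholder-after"}' # ruby_block fills in the admin key
begin
  converge_create(res)                           # immediate :create notification
rescue ChecksumMismatch => e
  puts e.message
end
```

In the toy, clearing `res.checksum` before the second call avoids the error; whether the real cookbook should instead render app.config only once the admin credentials exist is a separate design question.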

slfritchie commented 9 years ago

@hectcastro Howdy! No rush for you or @cheeseplus; thanks for taking a peek. I have next to no idea how to drive that thing.

By any chance, is there an earlier version of ChefDK that would play nicely with the cookbook thingies available today?

cheeseplus commented 9 years ago

@slfritchie I think 0.3.5 is the latest ChefDK release that still shipped with Chef 11, so give that a shot.

juanotto commented 9 years ago

I'm getting the same error slfritchie got, without making any modifications. I tried ChefDK 0.3.4, 0.3.5, 0.3.6 and 4.0, so the version doesn't seem to make a difference. I'm running VirtualBox 4.3.20 and Vagrant 1.7.2 on Mac OS X 10.10.2. Any ideas on how to solve this?

==> riak1: 
==> riak1: ================================================================================
==> riak1: Error executing action `create` on resource 'file[/etc/stanchion/app.config]'
==> riak1: ================================================================================
==> riak1: 
==> riak1: 
==> riak1: Chef::Exceptions::ChecksumMismatch
==> riak1: ----------------------------------
==> riak1: Checksum on resource (1118a6) does not match checksum on content (ac8c6c)
==> riak1: 
==> riak1: 
==> riak1: Resource Declaration:
==> riak1: ---------------------
==> riak1: # In /tmp/vagrant-chef/cad78b6822882fe2cea79acb780b38c3/cookbooks/riak-cs/recipes/stanchion.rb
==> riak1: 
==> riak1:  77: file "#{node['stanchion']['package']['config_dir']}/app.config" do
==> riak1:  78:   content Eth::Config.new(node['stanchion']['config'].to_hash).pp
==> riak1:  79:   owner "root"
==> riak1:  80:   mode 0644
==> riak1:  81:   notifies :restart, "service[stanchion]"
==> riak1:  82: end
==> riak1:  83: 
==> riak1: 
==> riak1: Compiled Resource:
==> riak1: ------------------
==> riak1: # Declared in /tmp/vagrant-chef/cad78b6822882fe2cea79acb780b38c3/cookbooks/riak-cs/recipes/stanchion.rb:77:in `from_file'
==> riak1: 
==> riak1: file("/etc/stanchion/app.config") do
==> riak1:   action "create"
==> riak1:   updated true
==> riak1:   retries 0
==> riak1:   retry_delay 2
==> riak1:   default_guard_interpreter :default
==> riak1:   path "/etc/stanchion/app.config"
==> riak1:   backup 5
==> riak1:   atomic_update true
==> riak1:   diff "--- 

...and then it tries to run the delayed notifications and fails.

slfritchie commented 9 years ago

Hi, sorry, @juanotto, I've been working on other projects and haven't had time to return to this.

@kuenishi or @ksauzz, have either of you done work recently with CS & Chef?

cheeseplus commented 9 years ago

I've been slammed with travel and conference season but I'm hoping to be able to dedicate some time in the next two weeks to fixing this up and updating the cookbook.

juanotto commented 9 years ago

No problem, @slfritchie. I got something working with fakes3, which is enough for the tests I have. It would still be great to have this working so that riak-cs can be tested easily. Thanks for your work!

slfritchie commented 9 years ago

Much obliged, Seth, and thanks for the hard work!