Closed gshively11 closed 9 years ago
I'm pretty sure this is a race condition. I see it on Ubuntu, Debian, and CentOS, but I illustrate a minimal case with Ubuntu in great detail below.
I'm also having this problem with both zookeeper and kafka cookbooks. I thought I solved it by freezing to 1.6.0, but that also seemed to experience this, if slightly less frequently.
I have a wrapper cookbook which spins up a triple of zookeeper / kafka in Vagrant at:
https://github.com/bitmonk/bitmonk_kafka
I also produced a minimal cookbook / vagrant / kitchen environment to illustrate the minimal case at:
https://github.com/bitmonk/bitmonk_runit_wtf/
The output of 'kitchen test' on my mac is in a gist at:
https://gist.github.com/bitmonk/c40729ba454aa94c44b7
I also ran a repeat 'vagrant provision' after the 'vagrant up' failed, which is how I get my zookeeper / kafka triple up, and captured the output in a gist. The command follows:
Justin-Ryans-MacBook-Pro:bitmonk_runit_wtf juryan$ while ! vagrant up; do
vagrant provision; done 2>&1 | tee ~/runit_wtf.txt
As a workaround for now, the following works:
runit_service "myservice" do
  # ...
  sv_bin 'sleep 5 && /usr/bin/sv'
  action :enable
end
I thought about something like that, but I have a feeling it isn't going to be accepted in upstream community cookbooks. :/
Haha you'd be surprised! ;)
I've only actually seen this hit us hard when doing our integration tests. I have the sv_bin command wrapped in an 'if testing' block so it's not all that bad, but it would be nice to get rid of it...
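A fixed `sleep 5` pays the full delay on every call. If the race is that `sv` runs before runsv has created `supervise/ok` (which is what the "unable to open supervise/ok" error reported in this thread suggests), polling for that file is one alternative. The following is only a sketch under that assumption: `wait_for_file` is a hypothetical helper, not part of the runit cookbook, and the service path in the usage comment is an assumed default that varies by platform.

```ruby
# Hypothetical helper (not part of the runit cookbook): poll until `path`
# exists, giving up after `timeout` seconds. Returns true if the file
# appeared in time, false otherwise.
def wait_for_file(path, timeout: 10, interval: 0.2)
  # Use the monotonic clock so wall-clock adjustments do not affect the wait.
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until File.exist?(path)
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval
  end
  true
end

# Usage (assumed path; adjust to wherever your platform links services):
#   wait_for_file('/etc/service/myservice/supervise/ok', timeout: 5)
```

Compared with the `sv_bin` workaround, this returns as soon as the file appears and reports a timeout instead of failing later inside `sv`.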
@gshively11 @ashmckenzie @bitmonk can you please try reproducing this issue with the current code in the develop branch? I expect this issue is fixed in that pre-release branch.
@cwjohnston I just re-ran our tests with the develop branch and success! Nice work :)
@ashmckenzie - this has been happening on community zookeeper and kafka cookbooks during 'vagrant up', which is particularly harsh when you're spinning up a triple. :)
@cwjohnston - Thanks, I'll give this a shot today! My example case fails on CentOS, but succeeds on Ubuntu - I'll try this with my actual zookeeper / kafka cookbooks today.
@cwjohnston - it's looking pretty good. Is there any reason you prefer to let the current release of 'runit' supersede what's in the 'develop' branch?
I'm trying to get dependent changes in upstream branches, and I prefer that, say, 'depends "runit"' in metadata.rb of a community cookbook be sufficient to pass tests on Ubuntu, Debian, and CentOS.
My kafka and zookeeper stuff are looking pretty OK right now.
Do you have any concerns about regression for other platforms or configurations?
My impression is basically that this code is not a problem for running systems. It just makes me run 'vagrant provision' a lot, which is a problem when I'm sending this out to my team to encourage them to start logging to kafka.
I worked on one of the oldest legacy alpha sites for Chef at Wikia, and I understand why these problems occur and are difficult to propagate, because I don't want to break anyone's staging or production to fix my vagrant. But I'm trying to see to it that we all have all of those things working.
Let me know if there's any way I can help to move this along! I'm happy to help cut releases.
+1! Any chance to get this released?
Pinned at 1.5.18 as well. Using Ubuntu 14.04. Been hunting this bug for quite a while. Any chance to get the fix released?
Thanks for all the work here! Cheers! :wink:
@punnie, could you try using my branch from https://github.com/hw-cookbooks/runit/pull/148 ? I've been using it for a while in docker and it works fine. Or do you mean that you use some other solution from this thread and want to have it released?
I'm still pinned to the 'develop' branch, so would be nice to see what's in there released to supermarket with a version bump.
:+1: Seeing this in 1.7.2, would love to see a release so I don't have to pin a SHA
:+1: release with fix will really be great
@cwjohnston would love to offer my time to see a fresh release of the runit cookbook, esp since this issue is now pinned to issues in several other community cookbooks and seems to fairly universally impact folks spinning up new nodes with the latest release cookbook.
Hi folks. @chrisroberts and I have scheduled time this week to work on these and other outstanding issues, with the aim of cutting a new release from develop very soon. Thanks for your patience!
@cwjohnston has there been any movement on this?
+1 Please release :)
+1
+1
@xmik thanks for the suggestion. I'm currently using version 1.5.x, which appears not to have this problem, or at least works for me™.
If you want I may check that out at a later date and report back, but for now I really need some development speed, and my current solution works. :wink:
@punnie, no rush as I am not in this context right now, but I would appreciate some feedback. I'm still using that branch and it works for me.
Fix in #138, released in v1.7.4
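For anyone who was pinning a SHA or the develop branch, the release noted above can instead be expressed as a version constraint. A minimal sketch, assuming your cookbooks are resolved with Berkshelf from the public Supermarket; the `>=` constraint is an illustrative choice, not something the maintainers prescribed in this thread.

```ruby
# Berksfile (sketch; assumes the public Supermarket as the cookbook source)
source 'https://supermarket.chef.io'

# Require at least the release that this thread says contains the fix.
cookbook 'runit', '>= 1.7.4'
```

A wrapper cookbook could state the same constraint in its metadata.rb with `depends 'runit', '>= 1.7.4'`.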
This is still happening with Ubuntu 16 and 14, with Chef 12.14.x and 12.3.1.x. Why is its status 'fixed' here? This is what I did:
Running handlers:
[2017-04-27T04:48:29+00:00] ERROR: Running exception handlers
Running handlers complete
[2017-04-27T04:48:29+00:00] ERROR: Exception handlers complete
Chef Client failed. 2 resources updated in 01 minutes 23 seconds
[2017-04-27T04:48:29+00:00] FATAL: Stacktrace dumped to /opt/opscode/embedded/cookbooks/cache/chef-stacktrace.out
[2017-04-27T04:48:30+00:00] FATAL: Mixlib::ShellOut::ShellCommandFailed: execute[/opt/opscode/bin/private-chef-ctl start rabbitmq] (private-chef::rabbitmq line 105) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
---- Begin output of /opt/opscode/bin/private-chef-ctl start rabbitmq ----
STDOUT: warning: rabbitmq: unable to open supervise/ok: file does not exist
STDERR:
---- End output of /opt/opscode/bin/private-chef-ctl start rabbitmq ----
Ran /opt/opscode/bin/private-chef-ctl start rabbitmq returned 1
I was using runit 1.7.2 and recently started getting
unable to open supervise/ok: file does not exist
errors when the chef-client run got to a step in the logstash cookbook which tried to restart a logstash service configured to use runit. Not sure if these errors started appearing when chef-client got updated to 12.4. I'm now pinned at 1.5.18 and haven't seen the error anymore.
Running on CentOS 6.6