Open cj-saulius-tvarijonas opened 8 years ago
I am seeing the same.
The problem is here: https://github.com/clearstorydata-cookbooks/apache_spark/blob/master/recipes/spark-standalone-worker.rb#L85-86
It leads to an infinite loop:
+check process "spark-standalone-master"
+ matching "^(/\\S+/)?java .* org[.]apache[.]spark[.]deploy[.]master[.]Master "
+ every 1 cycles
+
+ start program "/bin/bash -c '/usr/share/spark/bin/master_runner.sh </dev/null &'"
+ as uid spark as gid spark
+
+ stop program "/usr/bin/pkill -u spark -f '^(/\S+/)?java .* org[.]apache[.]spark[.]deploy[.]master[.]Master '"
+ as uid spark as gid spark
- change mode from '' to '0644'
- change owner from '' to 'root'
- change group from '' to 'root'
* file[/etc/monit/conf.d/spark-standalone-master.monitrc] action delete (up to date)
* monit_wrapper_reload_and_wait[spark-standalone-master] action reload_and_wait
* script[monit-reload] action run
- execute "bash" "/tmp/chef-script20160512-1348-1tvwo8j"
* ruby_block[ensure-monit-is-running-after-reloading-for-spark-standalone-master] action run
- execute the ruby block ensure-monit-is-running-after-reloading-for-spark-standalone-master
* ruby_block[wait-for-monit-reload-spark-standalone-master] action run
- execute the ruby block wait-for-monit-reload-spark-standalone-master
Recipe: sysctl::default
* ruby_block[save-sysctl-params] action run
- execute the ruby block save-sysctl-params
Recipe: monit-ng::service
* service[monit] action restart
- restart service service[monit]
* service[monit] action start (up to date)
Recipe: monit-ng::reload
* ruby_block[conditional-monit-reload] action run
- execute the ruby block conditional-monit-reload
Recipe: apache_spark::spark-standalone-master
* monit_wrapper_service[spark-standalone-master] action start
Recipe: sysctl::default
* ruby_block[save-sysctl-params] action run
- execute the ruby block save-sysctl-params
Recipe: monit-ng::service
* service[monit] action restart
- restart service service[monit]
* service[monit] action start (up to date)
Recipe: monit-ng::reload
* ruby_block[conditional-monit-reload] action run
- execute the ruby block conditional-monit-reload
Recipe: apache_spark::spark-standalone-master
* monit_wrapper_service[spark-standalone-master] action start
[... the same sequence of recipes keeps repeating ...]
@runtimee I believe that's a different issue. Maybe related to #19
Also, for me kitchen test gets into this loop:
bundle exec kitchen test
@mbautin Any idea when a fix will be available?
Cc @jharveysmith @jayceeb
I'll take a look.
@jharveysmith Any update on this? This is actually a blocker for us now. Is this working properly on your end?
Hi, this looks like the notifications/subscriptions between monit-ng, monit_wrapper, and apache_spark are getting into a circular loop. Still tracking down where to fix it.
Looks like the culprit is the notifying_action_wrapper blocks in monit_wrapper_service.
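To make the suspected cycle concrete, here is a minimal Ruby sketch that walks a notification graph and reports the first cycle it finds. The edges are an assumption read off the order of the repeating recipes in the log above, not taken from the cookbooks' source, and find_cycle is a hypothetical helper rather than anything in monit_wrapper.

```ruby
# Hypothetical notification graph: each resource maps to the resources it
# triggers. The edges are ASSUMED from the order of the repeating log output,
# not read from the monit-ng / monit_wrapper / apache_spark sources.
NOTIFICATIONS = {
  'monit_wrapper_service[spark-standalone-master]' => ['ruby_block[save-sysctl-params]'],
  'ruby_block[save-sysctl-params]' => ['service[monit]'],
  'service[monit]' => ['ruby_block[conditional-monit-reload]'],
  'ruby_block[conditional-monit-reload]' => ['monit_wrapper_service[spark-standalone-master]']
}.freeze

# Depth-first search that returns the first cycle as an array of resource
# names (first and last element are the same), or nil if there is none.
def find_cycle(graph, node, path = [])
  idx = path.index(node)
  return path[idx..-1] + [node] if idx

  (graph[node] || []).each do |nxt|
    cycle = find_cycle(graph, nxt, path + [node])
    return cycle if cycle
  end
  nil
end

cycle = find_cycle(NOTIFICATIONS, 'monit_wrapper_service[spark-standalone-master]')
puts cycle.join("\n  -> ") if cycle
```

If the real notification edges differ, only the NOTIFICATIONS hash needs to change; the search itself is generic.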
I see that the monit conf has the following command:
$ cat /etc/monit/conf.d/spark-standalone-worker.conf | grep pkill
stop program "/usr/bin/pkill -u root -f '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '"
But the process runs as spark, so it should have looked like:
/usr/bin/pkill -u spark -f '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '
Not sure if I am missing anything here. Note that I had overridden the command-line patterns used by the kill command:
+default['apache_spark']['standalone']['master_cmdline_pattern'] =
+ '.*java .* org[.]apache[.]spark[.]deploy[.]master[.]Master '
+default['apache_spark']['standalone']['worker_cmdline_pattern'] =
+ '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '
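For illustration, a small Ruby sketch of how the stop line could be assembled from the service user and the overridden pattern. monit_stop_program is a hypothetical helper, not the cookbook's actual template code; it only shows that the -u argument should be the user the Spark process runs as.

```ruby
# Hypothetical helper: builds the monit "stop program" line from the user the
# service runs as and the command-line pattern. NOT the cookbook's real code.
def monit_stop_program(user, cmdline_pattern)
  %(stop program "/usr/bin/pkill -u #{user} -f '#{cmdline_pattern}'")
end

# Pattern copied from the attribute override above.
worker_pattern = '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '

# The user is passed in explicitly instead of defaulting to root.
puts monit_stop_program('spark', worker_pattern)
```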
The latest monit_wrapper cookbook (3.4.0) should fix the restart loop.
The Spark Java processes run as the spark user, but the monit stop command searches for processes owned by root:
stop program "/usr/bin/pkill -u root -f '^(/\S+/)?java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '"
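A quick way to spot this mismatch is to pull the -u argument out of the generated stop line and compare it with the expected service user. The Ruby sketch below assumes the stop line has the shape shown above; pkill_user is a hypothetical helper name.

```ruby
# Extracts the user passed to `pkill -u` from a monit "stop program" line.
# Returns nil when the line has no `pkill -u <user>` part.
def pkill_user(stop_line)
  match = stop_line.match(/pkill\s+-u\s+(\S+)/)
  match && match[1]
end

# Stop line as reported in this thread.
stop_line = %q{stop program "/usr/bin/pkill -u root -f '^(/\S+/)?java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '"}
expected_user = 'spark' # the user the Spark processes run as, per this thread

actual_user = pkill_user(stop_line)
if actual_user != expected_user
  puts "mismatch: stop command targets '#{actual_user}', processes run as '#{expected_user}'"
end
```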