clearstorydata-cookbooks / apache_spark

A cookbook for installing and configuring Apache Spark

Monit fails to stop/restart spark worker #15

Open cj-saulius-tvarijonas opened 8 years ago

cj-saulius-tvarijonas commented 8 years ago

Spark Java processes run as the spark user, but the monit stop configuration searches for processes owned by root:

stop program "/usr/bin/pkill -u root -f '^(/\S+/)?java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '"
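To make the mismatch concrete, here is a minimal Ruby sketch of building the stop command from the user the daemon actually runs as, so that the `-u` filter and the process owner cannot drift apart. This is not the cookbook's code: `spark_stop_command` is a hypothetical helper, and the "spark" user is taken from the report above.

```ruby
# Hypothetical helper, not part of the apache_spark cookbook: build the
# monit stop command so pkill filters on the user that owns the daemon.
def spark_stop_command(user, cmdline_pattern)
  "/usr/bin/pkill -u #{user} -f '#{cmdline_pattern}'"
end

# Pattern copied from the monit configuration quoted in this issue.
WORKER_STOP_PATTERN =
  '^(/\S+/)?java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '

# Assumption: the worker runs as the "spark" user, as reported here.
puts spark_stop_command('spark', WORKER_STOP_PATTERN)
```

With a hard-coded `root`, pkill finds no matching process when the worker is owned by spark, so the stop (and therefore restart) action has no effect.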

dragisak commented 8 years ago

I am seeing the same.

The problem is here: https://github.com/clearstorydata-cookbooks/apache_spark/blob/master/recipes/spark-standalone-worker.rb#L85-86

runtimee commented 8 years ago

It leads to an infinite loop:

             +check process "spark-standalone-master"
             +  matching "^(/\\S+/)?java .* org[.]apache[.]spark[.]deploy[.]master[.]Master "
             +  every 1 cycles
             +
             +  start program "/bin/bash -c '/usr/share/spark/bin/master_runner.sh </dev/null &'"
             +   as uid spark as gid spark
             +
             +  stop program "/usr/bin/pkill  -u spark  -f '^(/\S+/)?java .* org[.]apache[.]spark[.]deploy[.]master[.]Master '"
             +   as uid spark as gid spark
             - change mode from '' to '0644'
             - change owner from '' to 'root'
             - change group from '' to 'root'
           * file[/etc/monit/conf.d/spark-standalone-master.monitrc] action delete (up to date)
           * monit_wrapper_reload_and_wait[spark-standalone-master] action reload_and_wait

           * script[monit-reload] action run
             - execute "bash"  "/tmp/chef-script20160512-1348-1tvwo8j"
           * ruby_block[ensure-monit-is-running-after-reloading-for-spark-standalone-master] action run
             - execute the ruby block ensure-monit-is-running-after-reloading-for-spark-standalone-master
           * ruby_block[wait-for-monit-reload-spark-standalone-master] action run
             - execute the ruby block wait-for-monit-reload-spark-standalone-master
         Recipe: sysctl::default
           * ruby_block[save-sysctl-params] action run
             - execute the ruby block save-sysctl-params
         Recipe: monit-ng::service
           * service[monit] action restart
             - restart service service[monit]
           * service[monit] action start (up to date)
         Recipe: monit-ng::reload
           * ruby_block[conditional-monit-reload] action run
             - execute the ruby block conditional-monit-reload
         Recipe: apache_spark::spark-standalone-master
           * monit_wrapper_service[spark-standalone-master] action start
           Recipe: sysctl::default
             * ruby_block[save-sysctl-params] action run
        - execute the ruby block save-sysctl-params
           Recipe: monit-ng::service
             * service[monit] action restart
        - restart service service[monit]
             * service[monit] action start (up to date)
           Recipe: monit-ng::reload
             * ruby_block[conditional-monit-reload] action run
        - execute the ruby block conditional-monit-reload
           Recipe: apache_spark::spark-standalone-master
             * monit_wrapper_service[spark-standalone-master] action start
             Recipe: sysctl::default
        * ruby_block[save-sysctl-params] action run
          - execute the ruby block save-sysctl-params
             Recipe: monit-ng::service
        * service[monit] action restart
          - restart service service[monit]
        * service[monit] action start (up to date)
             Recipe: monit-ng::reload
        * ruby_block[conditional-monit-reload] action run
          - execute the ruby block conditional-monit-reload
             Recipe: apache_spark::spark-standalone-master
        * monit_wrapper_service[spark-standalone-master] action start
        Recipe: sysctl::default
          * ruby_block[save-sysctl-params] action run
            - execute the ruby block save-sysctl-params
        Recipe: monit-ng::service
          * service[monit] action restart
            - restart service service[monit]
          * service[monit] action start (up to date)
        Recipe: monit-ng::reload
          * ruby_block[conditional-monit-reload] action run
            - execute the ruby block conditional-monit-reload
        Recipe: apache_spark::spark-standalone-master
          * monit_wrapper_service[spark-standalone-master] action start
          Recipe: sysctl::default
            * ruby_block[save-sysctl-params] action run
              - execute the ruby block save-sysctl-params
          Recipe: monit-ng::service
            * service[monit] action restart
              - restart service service[monit]
            * service[monit] action start (up to date)
          Recipe: monit-ng::reload
            * ruby_block[conditional-monit-reload] action run
              - execute the ruby block conditional-monit-reload
          Recipe: apache_spark::spark-standalone-master
            * monit_wrapper_service[spark-standalone-master] action start
            Recipe: sysctl::default
              * ruby_block[save-sysctl-params] action run
                - execute the ruby block save-sysctl-params
            Recipe: monit-ng::service
              * service[monit] action restart
                - restart service service[monit]
              * service[monit] action start (up to date)
            Recipe: monit-ng::reload
              * ruby_block[conditional-monit-reload] action run
                - execute the ruby block conditional-monit-reload
            Recipe: apache_spark::spark-standalone-master
              * monit_wrapper_service[spark-standalone-master] action start
              Recipe: sysctl::default
                * ruby_block[save-sysctl-params] action run
                  - execute the ruby block save-sysctl-params
              Recipe: monit-ng::service
                * service[monit] action restart
                  - restart service service[monit]
                * service[monit] action start (up to date)
              Recipe: monit-ng::reload
                * ruby_block[conditional-monit-reload] action run
                  - execute the ruby block conditional-monit-reload
              Recipe: apache_spark::spark-standalone-master
                * monit_wrapper_service[spark-standalone-master] action start
                Recipe: sysctl::default
                  * ruby_block[save-sysctl-params] action run
                    - execute the ruby block save-sysctl-params
                Recipe: monit-ng::service
                  * service[monit] action restart
                    - restart service service[monit]
                  * service[monit] action start (up to date)
                Recipe: monit-ng::reload
                  * ruby_block[conditional-monit-reload] action run
                    - execute the ruby block conditional-monit-reload
                Recipe: apache_spark::spark-standalone-master
                  * monit_wrapper_service[spark-standalone-master] action start
                  Recipe: sysctl::default
                    * ruby_block[save-sysctl-params] action run
                      - execute the ruby block save-sysctl-params
                  Recipe: monit-ng::service
                    * service[monit] action restart
                      - restart service service[monit]
                    * service[monit] action start (up to date)
                  Recipe: monit-ng::reload
                    * ruby_block[conditional-monit-reload] action run
                      - execute the ruby block conditional-monit-reload
                  Recipe: apache_spark::spark-standalone-master
                    * monit_wrapper_service[spark-standalone-master] action start
                    Recipe: sysctl::default
                      * ruby_block[save-sysctl-params] action run
                        - execute the ruby block save-sysctl-params
                    Recipe: monit-ng::service
                      * service[monit] action restart
                        - restart service service[monit]
                      * service[monit] action start (up to date)
                    Recipe: monit-ng::reload
                      * ruby_block[conditional-monit-reload] action run
                        - execute the ruby block conditional-monit-reload
                    Recipe: apache_spark::spark-standalone-master
                      * monit_wrapper_service[spark-standalone-master] action start
                      Recipe: sysctl::default
                        * ruby_block[save-sysctl-params] action run
                          - execute the ruby block save-sysctl-params
                      Recipe: monit-ng::service
                        * service[monit] action restart
                          - restart service service[monit]
                        * service[monit] action start (up to date)
                      Recipe: monit-ng::reload
                        * ruby_block[conditional-monit-reload] action run
                          - execute the ruby block conditional-monit-reload
                      Recipe: apache_spark::spark-standalone-master
                        * monit_wrapper_service[spark-standalone-master] action start
                        Recipe: sysctl::default
                          * ruby_block[save-sysctl-params] action run
                            - execute the ruby block save-sysctl-params
                        Recipe: monit-ng::service
                          * service[monit] action restart
                            - restart service service[monit]
                          * service[monit] action start (up to date)
                        Recipe: monit-ng::reload
                          * ruby_block[conditional-monit-reload] action run
                            - execute the ruby block conditional-monit-reload
                        Recipe: apache_spark::spark-standalone-master
                          * monit_wrapper_service[spark-standalone-master] action start
                          Recipe: sysctl::default
                            * ruby_block[save-sysctl-params] action run
                              - execute the ruby block save-sysctl-params
                          Recipe: monit-ng::service
                            * service[monit] action restart
                              - restart service service[monit]
                            * service[monit] action start (up to date)
                          Recipe: monit-ng::reload
                            * ruby_block[conditional-monit-reload] action run
                              - execute the ruby block conditional-monit-reload
                          Recipe: apache_spark::spark-standalone-master
                            * monit_wrapper_service[spark-standalone-master] action start
                            Recipe: sysctl::default
                              * ruby_block[save-sysctl-params] action run
                                - execute the ruby block save-sysctl-params
                            Recipe: monit-ng::service
                              * service[monit] action restart
                                - restart service service[monit]
                              * service[monit] action start (up to date)
                            Recipe: monit-ng::reload
                              * ruby_block[conditional-monit-reload] action run
                                - execute the ruby block conditional-monit-reload
                            Recipe: apache_spark::spark-standalone-master
                              * monit_wrapper_service[spark-standalone-master] action start
                              Recipe: sysctl::default
                                * ruby_block[save-sysctl-params] action run
                                  - execute the ruby block save-sysctl-params
                              Recipe: monit-ng::service
                                * service[monit] action restart
                                  - restart service service[monit]
                                * service[monit] action start (up to date)
                              Recipe: monit-ng::reload
                                * ruby_block[conditional-monit-reload] action run
                                  - execute the ruby block conditional-monit-reload
                              Recipe: apache_spark::spark-standalone-master
                                * monit_wrapper_service[spark-standalone-master] action start
                                Recipe: sysctl::default
                                  * ruby_block[save-sysctl-params] action run
                                    - execute the ruby block save-sysctl-params
                                Recipe: monit-ng::service
                                  * service[monit] action restart
                                    - restart service service[monit]
                                  * service[monit] action start (up to date)
                                Recipe: monit-ng::reload
                                  * ruby_block[conditional-monit-reload] action run
                                    - execute the ruby block conditional-monit-reload
                                Recipe: apache_spark::spark-standalone-master
                                  * monit_wrapper_service[spark-standalone-master] action start
                                  Recipe: sysctl::default
                                    * ruby_block[save-sysctl-params] action run
                                      - execute the ruby block save-sysctl-params
                                  Recipe: monit-ng::service
                                    * service[monit] action restart
                                      - restart service service[monit]
                                    * service[monit] action start (up to date)
                                  Recipe: monit-ng::reload
                                    * ruby_block[conditional-monit-reload] action run
                                      - execute the ruby block conditional-monit-reload
                                  Recipe: apache_spark::spark-standalone-master
                                    * monit_wrapper_service[spark-standalone-master] action start
                                    Recipe: sysctl::default
                                      * ruby_block[save-sysctl-params] action run
                                        - execute the ruby block save-sysctl-params
                                    Recipe: monit-ng::service
                                      * service[monit] action restart
                                        - restart service service[monit]
                                      * service[monit] action start (up to date)
                                    Recipe: monit-ng::reload
                                      * ruby_block[conditional-monit-reload] action run
                                        - execute the ruby block conditional-monit-reload
                                    Recipe: apache_spark::spark-standalone-master
                                      * monit_wrapper_service[spark-standalone-master] action start
                                      Recipe: sysctl::default
                                        * ruby_block[save-sysctl-params] action run
                                          - execute the ruby block save-sysctl-params
                                      Recipe: monit-ng::service
                                        * service[monit] action restart
                                          - restart service service[monit]
                                        * service[monit] action start (up to date)
                                      Recipe: monit-ng::reload
                                        * ruby_block[conditional-monit-reload] action run
                                          - execute the ruby block conditional-monit-reload
                                      Recipe: apache_spark::spark-standalone-master
                                        * monit_wrapper_service[spark-standalone-master] action start
                                        Recipe: sysctl::default
                                          * ruby_block[save-sysctl-params] action run
                                            - execute the ruby block save-sysctl-params
                                        Recipe: monit-ng::service
                                          * service[monit] action restart
                                            - restart service service[monit]
                                          * service[monit] action start (up to date)
                                        Recipe: monit-ng::reload
                                          * ruby_block[conditional-monit-reload] action run
                                            - execute the ruby block conditional-monit-reload
                                        Recipe: apache_spark::spark-standalone-master
                                          * monit_wrapper_service[spark-standalone-master] action start
                                          Recipe: sysctl::default
                                            * ruby_block[save-sysctl-params] action run
                                              - execute the ruby block save-sysctl-params
                                          Recipe: monit-ng::service
                                            * service[monit] action restart
                                              - restart service service[monit]
                                            * service[monit] action start (up to date)
                                          Recipe: monit-ng::reload
                                            * ruby_block[conditional-monit-reload] action run
                                              - execute the ruby block conditional-monit-reload
                                          Recipe: apache_spark::spark-standalone-master
                                            * monit_wrapper_service[spark-standalone-master] action start
                                            Recipe: sysctl::default
                                              * ruby_block[save-sysctl-params] action run
                                                - execute the ruby block save-sysctl-params
                                            Recipe: monit-ng::service
                                              * service[monit] action restart
                                                - restart service service[monit]
                                              * service[monit] action start (up to date)
                                            Recipe: monit-ng::reload
                                              * ruby_block[conditional-monit-reload] action run
                                                - execute the ruby block conditional-monit-reload
                                            Recipe: apache_spark::spark-standalone-master
                                              * monit_wrapper_service[spark-standalone-master] action start
                                              Recipe: sysctl::default
                                                * ruby_block[save-sysctl-params] action run
                                                  - execute the ruby block save-sysctl-params
                                              Recipe: monit-ng::service
                                                * service[monit] action restart
                                                  - restart service service[monit]
                                                * service[monit] action start (up to date)
                                              Recipe: monit-ng::reload
                                                * ruby_block[conditional-monit-reload] action run
                                                  - execute the ruby block conditional-monit-reload
                                              Recipe: apache_spark::spark-standalone-master
                                                * monit_wrapper_service[spark-standalone-master] action start
                                                Recipe: sysctl::default
                                                  * ruby_block[save-sysctl-params] action run
                                                    - execute the ruby block save-sysctl-params
                                                Recipe: monit-ng::service
                                                  * service[monit] action restart
                                                    - restart service service[monit]
                                                  * service[monit] action start (up to date)
                                                Recipe: monit-ng::reload
                                                  * ruby_block[conditional-monit-reload] action run
                                                    - execute the ruby block conditional-monit-reload
                                                Recipe: apache_spark::spark-standalone-master
                                                  * monit_wrapper_service[spark-standalone-master] action start
                                                  Recipe: sysctl::default
                                                    * ruby_block[save-sysctl-params] action run
                                                      - execute the ruby block save-sysctl-params
                                                  Recipe: monit-ng::service
                                                    * service[monit] action restart
                                                      - restart service service[monit]
                                                    * service[monit] action start (up to date)
                                                  Recipe: monit-ng::reload
                                                    * ruby_block[conditional-monit-reload] action run
                                                      - execute the ruby block conditional-monit-reload
                                                  Recipe: apache_spark::spark-standalone-master
                                                    * monit_wrapper_service[spark-standalone-master] action start
                                                    Recipe: sysctl::default
                                                      * ruby_block[save-sysctl-params] action run
                                                        - execute the ruby block save-sysctl-params
                                                    Recipe: monit-ng::service
                                                      * service[monit] action restart
                                                        - restart service service[monit]
                                                      * service[monit] action start (up to date)
                                                    Recipe: monit-ng::reload
                                                      * ruby_block[conditional-monit-reload] action run
                                                        - execute the ruby block conditional-monit-reload
                                                    Recipe: apache_spark::spark-standalone-master
                                                      * monit_wrapper_service[spark-standalone-master] action start
                                                      Recipe: sysctl::default
                                                        * ruby_block[save-sysctl-params] action run
                                                          - execute the ruby block save-sysctl-params
                                                      Recipe: monit-ng::service
                                                        * service[monit] action restart
                                                          - restart service service[monit]
                                                        * service[monit] action start (up to date)
                                                      Recipe: monit-ng::reload
                                                        * ruby_block[conditional-monit-reload] action run
                                                          - execute the ruby block conditional-monit-reload
                                                      Recipe: apache_spark::spark-standalone-master
                                                        * monit_wrapper_service[spark-standalone-master] action start
                                                        Recipe: sysctl::default
                                                          * ruby_block[save-sysctl-params] action run
                                                            - execute the ruby block save-sysctl-params
                                                        Recipe: monit-ng::service

dragisak commented 8 years ago

@runtimee I believe that's a different issue. It may be related to #19.

noorul commented 8 years ago

For me, the kitchen test also gets into this loop:

bundle exec kitchen test

noorul commented 8 years ago

@mbautin Any idea when a fix will be available?

mbautin commented 8 years ago

Cc @jharveysmith @jayceeb

jharveysmith commented 8 years ago

I'll take a look.

noorul commented 8 years ago

@jharveysmith Any update on this? This is actually a blocker for us now. Is this working properly on your end?

jharveysmith commented 8 years ago

Hi, it looks like the notifications/subscriptions between monit-ng, monit_wrapper, and apache_spark are getting into a circular loop. I'm still tracking down where to fix it.

jharveysmith commented 8 years ago

Looks like the culprit is the notifying_action_wrapper blocks in monit_wrapper_service.

amalakar commented 8 years ago

I see that the monit conf has the following command:

$ cat /etc/monit/conf.d/spark-standalone-worker.conf | grep pkill
stop program "/usr/bin/pkill -u root -f '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '"

But the process runs as spark, so it should have looked like this:

/usr/bin/pkill -u spark -f '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '

Not sure if I am missing anything here. Note that I had overridden the command-line patterns used by the kill command:

+default['apache_spark']['standalone']['master_cmdline_pattern'] =
+      '.*java .* org[.]apache[.]spark[.]deploy[.]master[.]Master '
+default['apache_spark']['standalone']['worker_cmdline_pattern'] =
+      '.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker '
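As a quick sanity check on overridden patterns like these, the sketch below tests them against a sample command line in Ruby. The sample command line is hypothetical (a real one can be read with `ps -o args=` on the host), and Ruby's regex engine stands in for the POSIX extended regex matching that pkill uses. These particular patterns only use constructs that behave the same in both.

```ruby
# Patterns copied from the attribute override in this comment.
MASTER_CMDLINE_RE =
  Regexp.new('.*java .* org[.]apache[.]spark[.]deploy[.]master[.]Master ')
WORKER_CMDLINE_RE =
  Regexp.new('.*java .* org[.]apache[.]spark[.]deploy[.]worker[.]Worker ')

# Hypothetical worker command line, for illustration only.
SAMPLE_WORKER_CMDLINE =
  '/usr/bin/java -cp /usr/share/spark/conf ' \
  'org.apache.spark.deploy.worker.Worker spark://master.example.com:7077'

# True when the pattern matches anywhere in the command line,
# which is how `pkill -f` applies its pattern.
def cmdline_matches?(pattern, cmdline)
  pattern.match?(cmdline)
end

puts "worker pattern matches: #{cmdline_matches?(WORKER_CMDLINE_RE, SAMPLE_WORKER_CMDLINE)}"
puts "master pattern matches: #{cmdline_matches?(MASTER_CMDLINE_RE, SAMPLE_WORKER_CMDLINE)}"
```

Note that a matching pattern is not enough on its own: pkill still finds nothing if the `-u` user differs from the process owner.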

jharveysmith commented 8 years ago

The latest monit_wrapper cookbook (3.4.0) should fix the restart loop.