Closed — sibirajal closed this issue 6 years ago
What's your OS?
Can you run the same bash scripts manually, then run them via st2, and compare the output/result? Is there a difference in output or return code between the manual run and the st2 run?
I suspect this is a generic error related to the ntpd service itself, and a number of related issues can be found in search; just a few:
Please also check your server logs for more detailed error messages on why ntpd failed.
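For CentOS 6, a quick way to pull the recent ntpd-related log lines (assuming the default syslog location, which matches the /var/log/messages excerpt later in this thread):

```shell
#!/bin/bash
# Show the most recent ntpd-related syslog lines (CentOS 6 default path;
# adjust /var/log/messages for other distros).
grep -i 'ntpd' /var/log/messages | tail -n 20
```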
I am running the ntpd start command on CentOS 6.7.
I've tried the same service start with other services, and the problem is the same.
The issue goes away if I add a few more commands after the service start command, as below:
#!/bin/bash
/sbin/service exim start
/sbin/service exim status
exit 0
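If the daemon is dying when the remote SSH channel closes right after a single command, one pattern worth trying (an assumption on my side, not a confirmed root cause for this issue) is to detach the service start from the session's stdio:

```shell
#!/bin/bash
# Variant of the workaround script: start the service with stdin/stdout/stderr
# detached from the SSH session, so the daemon is not tied to the channel.
# The exim service name and the log path are just illustrative.
nohup /sbin/service exim start </dev/null >/tmp/exim_start.log 2>&1
/sbin/service exim status
exit 0
```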
This can always be replicated with any service command in the action script.
So if you do service start from StackStorm, it crashes the target process/service? Or are you saying that an Action performing service start reports a non-zero exit status code?
Could you run the following on the target machine: A)
sudo service ntpd start; echo $?
su -c 'sudo service ntpd start; echo $?' admin
And the same command via StackStorm itself: B)
st2 run core.remote_sudo cmd='service ntpd start' private_key=/home/admin/.ssh/id_rsa username=admin passphrase=replace_with_your_passphrase hosts=replace_with_your_remote_host
and post here the output to understand the issue better.
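To make the outputs of (A) and (B) easier to diff, the manual run can be wrapped in a small helper that prints the return code and the combined output in a fixed format (a generic sketch, not part of st2):

```shell
#!/bin/bash
# Run a command, then print its return code and combined stdout/stderr in a
# fixed format, so a manual run and an st2 run can be compared line by line.
run_check() {
  local out rc
  out="$("$@" 2>&1)"
  rc=$?
  printf 'cmd=%s rc=%d\n%s\n' "$*" "$rc" "$out"
  return "$rc"
}
run_check sudo service ntpd start
run_check sudo service ntpd status
```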
The problem appears to be with the remote shell actions: the service works when I start it manually, but it did not stay started when I started it via the StackStorm remote_sudo action. However, the same start command works when I add a subsequent command after it.
It looks to me like a bug in the remote command execution action, specifically when starting a service.
Manual method:
$ sudo service ntpd start; echo $?
Starting ntpd: [ OK ]
0
$ sudo service ntpd status
ntpd (pid 28409) is running...
StackStorm method:
$ st2 run core.remote_sudo cmd='/sbin/service ntpd start' private_key=/home/admin/.ssh/id_rsa username=admin passphrase="{{ st2kv.system.admin_passphrase | decrypt_kv}}" hosts='172.16.15.4'
.
id: 59bfcb3a2d0549041063f74b
status: succeeded
parameters:
cmd: /sbin/service ntpd start
hosts: 172.16.15.4
passphrase: '********'
private_key: '********'
username: admin
result:
172.16.15.4:
failed: false
return_code: 0
stderr: ''
stdout: "Starting ntpd: \e[60G[\e[0;32m OK \e[0;39m]\r"
succeeded: true
Adding another subsequent command works:
$ st2 run core.remote_sudo cmd='/sbin/service ntpd start;/sbin/service ntpd status' private_key=/home/admin/.ssh/id_rsa username=admin passphrase="{{ st2kv.system.admin_passphrase | decrypt_kv}}" hosts='172.16.15.4'
.
id: 59bfcbc92d0549041063f757
status: succeeded
parameters:
cmd: /sbin/service ntpd start;/sbin/service ntpd status
hosts: 172.16.15.4
passphrase: '********'
private_key: '********'
username: admin
result:
172.16.15.4:
failed: false
return_code: 0
stderr: ''
stdout: "Starting ntpd: \e[60G[\e[0;32m OK \e[0;39m]
ntpd (pid 808) is running..."
succeeded: true
To try to replicate this, I created two CentOS 6.9 Vagrant VMs: one with the currently latest st2 2.4.1 installation, and another as the target for testing the remote_sudo command.
I created admin users on both the st2 machine and the remote machine, following the instructions from https://docs.stackstorm.com/install/deb.html#configure-ssh-and-sudo about configuring passwordless sudo and removing requiretty.
I also encrypted the private key with a passphrase, trying to replicate your setup.
Now running remote_sudo commands on the target CentOS 6 box.
ntpd stop with remote_sudo:
$ st2 run core.remote_sudo cmd='/sbin/service ntpd stop' hosts=192.168.10.130 private_key=/home/admin/.ssh/admin_rsa username=admin passphrase=123456
.
id: 59c5305d5d698e0d8b30fbb9
status: succeeded
parameters:
cmd: /sbin/service ntpd stop
hosts: 192.168.10.130
passphrase: '********'
private_key: '********'
username: admin
result:
192.168.10.130:
failed: false
return_code: 0
stderr: ''
stdout: "Shutting down ntpd: \e[60G[\e[0;32m OK \e[0;39m]\r"
succeeded: true
ntpd status with remote_sudo:
$ st2 run core.remote_sudo cmd='/sbin/service ntpd status' hosts=192.168.10.130 private_key=/home/admin/.ssh/admin_rsa username=admin passphrase=123456
.
id: 59c530645d698e0d8b30fbbc
status: failed
parameters:
cmd: /sbin/service ntpd status
hosts: 192.168.10.130
passphrase: '********'
private_key: '********'
username: admin
result:
192.168.10.130:
failed: true
return_code: 3
stderr: ''
stdout: ntpd is stopped
succeeded: false
ntpd start when the service is stopped:
$ st2 run core.remote_sudo cmd='/sbin/service ntpd start' hosts=192.168.10.130 private_key=/home/admin/.ssh/admin_rsa username=admin passphrase=123456
.
id: 59c530985d698e0d8b30fbbf
status: succeeded
parameters:
cmd: /sbin/service ntpd start
hosts: 192.168.10.130
passphrase: '********'
private_key: '********'
username: admin
result:
192.168.10.130:
failed: false
return_code: 0
stderr: ''
stdout: "Starting ntpd: \e[60G[\e[0;32m OK \e[0;39m]\r"
succeeded: true
ntpd status after the start in the previous command now reports a running ntpd service:
$ st2 run core.remote_sudo cmd='/sbin/service ntpd status' hosts=192.168.10.130 private_key=/home/admin/.ssh/admin_rsa username=admin passphrase=123456
.
id: 59c530ac5d698e0d8b30fbc2
status: succeeded
parameters:
cmd: /sbin/service ntpd status
hosts: 192.168.10.130
passphrase: '********'
private_key: '********'
username: admin
result:
192.168.10.130:
failed: false
return_code: 0
stderr: ''
stdout: ntpd (pid 8742) is running...
succeeded: true
So in my setup, ntpd is running fine after doing service start.
/var/log/messages during the execution:
Sep 22 15:57:27 localhost ntpd[8742]: ntpd exiting on signal 15
Sep 22 15:57:53 localhost ntpd[8911]: ntpd 4.2.6p5@1.2349-o Mon Feb 6 07:22:46 UTC 2017 (1)
Sep 22 15:57:53 localhost ntpd[8912]: proto: precision = 0.178 usec
Sep 22 15:57:53 localhost ntpd[8912]: 0.0.0.0 c01d 0d kern kernel time sync enabled
Sep 22 15:57:53 localhost ntpd[8912]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen and drop on 1 v6wildcard :: UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen normally on 2 lo 127.0.0.1 UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen normally on 3 eth0 10.0.2.15 UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen normally on 4 eth1 192.168.10.130 UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen normally on 5 eth1 fe80::a00:27ff:fe32:747b UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen normally on 6 lo ::1 UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listen normally on 7 eth0 fe80::5054:ff:fe1c:c046 UDP 123
Sep 22 15:57:53 localhost ntpd[8912]: Listening on routing socket on fd #24 for interface updates
Sep 22 15:57:53 localhost ntpd[8912]: 0.0.0.0 c016 06 restart
Sep 22 15:57:53 localhost ntpd[8912]: 0.0.0.0 c012 02 freq_set kernel 11.841 PPM
Sep 22 15:58:00 localhost ntpd[8912]: 0.0.0.0 c615 05 clock_sync
ntpd version:
$ ntpd --version
ntpd 4.2.6p5
ntpd 4.2.6p5@1.2349-o Mon Feb 6 07:22:46 UTC 2017 (1)
CentOS6 version:
$ cat /etc/centos-release
CentOS release 6.9 (Final)
StackStorm version:
$ st2 --version
st2 2.4.1
In my setup I couldn't reproduce your issue: services started on another node with remote_sudo didn't crash and kept running. StackStorm here shouldn't behave differently from running a command via ssh on a remote host.
On the other hand, older systems like CentOS 6.7 might have their own bugs, or you might have some non-standard OS/box configuration which could affect remote execution in a strange way.
So if you can debug deeper and find something more interesting in the logs or setup, please share it with us.
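Since the remote runner should be equivalent to plain ssh, one more data point is to run the same commands over a direct ssh session and compare the result with the remote_sudo run (host, key, and user below mirror the values used earlier in this thread and are otherwise hypothetical):

```shell
#!/bin/bash
# Run the same service start over plain ssh, to see whether the behavior
# differs from the st2 remote_sudo run. Values mirror the thread's examples.
HOST=admin@192.168.10.130
KEY=/home/admin/.ssh/admin_rsa
ssh -i "$KEY" "$HOST" 'sudo service ntpd start; echo rc=$?; sudo service ntpd status'
```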
Closing this.
Please re-open the issue if you experience the same behavior in the future and have more detailed information.
Hello Team,
I have created 2 actions in my workflow: one checks the ntpd service status, and the other starts the service if it is not running.
It appears that when the restart action contains only the single command "/sbin/service ntpd restart", the service is terminated abruptly. If I add a few more commands after the restart command, there is no problem with the action.
Can you please take a look at this bug and provide a fix?
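A minimal sketch of the check-then-start logic described above (hypothetical, for illustration; not the actual action scripts):

```shell
#!/bin/bash
# Check-then-start pattern: start ntpd only when the status check reports
# it is not running (on CentOS 6, status exits non-zero, e.g. 3, when stopped).
if ! /sbin/service ntpd status >/dev/null 2>&1; then
  /sbin/service ntpd start
fi
```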
Action to restart the service:
$ cat ntp_admin_restart.yaml
Script for the above action:
Action to check the service status:
$ cat ntp_check.yaml
$ cat workflows/dmin_ntp_workflow.yaml
After the above execution: