ClusterLabs / crmsh

Command-line interface for High-Availability cluster management on GNU/Linux systems.
GNU General Public License v2.0

[Question] crm cluster init problem. #130

Open VolodyaIvanets opened 8 years ago

VolodyaIvanets commented 8 years ago

Hello,

Sorry if this is not the appropriate place to ask; I'm new to crmsh.

I'm using CentOS 7 and the http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-7/network:ha-clustering:Stable.repo repository. I installed the software with yum:

 yum install pacemaker corosync crmsh

Everything appears to be installed correctly, and I am now following the http://crmsh.github.io/start-guide/ tutorial for the initial configuration. Unfortunately, I'm getting this:

[root@node-1 corosync]# crm cluster init -R node-1,node-2,node-3,node-4
INFO: Initialize a new cluster
INFO: Nodes: -R, node-1, node-2, node-3, node-4
ERROR: [-R]: Start: Exited with error code 255, Error output: Bad remote forwarding specification '-o'
ERROR: [node-4]: Clean: Exited with error code 127, Error output: bash: /tmp/crm-tmp-5706396592ada0f12ba6/crm_clean.py: No such file or directory
ERROR: [node-2]: Clean: Exited with error code 127, Error output: bash: /tmp/crm-tmp-5706396592ada0f12ba6/crm_clean.py: No such file or directory
ERROR: [node-3]: Clean: Exited with error code 127, Error output: bash: /tmp/crm-tmp-5706396592ada0f12ba6/crm_clean.py: No such file or directory
ERROR: [-R]: Clean: Exited with error code 255, Error output: Bad remote forwarding specification '-o'
ERROR: cluster.init: Failed to connect to one or more of these hosts via SSH: -R, node-2, node-3, node-4

Each of my nodes has a crm_clean.py file located under /usr/share/crmsh/utils/. It looks for a crm_script.debug file that I don't have.

I would appreciate any assistance, or a pointer to the correct place to post my problem if this is the wrong one.

Thanks a lot!

krig commented 8 years ago

Hi!

Your initial problem is a typo in the command line:

 crm cluster init -R node-1,node-2,node-3,node-4

Should be:

 crm cluster init nodes=node-1,node-2,node-3,node-4
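
(The "Bad remote forwarding specification '-o'" message comes from ssh itself: crmsh treated -R as a host name, and ssh in turn parsed it as its own -R remote-port-forwarding option, consuming the following -o argument as the forwarding specification.)
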
VolodyaIvanets commented 8 years ago

Krig,

Yeah, sorry for the typo. I was actually using the -d flag for debugging:

[root@node-1 ~]# crm cluster init -d node-1,node-2,node-3,node-4

INFO: Initialize a new cluster
INFO: Nodes: -d, node-1, node-2, node-3, node-4
ERROR: [-d]: Start: Exited with error code 255, Error output: unknown option -- d
usage: ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec]
           [-D [bind_address:]port] [-E log_file] [-e escape_char]
           [-F configfile] [-I pkcs11] [-i identity_file]
           [-L [bind_address:]port:host:hostport] [-l login_name] [-m mac_spec]
           [-O ctl_cmd] [-o option] [-p port]
           [-Q cipher | cipher-auth | mac | kex | key]
           [-R [bind_address:]port:host:hostport] [-S ctl_path] [-W host:port]
           [-w local_tun[:remote_tun]] [user@]hostname [command]
ERROR: [-d]: Clean: Exited with error code 255, Error output: unknown option -- d
usage: ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec]
           [-D [bind_address:]port] [-E log_file] [-e escape_char]
           [-F configfile] [-I pkcs11] [-i identity_file]
           [-L [bind_address:]port:host:hostport] [-l login_name] [-m mac_spec]
           [-O ctl_cmd] [-o option] [-p port]
           [-Q cipher | cipher-auth | mac | kex | key]
           [-R [bind_address:]port:host:hostport] [-S ctl_path] [-W host:port]
           [-w local_tun[:remote_tun]] [user@]hostname [command]
ERROR: [node-4]: Clean: Exited with error code 127, Error output: bash: /tmp/crm-tmp-57064669477479dfae50/crm_clean.py: No such file or directory
ERROR: [node-3]: Clean: Exited with error code 127, Error output: bash: /tmp/crm-tmp-57064669477479dfae50/crm_clean.py: No such file or directory
ERROR: [node-2]: Clean: Exited with error code 127, Error output: bash: /tmp/crm-tmp-57064669477479dfae50/crm_clean.py: No such file or directory
ERROR: cluster.init: Failed to connect to one or more of these hosts via SSH: -d, node-2, node-3, node-4
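
This is the same parsing problem as with -R: crmsh took -d for a node name and passed it to ssh, which rejected it as an unknown option. The debug flag belongs to crm itself and must come before the subcommand, as in the commands used later in this thread:

 crm -d cluster init nodes=node-1,node-2,node-3,node-4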

This one gets a bit further:

[root@node-1 ~]# crm cluster init node-1,node-2,node-3,node-4

INFO: Initialize a new cluster
INFO: Nodes: node-1, node-2, node-3, node-4
OK: Configure SSH
Check state of nodes...ERROR: [node-4]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
ERROR: [node-3]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
ERROR: [node-2]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
ERROR: [node-1]: Error (1): [Errno 2] No such file or directory
ERROR: Check state of nodes (rc=False)
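
(Python raises "[Errno 2] No such file or directory" when a program or file it tries to access does not exist, so these errors suggest the collect step is invoking a command or reading a file that is missing on CentOS; as noted further down, the init scripts were written primarily for SLE / openSUSE.)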

Thanks!

VolodyaIvanets commented 8 years ago

Hello again!

I was able to install crmsh, as well as everything else, on CentOS 6.7 (64-bit).

In /usr/share/crmsh/utils/crm_script.py, I changed this:

return call(['/usr/bin/systemctl', action, name + '.service'])
return sudo_call(['/usr/bin/systemctl', action, name + '.service'])

to this:

return call(['/sbin/service', name, action])
return sudo_call(['/sbin/service', name, action])
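
Rather than hard-coding either path, a runtime check would let the same code work on both systemd and SysV-init distributions. A minimal sketch, with service_cmd as a hypothetical helper that is not part of crmsh:

import os

def service_cmd(name, action):
    # Hypothetical helper, not part of crmsh: use systemd syntax when
    # systemctl is available, otherwise fall back to the SysV wrapper.
    if os.path.exists('/usr/bin/systemctl'):
        return ['/usr/bin/systemctl', action, name + '.service']
    return ['/sbin/service', name, action]

# crm_script.py could then use:
#     return call(service_cmd(name, action))
#     return sudo_call(service_cmd(name, action))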

Running crm -d cluster init nodes=node-1,node-2,node-3 then produced the same "No such file or directory" messages as on the CentOS 7 installation:

DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (03fa134)]
DEBUG: found pacemaker version: 1.1.14
INFO: Initialize a new cluster
INFO: Nodes: node-1, node-2, node-3
DEBUG: Local node: ('node-3', None, None), Remote hosts: node-1, node-2
DEBUG: parallax.call([('node-1', None, None), ('node-2', None, None)], mkdir -p /tmp)
DEBUG: parallax.copy([('node-1', None, None), ('node-2', None, None)], /tmp/crm-tmp-5707bd2dc213f854f54c, /tmp/crm-tmp-5707bd2dc213f854f54c)
Configure SSH...
DEBUG: parallax.copy([('node-1', None, None), ('node-2', None, None)], /tmp/crm-tmp-5707bd2dc213f854f54c/script.input, /tmp/crm-tmp-5707bd2dc213f854f54c/script.input)
DEBUG: is_local (None): True
** node-3 - cd "/tmp/crm-tmp-5707bd2dc213f854f54c"; ./configure.py ssh
DEBUG: Result(local): 'true'
OK: Configure SSH
Check state of nodes...
DEBUG: parallax.copy([('node-1', None, None), ('node-2', None, None)], /tmp/crm-tmp-5707bd2dc213f854f54c/script.input, /tmp/crm-tmp-5707bd2dc213f854f54c/script.input)
DEBUG: is_local (all): False
** [('node-1', None, None), ('node-2', None, None)] - cd "/tmp/crm-tmp-5707bd2dc213f854f54c"; ./collect.py
DEBUG: parallax.call([('node-1', None, None), ('node-2', None, None)], cd "/tmp/crm-tmp-5707bd2dc213f854f54c"; ./collect.py)
ERROR: [node-1]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
ERROR: [node-2]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
** node-3 - cd "/tmp/crm-tmp-5707bd2dc213f854f54c"; ./collect.py
ERROR: [node-3]: Error (1): [Errno 2] No such file or directory
ERROR: Check state of nodes (rc=False)
DEBUG: parallax.call([('node-1', None, None), ('node-2', None, None)], if [ -f '/tmp/crm-tmp-5707bd2dc213f854f54c/crm_script.debug' ]; then cat '/tmp/crm-tmp-5707bd2dc213f854f54c/crm_script.debug'; fi)
OK: [node-1]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']
OK: [node-2]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']
OK: [('node-3', None, None)]: crm_script(call): ['/sbin/service', 'sshd', 'start']
crm_script(call): ['mkdir', '-m', '700', '-p', '/root/.ssh']
crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']

Next, in /usr/share/crmsh/scripts/remove/remove.py, I changed this:

'systemctl stop corosync.service'])

to this:

'service corosync stop'])

And in /usr/share/crmsh/utils/crm_init.py, I changed this:

rc, out, err = crm_script.call(["/usr/bin/systemctl", "is-enabled", "%s.service" % (service)])
rc, out, err = crm_script.call(["/usr/bin/systemctl", "is-active", "%s.service" % (service)])

to this:

rc, out, err = crm_script.call(["/sbin/service", "%s", "status" % (service)])
rc, out, err = crm_script.call(["/sbin/service", "%s", "status" % (service)])
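
As an aside, this replacement has a Python bug of its own, and it is what produces the "not all arguments converted during string formatting" errors in the output below: the % operator is applied to the literal "status", which contains no format specifier, while the "%s" placeholder is left behind as a separate list element. The intended edit was presumably:

rc, out, err = crm_script.call(["/sbin/service", service, "status"])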

I then executed crm -d cluster init nodes=node-1,node-2,node-3 and got this output:

DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.14 (03fa134)]
DEBUG: found pacemaker version: 1.1.14
INFO: Initialize a new cluster
INFO: Nodes: node-1, node-2, node-3
DEBUG: Local node: ('node-1', None, None), Remote hosts: node-2, node-3
DEBUG: parallax.call([('node-2', None, None), ('node-3', None, None)], mkdir -p /tmp)
DEBUG: parallax.copy([('node-2', None, None), ('node-3', None, None)], /tmp/crm-tmp-5707c5f1ebe88f702c44, /tmp/crm-tmp-5707c5f1ebe88f702c44)
Configure SSH...
DEBUG: parallax.copy([('node-2', None, None), ('node-3', None, None)], /tmp/crm-tmp-5707c5f1ebe88f702c44/script.input, /tmp/crm-tmp-5707c5f1ebe88f702c44/script.input)
DEBUG: is_local (None): True
** node-1 - cd "/tmp/crm-tmp-5707c5f1ebe88f702c44"; ./configure.py ssh
DEBUG: Result(local): 'true'
OK: Configure SSH
Check state of nodes...
DEBUG: parallax.copy([('node-2', None, None), ('node-3', None, None)], /tmp/crm-tmp-5707c5f1ebe88f702c44/script.input, /tmp/crm-tmp-5707c5f1ebe88f702c44/script.input)
DEBUG: is_local (all): False
** [('node-2', None, None), ('node-3', None, None)] - cd "/tmp/crm-tmp-5707c5f1ebe88f702c44"; ./collect.py
DEBUG: parallax.call([('node-2', None, None), ('node-3', None, None)], cd "/tmp/crm-tmp-5707c5f1ebe88f702c44"; ./collect.py)
ERROR: [node-3]: Remote error: Exited with error code 1, Error output: not all arguments converted during string formatting
ERROR: [node-2]: Remote error: Exited with error code 1, Error output: not all arguments converted during string formatting
** node-1 - cd "/tmp/crm-tmp-5707c5f1ebe88f702c44"; ./collect.py
ERROR: [node-1]: Error (1): not all arguments converted during string formatting
ERROR: Check state of nodes (rc=False)
DEBUG: parallax.call([('node-2', None, None), ('node-3', None, None)], if [ -f '/tmp/crm-tmp-5707c5f1ebe88f702c44/crm_script.debug' ]; then cat '/tmp/crm-tmp-5707c5f1ebe88f702c44/crm_script.debug'; fi)
OK: [node-3]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
OK: [node-2]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
OK: [('node-1', None, None)]: crm_script(call): ['/sbin/service', 'sshd', 'start']
crm_script(call): ['mkdir', '-m', '700', '-p', '/root/.ssh']
crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
VolodyaIvanets commented 8 years ago

Also, here is the full debug output from the CentOS 7 installation:

[root@node-1 ~]# crm -d cluster init nodes=node-1,node-2,node-3,node-4

DEBUG: pacemaker version: [err: ][out: CRM Version: 1.1.13-10.el7_2.2 (44eb2dd)]
DEBUG: found pacemaker version: 1.1.13-10.el7_2.2
INFO: Initialize a new cluster
INFO: Nodes: node-1, node-2, node-3, node-4
DEBUG: Local node: ('node-1', None, None), Remote hosts: node-2, node-3, node-4
DEBUG: parallax.call([('node-2', None, None), ('node-3', None, None), ('node-4', None, None)], mkdir -p /tmp)
DEBUG: parallax.copy([('node-2', None, None), ('node-3', None, None), ('node-4', None, None)], /tmp/crm-tmp-5708d0fd1f52bdb919a1, /tmp/crm-tmp-5708d0fd1f52bdb919a1)
Configure SSH...
DEBUG: parallax.copy([('node-2', None, None), ('node-3', None, None), ('node-4', None, None)], /tmp/crm-tmp-5708d0fd1f52bdb919a1/script.input, /tmp/crm-tmp-5708d0fd1f52bdb919a1/script.input)
DEBUG: is_local (None): True
** node-1 - cd "/tmp/crm-tmp-5708d0fd1f52bdb919a1"; ./configure.py ssh
DEBUG: Result(local): 'true'
OK: Configure SSH
Check state of nodes...
DEBUG: parallax.copy([('node-2', None, None), ('node-3', None, None), ('node-4', None, None)], /tmp/crm-tmp-5708d0fd1f52bdb919a1/script.input, /tmp/crm-tmp-5708d0fd1f52bdb919a1/script.input)
DEBUG: is_local (all): False
** [('node-2', None, None), ('node-3', None, None), ('node-4', None, None)] - cd "/tmp/crm-tmp-5708d0fd1f52bdb919a1"; ./collect.py
DEBUG: parallax.call([('node-2', None, None), ('node-3', None, None), ('node-4', None, None)], cd "/tmp/crm-tmp-5708d0fd1f52bdb919a1"; ./collect.py)
ERROR: [node-4]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
ERROR: [node-3]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
ERROR: [node-2]: Remote error: Exited with error code 1, Error output: [Errno 2] No such file or directory
** node-1 - cd "/tmp/crm-tmp-5708d0fd1f52bdb919a1"; ./collect.py
ERROR: [node-1]: Error (1): [Errno 2] No such file or directory
ERROR: Check state of nodes (rc=False)
DEBUG: parallax.call([('node-2', None, None), ('node-3', None, None), ('node-4', None, None)], if [ -f '/tmp/crm-tmp-5708d0fd1f52bdb919a1/crm_script.debug' ]; then cat '/tmp/crm-tmp-5708d0fd1f52bdb919a1/crm_script.debug'; fi)
OK: [node-4]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'ntp.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'hawk.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'SuSEfirewall2_init.service']
crm_script(call): ['netstat', '-nr']
OK: [node-3]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'ntp.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'hawk.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'SuSEfirewall2_init.service']
crm_script(call): ['netstat', '-nr']
OK: [node-2]: crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'ntp.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'hawk.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'SuSEfirewall2_init.service']
crm_script(call): ['netstat', '-nr']
OK: [('node-1', None, None)]: crm_script(call): ['/usr/bin/systemctl', 'start', 'sshd.service']
crm_script(call): ['mkdir', '-m', '700', '-p', '/root/.ssh']
crm_script(call): ['./crm_rpmcheck.py', 'booth', 'cluster-glue', 'corosync', 'crmsh', 'csync2', 'drbd', 'fence-agents', 'gfs2', 'gfs2-utils', 'hawk', 'ocfs2', 'ocfs2-tools', 'pacemaker', 'pacemaker-mgmt', 'resource-agents', 'sbd']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'sshd.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'ntp.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'corosync.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-active', 'pacemaker.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'hawk.service']
crm_script(call): ['/usr/bin/systemctl', 'is-enabled', 'SuSEfirewall2_init.service']
crm_script(call): ['netstat', '-nr']

Thanks for helping!

krig commented 8 years ago

Hello!

Please try again with crmsh 2.2.1, which was just released, if you still have issues.

The cluster init scripts are written primarily for SLE / openSUSE, since that is the distribution I use, but I welcome patches to fix incompatibilities with other distributions.

VolodyaIvanets commented 8 years ago

Hi!

Thank you very much!

liangxin1300 commented 7 years ago

Hi @krig, is the crm cluster init command a complete replacement for the ha-cluster-init shell script in SLE 12.2? My crmsh version is 2.3.2. If it is, I think ha-cluster-init is more straightforward, since it needs fewer arguments. Why make this change?

krig commented 7 years ago

Hi, the idea with crm cluster init was to provide a command similar to ha-cluster-init that would be platform-independent and extensible. However, it was not a successful attempt. I am now rewriting ha-cluster-init instead, in the master branch of crmsh. It has the same interface as ha-cluster-init, but with some additional features, such as configuring the whole cluster with a single command.