In vanilla_improvements, every time I try to remove a node it fails and leaves the cluster in an inconsistent state.
The error apparently happens when trying to read the known_hosts file from the master.
I've tried the default template with the default AMI. Any thoughts?
The logs are:
Remove node001 from mycluster (y/n)? y
Running plugin starcluster.clustersetup.DefaultClusterSetup
Removing node node001 (i-897da531)...
Removing node001 from known_hosts files
/tmp/known_hosts_TYz5fk 100% |||||||||||||||||||||||| Time: 00:00:00 14.58 M/s
known_hosts_jcUPX6 100% ||||||||||||||||||||||||||||| Time: 00:00:00 18.58 M/s
/tmp/known_hosts_gbmls4 100% |||||||||||||||||||||||| Time: 00:00:00 10.99 M/s
known_hosts_hmspWV 100% ||||||||||||||||||||||||||||| Time: 00:00:00 14.36 M/s
/tmp/known_hosts_L3BfWd 100% |||||||||||||||||||||||| Time: 00:00:00 15.92 M/s
*** WARNING - src and destination are the same: /root/.ssh/known_hosts, skipping
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
Terminating node: node001 (i-897da531)
And the stacktrace is:
2015-10-19 20:13:00,235 PID: 10191 sshutils.py:113 - DEBUG - connecting to host ec2-54-170-160-41.eu-west-1.compute.amazonaws.com on port 22 as user root
2015-10-19 20:13:00,557 PID: 10191 sshutils.py:205 - DEBUG - creating sftp connection
2015-10-19 20:13:00,923 PID: 10191 sshutils.py:214 - DEBUG - creating scp connection
2015-10-19 20:13:01,445 PID: 10191 node.py:690 - WARNING - src and destination are the same: /root/.ssh/known_hosts, skipping
2015-10-19 20:13:01,463 PID: 10191 cluster.py:2043 - DEBUG - Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 2033, in run_plugin
    func(*args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/clustersetup.py", line 429, in on_remove_node
    self._remove_from_known_hosts(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/clustersetup.py", line 419, in _remove_from_known_hosts
    master.copy_remote_file_to_nodes(target, nodes)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/node.py", line 678, in copy_remote_file_to_nodes
    rf = self.ssh.remote_file(remote_file, 'r')
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils.py", line 322, in remote_file
    rfile = self.sftp.open(file, mode)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 327, in open
    t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 729, in _request
    return self._read_response(num)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 776, in _read_response
    self._convert_status(msg)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 802, in _convert_status
    raise IOError(errno.ENOENT, text)
IOError: [Errno 2] No such file
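For what it's worth, the traceback ends in paramiko's sftp.open raising ENOENT, so the path handed to copy_remote_file_to_nodes apparently does not exist on the master at that point. Below is a minimal sketch of how that could be checked before the copy, assuming `sftp` is a paramiko `SFTPClient`. The helper name `remote_file_exists` is hypothetical and not part of StarCluster.

```python
import errno


def remote_file_exists(sftp, path):
    """Return True if `path` exists on the remote side of `sftp`.

    Assumption: `sftp` is a paramiko.SFTPClient, or any object with a
    compatible `stat(path)` method. This helper is hypothetical and is
    not part of StarCluster.
    """
    try:
        sftp.stat(path)
    except IOError as exc:
        # A missing remote file surfaces as IOError with errno ENOENT,
        # the same error shown at the bottom of the traceback.
        if exc.errno == errno.ENOENT:
            return False
        # Anything else (e.g. permission denied) is a different problem.
        raise
    return True
```

Going by the traceback, the master's SFTP client looks reachable as `master.ssh.sftp`, so calling this on the target path right before copy_remote_file_to_nodes should show whether the temp file is really missing on the master.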
Thanks.