datacratic / StarCluster

StarCluster is a utility for creating and managing computing clusters hosted on Amazon's Elastic Compute Cloud (EC2).
http://star.mit.edu/cluster
GNU Lesser General Public License v3.0

Can't remove nodes because of IOError: [Errno 2] No such file #29

Closed. goraxan closed this issue 9 years ago

goraxan commented 9 years ago

In vanilla_improvements, every time I try to remove a node, the operation fails and leaves the cluster in an inconsistent state. The error apparently happens when trying to read the known_hosts file from the master. I've tried the default template with the default AMI. Any thoughts?

The logs are:

Remove node001 from mycluster (y/n)? y
Running plugin starcluster.clustersetup.DefaultClusterSetup
Removing node node001 (i-897da531)...
Removing node001 from known_hosts files
/tmp/known_hosts_TYz5fk 100% |||||||||||||||||||||||| Time: 00:00:00 14.58 M/s
known_hosts_jcUPX6 100% ||||||||||||||||||||||||||||| Time: 00:00:00 18.58 M/s
/tmp/known_hosts_gbmls4 100% |||||||||||||||||||||||| Time: 00:00:00 10.99 M/s
known_hosts_hmspWV 100% ||||||||||||||||||||||||||||| Time: 00:00:00 14.36 M/s
/tmp/known_hosts_L3BfWd 100% |||||||||||||||||||||||| Time: 00:00:00 15.92 M/s
*** WARNING - src and destination are the same: /root/.ssh/known_hosts, skipping
!!! ERROR - Error occured while running plugin 'starcluster.clustersetup.DefaultClusterSetup':
Terminating node: node001 (i-897da531)

And the stack trace is:

2015-10-19 20:13:00,235 PID: 10191 sshutils.py:113 - DEBUG - connecting to host ec2-54-170-160-41.eu-west-1.compute.amazonaws.com on port 22 as user root
2015-10-19 20:13:00,557 PID: 10191 sshutils.py:205 - DEBUG - creating sftp connection
2015-10-19 20:13:00,923 PID: 10191 sshutils.py:214 - DEBUG - creating scp connection
2015-10-19 20:13:01,445 PID: 10191 node.py:690 - WARNING - src and destination are the same: /root/.ssh/known_hosts, skipping
2015-10-19 20:13:01,463 PID: 10191 cluster.py:2043 - DEBUG - Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/cluster.py", line 2033, in run_plugin
    func(*args)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/clustersetup.py", line 429, in on_remove_node
    self._remove_from_known_hosts(node)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/clustersetup.py", line 419, in _remove_from_known_hosts
    master.copy_remote_file_to_nodes(target, nodes)
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/node.py", line 678, in copy_remote_file_to_nodes
    rf = self.ssh.remote_file(remote_file, 'r')
  File "/usr/local/lib/python2.7/dist-packages/StarCluster-0.9999-py2.7.egg/starcluster/sshutils.py", line 322, in remote_file
    rfile = self.sftp.open(file, mode)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 327, in open
    t, msg = self._request(CMD_OPEN, filename, imode, attrblock)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 729, in _request
    return self._read_response(num)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 776, in _read_response
    self._convert_status(msg)
  File "build/bdist.linux-x86_64/egg/paramiko/sftp_client.py", line 802, in _convert_status
    raise IOError(errno.ENOENT, text)
IOError: [Errno 2] No such file

Thanks.
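
Editor's note: the traceback shows the IOError (ENOENT) surfacing when a known_hosts path is opened for reading during node removal. The sketch below is not the fix applied in this repository and does not use StarCluster or paramiko; it only illustrates, on a local file, one way a known_hosts cleanup step could treat a missing file as "nothing to remove" instead of aborting. The function name `remove_host_from_known_hosts` is hypothetical, and hashed known_hosts entries are not handled.

```python
import errno


def remove_host_from_known_hosts(path, hostname):
    """Drop plain-text entries for `hostname` from a known_hosts-style file.

    Hypothetical helper for illustration only. Returns the number of
    lines removed; a missing file is treated as empty (returns 0).
    """
    try:
        with open(path) as f:
            lines = f.readlines()
    except IOError as e:  # IOError is an alias of OSError on Python 3
        if e.errno != errno.ENOENT:
            raise  # permission errors etc. should still surface
        return 0  # no file, so there is nothing to clean up

    kept = []
    for line in lines:
        fields = line.split()
        # The first field is a comma-separated list of host names/addresses.
        hosts = fields[0].split(",") if fields else []
        if hostname not in hosts:
            kept.append(line)

    removed = len(lines) - len(kept)
    if removed:
        with open(path, "w") as f:
            f.writelines(kept)
    return removed
```

Only ENOENT is swallowed here; any other I/O error is re-raised so that real failures are not hidden.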

FinchPowers commented 9 years ago

Bug confirmed.

FinchPowers commented 9 years ago

Fixed. Thank you. :)