amplab / spark-ec2

Scripts used to set up a Spark cluster on EC2
Apache License 2.0

Slave nodes not started on restart #93

Closed itsmeccr closed 7 years ago

itsmeccr commented 7 years ago

I launched a cluster with 2 slave nodes. I ran the spark-ec2 stop cluster_name command, which stopped the master and terminated the spot slave instances. When I then tried to restart the cluster, I got the following error.

Found 1 master, 0 slaves.
Starting slaves...
Starting master...
Waiting for cluster to enter 'ssh-ready' state..........
Cluster is now in 'ssh-ready' state. Waited 241 seconds.
Traceback (most recent call last):
  File "./spark_ec2.py", line 1528, in <module>
    main()
  File "./spark_ec2.py", line 1520, in main
    real_main()
  File "./spark_ec2.py", line 1503, in real_main
    existing_slave_type = slave_nodes[0].instance_type
IndexError: list index out of range

What is causing this and what is the solution?

shivaram commented 7 years ago

We don't restart slave nodes in the case of spot instances; that is only supported for on-demand instances that have been stopped. You can use the --use-existing-master flag with launch and give the same cluster name. That will re-bid for spot instance slaves and then connect them to the stopped master.
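
A rough sketch of the relaunch described above, run from the spark-ec2 directory. The key pair, identity file, slave count, and bid price below are placeholder values, and the exact flags may vary with your spark-ec2 version:

  ./spark-ec2 --key-pair=my-keypair --identity-file=my-keypair.pem \
      --slaves=2 --spot-price=0.05 --use-existing-master \
      launch cluster_name

The important parts are reusing the same cluster name and passing --use-existing-master, so the freshly bid spot slaves are attached to the existing stopped master instead of a new one.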

itsmeccr commented 7 years ago

Thank you.