galaxyproject / cloudman

Easily create and manage compute clusters on any Cloud.
https://galaxyproject.org/cloudman/

Only one worker added after requesting to add two #44

Closed: hackdna closed this issue 8 years ago

hackdna commented 8 years ago

Using the main cloudman bucket. I requested two c3.8xlarge workers via the "Add worker nodes" dialog box, but only one was added (screenshot: 2016-03-15 11:32:32).

CM log:

2016-03-15 15:21:07,625 DEBUG         master:1263 Adding 2 c3.8xlarge instance(s)
2016-03-15 15:21:07,626 DEBUG         master:916  Toggling master instance as exec host
2016-03-15 15:21:10,170 DEBUG           misc:840  '/usr/bin/scontrol update NodeName=master Reason="CloudMan-disabled" State=DRAIN' command OK
2016-03-15 15:21:10,170 INFO          master:931  The master instance is set to *not* execute jobs. To manually change this, use the CloudMan Admin panel.
2016-03-15 15:21:10,170 INFO             ec2:431  Adding 2 on-demand instance(s)
2016-03-15 15:21:10,188 DEBUG            ec2:176  Fetched security group ids for the first time: ['sg-8b3b99f3']
2016-03-15 15:21:10,189 DEBUG            ec2:463  Starting instance(s) in VPC with the following command : ec2_conn.run_instances( image_id='ami-d5246abf', min_count='1', max_count='2', key_name='cloudman_key_pair', security_group_ids=['sg-8b3b99f3'], user_data(with sensitive info filtered out)=[static_images_dir: static/images
cluster_templates: [{'filesystem_templates': [{'archive_url': 'http://s3.amazonaws.com/cloudman/fs-archives/galaxyFS-20151202.tar.gz', 'type': u'volume', 'name': 'galaxy', 'roles': 'galaxyTools,galaxyData', 'size': u'10'}, {'snap_id': 'snap-c332f2b0', 'name': 'galaxyIndices', 'roles': 'galaxyIndices'}], 'name': 'Galaxy'}, {'filesystem_templates': [{'name': 'galaxy'}], 'name': 'Data'}]
master_hostname_alt: ip-172-31-26-166
storage_type: volume
iops:
is_secure: True
cluster_storage_type: volume
s3_port: None
log_level: DEBUG
static_cache_time: 360
master_public_ip: 54.165.165.225
cluster_type: Galaxy
initial_cluster_type: Galaxy
static_scripts_dir: static/scripts
debug: true
master_ip: 172.31.26.166
cluster_name: galaxy-dev
machine_image_id: ami-d5246abf
role: worker
bucket_cluster: cm-c6d42a39947226f4727ed6a9c1c1d1fc
boot_script_path: /opt/cloudman/boot
master_hostname: ip-172-31-26-166.ec2.internal
ec2_conn_path: /
region_name: us-east-1
region_endpoint: ec2.amazonaws.com
ec2_port: None
static_favicon_dir: static/favicon.ico
deployment_version: 2
storage_size: 10
use_translogger: False
boot_script_name: cm_boot.py
services: [{'name': 'Supervisor', 'roles': ['Supervisor']}, {'name': 'NodeJSProxy', 'roles': ['NodeJSProxy']}, {'name': 'ProFTPd', 'roles': ['ProFTPd']}, {'name': 'GalaxyReports', 'roles': ['GalaxyReports']}, {'name': 'Slurmd', 'roles': ['Slurmd']}, {'name': 'Nginx', 'roles': ['Nginx']}, {'name': 'PSS', 'roles': ['PSS']}, {'name': 'Slurmctld', 'roles': ['Slurmctld', 'Job manager']}, {'home': '/mnt/galaxy/galaxy-app', 'name': 'Galaxy', 'roles': ['Galaxy']}, {'name': 'Postgres', 'roles': ['Postgres']}]
cloud_type: ec2
custom_image_id:
cloudman_file_name: cm.tar.gz
access_key: <redacted>
global_conf: {'__file__': '/mnt/cm/cm_wsgi.ini', 'here': '/mnt/cm'}
filesystems: [{'kind': 'volume', 'mount_point': '/mnt/galaxy', 'name': 'galaxy', 'roles': ['galaxyTools', 'galaxyData'], 'ids': [u'vol-dabaeb79']}, {'kind': 'snapshot', 'mount_point': '/mnt/galaxyIndices', 'name': 'galaxyIndices', 'roles': ['galaxyIndices'], 'ids': ['snap-0457ec13']}]
placement: us-east-1a
template_path: templates
cloud_name: Amazon - Virginia
static_dir: static
persistent_data_version: 3
cloudman_home: /mnt/cm
static_style_dir: static/style
bucket_default: cloudman
custom_instance_type:
s3_host: s3.amazonaws.com
use_lint: false
s3_conn_path: /
static_enabled: True
worker_initial_count: ], instance_type='c3.8xlarge', placement='us-east-1a', subnet_id='subnet-358d556d')
2016-03-15 15:21:13,925 DEBUG            ec2:389  Adding tag 'clusterName:galaxy-dev' to resource 'i-050c49e4523d1b746'
2016-03-15 15:21:14,058 DEBUG            ec2:389  Adding tag 'role:worker' to resource 'i-050c49e4523d1b746'
2016-03-15 15:21:14,181 DEBUG            ec2:389  Adding tag 'Name:Worker: galaxy-dev' to resource 'i-050c49e4523d1b746'
2016-03-15 15:21:14,328 DEBUG            ec2:510  Adding Instance Instance:i-050c49e4523d1b746
2016-03-15 15:21:14,328 DEBUG            ec2:524  Started 2 instance(s)
2016-03-15 15:21:14,328 DEBUG            ec2:526  Setting boto's logger to INFO mode
2016-03-15 15:21:17,698 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:21:17,956 DEBUG       instance:457  Got public IP for instance i-050c49e4523d1b746: 54.175.86.1
2016-03-15 15:21:17,956 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 3 secs ago); will wait a bit longer before a check...
2016-03-15 15:21:33,562 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:21:33,562 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 19 secs ago); will wait a bit longer before a check...
2016-03-15 15:21:49,073 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:21:49,073 DEBUG         master:2783 Have not heard from or checked on instance 'i-050c49e4523d1b746; 54.175.86.1; w1' for a while; checking now.
2016-03-15 15:21:49,185 DEBUG       instance:339  Requested instance 'i-050c49e4523d1b746; 54.175.86.1; w1' update: old state: running; new state: running
2016-03-15 15:21:49,185 DEBUG       instance:115  'Maintaining' instance 'i-050c49e4523d1b746; 54.175.86.1; w1' in 'running' state (last comm before 15:21:49 | last m_state change before 0:00:34 | time_rebooted before 15:21:49
2016-03-15 15:22:05,230 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:22:05,230 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 16 secs ago); will wait a bit longer before a check...
2016-03-15 15:22:15,783 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:22:15,783 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 26 secs ago); will wait a bit longer before a check...
2016-03-15 15:22:31,230 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:22:31,230 DEBUG         master:2783 Have not heard from or checked on instance 'i-050c49e4523d1b746; 54.175.86.1; w1' for a while; checking now.
2016-03-15 15:22:31,299 DEBUG       instance:339  Requested instance 'i-050c49e4523d1b746; 54.175.86.1; w1' update: old state: running; new state: running
2016-03-15 15:22:31,299 DEBUG       instance:115  'Maintaining' instance 'i-050c49e4523d1b746; 54.175.86.1; w1' in 'running' state (last comm before 15:22:31 | last m_state change before 0:01:16 | time_rebooted before 15:22:31
2016-03-15 15:22:46,805 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:22:46,805 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 15 secs ago); will wait a bit longer before a check...
2016-03-15 15:23:02,447 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:23:02,448 DEBUG         master:2783 Have not heard from or checked on instance 'i-050c49e4523d1b746; 54.175.86.1; w1' for a while; checking now.
2016-03-15 15:23:02,514 DEBUG       instance:339  Requested instance 'i-050c49e4523d1b746; 54.175.86.1; w1' update: old state: running; new state: running
2016-03-15 15:23:02,514 DEBUG       instance:115  'Maintaining' instance 'i-050c49e4523d1b746; 54.175.86.1; w1' in 'running' state (last comm before 15:23:02 | last m_state change before 0:01:48 | time_rebooted before 15:23:02
2016-03-15 15:23:17,978 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:23:17,978 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 15 secs ago); will wait a bit longer before a check...
2016-03-15 15:23:33,487 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:23:33,487 DEBUG         master:2783 Have not heard from or checked on instance 'i-050c49e4523d1b746; 54.175.86.1; w1' for a while; checking now.
2016-03-15 15:23:33,559 DEBUG       instance:339  Requested instance 'i-050c49e4523d1b746; 54.175.86.1; w1' update: old state: running; new state: running
2016-03-15 15:23:33,559 DEBUG       instance:115  'Maintaining' instance 'i-050c49e4523d1b746; 54.175.86.1; w1' in 'running' state (last comm before 15:23:33 | last m_state change before 0:02:19 | time_rebooted before 15:23:33
2016-03-15 15:23:49,187 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:23:49,188 DEBUG         master:2788 Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' has been quiet for a while (last check 15 secs ago); will wait a bit longer before a check...
2016-03-15 15:23:59,189 INFO        instance:534  Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' reported alive
2016-03-15 15:23:59,190 DEBUG       instance:555  INSTANCE_ALIVE private_ip: 172.31.26.72 public_ip: 54.175.86.1 zone: us-east-1a type: c3.8xlarge AMI: ami-d5246abf local_hostname: ip-172-31-26-72.ec2.internal, CPUs: 32, hostname: ip-172-31-26-72
2016-03-15 15:23:59,254 DEBUG           misc:840  'cp /etc/hosts /etc/hosts.orig' command OK
2016-03-15 15:23:59,257 DEBUG           misc:840  'cp /tmp/tmp1XafFZ /etc/hosts' command OK
2016-03-15 15:23:59,261 DEBUG           misc:840  'chmod 644 /etc/hosts' command OK
2016-03-15 15:23:59,261 DEBUG           misc:1081 Added the following line to /etc/hosts: 172.31.26.72 w1 ip-172-31-26-72.ec2.internal ip-172-31-26-72

2016-03-15 15:24:04,822 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:24:09,823 DEBUG       instance:564  Got MOUNT_DONE message
2016-03-15 15:24:09,824 DEBUG       instance:574  Got transient_nfs state on w1: 1
2016-03-15 15:24:09,826 DEBUG         master:2313 Instructing all workers to sync /etc/hosts w/ master
2016-03-15 15:24:09,827 DEBUG       instance:501  Sent master public key to worker instance 'i-050c49e4523d1b746'.
2016-03-15 15:24:09,827 DEBUG       instance:502    MT: Message MASTER_PUBKEY ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCY6gKhVVz9qJ2yKuz+vwNyPlDMvW2jcD4Iolxfi/TruRmk1MLzGdp+bCbEGFoseY8NBy1rfwH7sY0eWmXcp3fM+V2+fMw1fMg3ydz87mtbaEEH7eUE4jtxdAvw9ktg8mRml5ApKGLypi+95SaUEM2sEkkE6zkF9mmhc7IG2+xvrX8XmAXCAcyY4YToLqha7XITm1oHlFYWIPSNW5VZnmQZ1bvQ87RBH6Zyxyrx9FY7hnsW21J4HahzKhQZwbguPMefvrnNBwY3q4C/fvqjltLt37ZEUIp+5HdR9oWq80Ws9w7xDYWN5LfHx8jqn/cvNpgrq8dDn1LZ5ldYMVDHQw6l root@ip-172-31-26-166
 sent to 'i-050c49e4523d1b746'
2016-03-15 15:24:19,831 DEBUG       instance:603  Got WORKER_H_CERT message
2016-03-15 15:24:19,831 DEBUG         master:2284 Saving host certificate 'ip-172-31-26-72.ec2.internal ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDaSRt8IcxRJ9WtvQaQqS6ui9CJkf9SDmbWyDnlqID9hqzfKBVFlM6Ss4DQfcWyL9Hkc5IHtHXugqDZyzsXfL5JA0FlcgPqhUdo08dQ85jcDXUlMio5uPnbk53V3A85Mlu5Fyh32naiDwejtykwwmz4ACXkgF8ZxfpqNxWI/CfwuX4F2dxHdhl/v1iIHGXQBXFh1eMsAFkVskJb2z40Ev9GeACbRTfO0l+93hJmQnuegB799bt0d87NxRo6fPiV+zqiDCTzoemT/DiGEOcEJWO3IRWM8SUoLlzOD4LompDxh8d4g/7/VcRtf0tdn8evxWYs6CCmuv71a6ALyMfnCzlV
 '
2016-03-15 15:24:19,831 DEBUG         master:2285 Saving worker host certificate.
2016-03-15 15:24:19,831 DEBUG       instance:607  Worker 'i-050c49e4523d1b746' host certificate received and appended to /root/.ssh/known_hosts
2016-03-15 15:24:19,831 DEBUG      slurmctld:194  Adding node w1 into Slurm cluster
2016-03-15 15:24:19,831 DEBUG      slurmctld:181  Reconfiguring Slurm cluster
2016-03-15 15:24:19,832 DEBUG      slurmctld:148  Setting up /mnt/transient_nfs/slurm/slurm.conf (attempt 0/5)
2016-03-15 15:24:19,832 DEBUG      slurmctld:124  Setting slurm.conf parameters
2016-03-15 15:24:19,832 DEBUG           misc:983  Checking existence of directory '/tmp/slurm'
2016-03-15 15:24:19,832 DEBUG           misc:995  Directory '/tmp/slurm' exists.
2016-03-15 15:24:19,832 DEBUG      slurmctld:120  Worker node names to include in slurm.conf: w1
2016-03-15 15:24:19,839 DEBUG      slurmctld:152  Created slurm.conf as /mnt/transient_nfs/slurm/slurm.conf
2016-03-15 15:24:19,855 DEBUG           misc:840  '/usr/bin/scontrol reconfigure' command OK
2016-03-15 15:24:19,855 DEBUG       instance:506    MT: Sending START_SLURMD message to instance 'i-050c49e4523d1b746; 54.175.86.1; w1', named w1
2016-03-15 15:24:19,856 WARNING     instance:618  Could not get a handle on job manager service to add node 'i-050c49e4523d1b746; 54.175.86.1; w1'
2016-03-15 15:24:19,856 INFO        instance:625  Waiting on worker instance 'i-050c49e4523d1b746; 54.175.86.1; w1' to configure itself.
2016-03-15 15:24:20,454 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:24:35,457 INFO        instance:628  Instance 'i-050c49e4523d1b746; 54.175.86.1; w1' ready
2016-03-15 15:24:35,457 DEBUG            ec2:389  Adding tag 'clusterName:galaxy-dev' to resource 'i-050c49e4523d1b746'
2016-03-15 15:24:35,539 DEBUG            ec2:389  Adding tag 'role:worker' to resource 'i-050c49e4523d1b746'
2016-03-15 15:24:35,618 DEBUG            ec2:389  Adding tag 'alias:w1' to resource 'i-050c49e4523d1b746'
2016-03-15 15:24:35,746 DEBUG            ec2:389  Adding tag 'Name:Worker: galaxy-dev' to resource 'i-050c49e4523d1b746'
2016-03-15 15:24:36,342 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
2016-03-15 15:24:51,888 DEBUG         master:2761 S&S: AS..Unstarted; ClouderaManager..Unstarted; Cloudgene..Unstarted; Galaxy..OK; GalaxyReports..OK; Migration..Completed; Nginx..OK; NodeJSProxy..OK; PSS..Completed; Postgres..OK; ProFTPd..OK; Pulsar..Unstarted; Slurmctld..OK; Slurmd..OK; Supervisor..OK; galaxy FS..OK; galaxyIndices FS..OK; transient_nfs FS..OK;
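
For context, the run_instances call in the log above passes min_count='1' and max_count='2'. EC2 treats these as a range: it launches at least min_count and at most max_count instances, so the request can partially succeed without raising an error. A minimal boto 2 sketch of that behavior, using illustrative values taken from the log:

import boto.ec2

# Credentials are assumed to come from the environment.
conn = boto.ec2.connect_to_region('us-east-1')

reservation = conn.run_instances(
    image_id='ami-d5246abf',       # AMI from the log above
    min_count=1,                   # EC2 guarantees at least this many...
    max_count=2,                   # ...and launches up to this many
    key_name='cloudman_key_pair',
    instance_type='c3.8xlarge',
)

# With a per-type limit of 1, a single instance can come back here with no
# exception raised, even though two were requested.
print('Launched %d instance(s)' % len(reservation.instances))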
dannon commented 8 years ago

When I have seen this before, it was an instance limit on the EC2 side. Can you confirm your c3.8xlarge account limit is high enough to launch an extra instance?
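
At the time, EC2 offered no API for querying per-instance-type limits (see the note further down in this thread); the closest programmatic check was the account-wide max-instances attribute. A boto 2 sketch of that check, under the assumption that describe_account_attributes behaves as documented:

import boto.ec2

conn = boto.ec2.connect_to_region('us-east-1')
# 'max-instances' is the account-wide default limit, not the per-type
# c3.8xlarge limit, so this is only a rough sanity check.
attrs = conn.describe_account_attributes(attribute_names=['max-instances'])
for attr in attrs:
    print(attr.attribute_name, attr.attribute_values)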

hackdna commented 8 years ago

It turns out this is due to a very low EC2 instance limit on my account. I tried adding one more c3.8xlarge instance separately, but got this error message in the CM log:

2016-03-15 15:43:33,344 ERROR     connection:1202 400 Bad Request
2016-03-15 15:43:33,344 ERROR     connection:1203 <?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InstanceLimitExceeded</Code><Message>You have requested more instances (2) than your current instance limit of 1 allows for the specified instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.</Message></Error></Errors><RequestID>247ef646-8144-4e37-8fe5-a4a086bf3eb1</RequestID></Response>
2016-03-15 15:43:33,344 ERROR            ec2:514  EC2 response error when starting worker nodes: EC2ResponseError: 400 Bad Request
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>InstanceLimitExceeded</Code><Message>You have requested more instances (2) than your current instance limit of 1 allows for the specified instance type. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.</Message></Error></Errors><RequestID>247ef646-8144-4e37-8fe5-a4a086bf3eb1</RequestID></Response>

It would be great if this were indicated in the web UI during the initial attempt.
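
boto surfaces this failure as an EC2ResponseError whose error_code and error_message attributes carry the InstanceLimitExceeded details, so a handler along these lines could pass the readable part through to the UI (a hypothetical sketch, not CloudMan's actual code):

import boto.ec2
from boto.exception import EC2ResponseError

conn = boto.ec2.connect_to_region('us-east-1')
try:
    conn.run_instances(image_id='ami-d5246abf', min_count=2, max_count=2,
                       instance_type='c3.8xlarge')
except EC2ResponseError as e:
    # e.error_code    -> 'InstanceLimitExceeded'
    # e.error_message -> the human-readable <Message> text from the XML body
    if e.error_code == 'InstanceLimitExceeded':
        print('Cannot add workers: %s' % e.error_message)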

dannon commented 8 years ago

Yep, that error should be represented better in the UI.

hackdna commented 8 years ago

Also, nodes that failed to launch cannot be removed from the web UI using "Remove worker nodes" (screenshot: 2016-03-15 13:06:27).

afgane commented 8 years ago

They'll disappear automatically after a few minutes (occasionally, a page refresh is necessary).

hackdna commented 8 years ago

OK, thanks. They were in that "blue" state for almost an hour, but I didn't try reloading the page, and I've since terminated the cluster.

afgane commented 8 years ago

There's no API for querying resource limits for the time being (https://forums.aws.amazon.com/thread.jspa?messageID=709583), so the best we could do is propagate the error message to the popup, but that seems like overkill. Would an error message in the info log be enough?

hackdna commented 8 years ago

Showing the exact error message in the info log would be great.

afgane commented 8 years ago

Actually, I'm not sure that wasn't already the case, since all info, error, and critical log messages should be included in the info log. I've cleaned it up now so that only the message body is shown, making it at least easier to read; see the sketch below.
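
A minimal sketch of that kind of cleanup, reducing the full EC2 XML error document to its human-readable <Message> text (the helper name is hypothetical and not necessarily CloudMan's actual implementation):

import xml.etree.ElementTree as ET

def ec2_error_message(body):
    # Pull just the <Message> text out of an EC2 error response like the
    # one quoted above, falling back to the raw body if parsing fails.
    try:
        root = ET.fromstring(body)
        msg = root.find('.//Message')
        return msg.text if msg is not None else body
    except ET.ParseError:
        return body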

I'll close this issue for now and if we run into it again, we can reopen.