Kitware / HPCCloud

A Cloud/Web-Based Simulation Environment
https://kitware.github.io/HPCCloud/
Apache License 2.0

Problem with EC2 - "No master node could be found" #628

Open carpemonf-zz opened 5 years ago

carpemonf-zz commented 5 years ago

Hi,

I'm not able to run any workflow using AWS Profiles. I can add a profile in the preferences page and the status reads available. However, when I try to run something using this profile I get the following error in the setup_cluster log:

[14:06:52.933] INFO: Created cluster: 5d1df9ec13a7512240b2d950
[14:06:53.015] INFO: Starting cluster.
[14:06:54.362] ERROR: Exception raise by task.
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 240, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 438, in __protected_call__
    return self.run(*args, **kwargs)
  File "/cumulus/cumulus/taskflow/__init__.py", line 117, in wrapped
    return func(celery_task, *args, **kwargs)
  File "/cumulus/cumulus/taskflow/cluster/__init__.py", line 115, in setup_cluster
    cluster = launch_ec2_cluster(task, cluster, profile)
  File "/cumulus/cumulus/taskflow/cluster/__init__.py", line 279, in launch_ec2_cluster
    launch_params, girder_token, log_write_url, 'running')
  File "/usr/local/lib/python3.6/site-packages/celery/local.py", line 188, in __call__
    return self._get_current_object()(*a, **kw)
  File "/usr/local/lib/python3.6/site-packages/celery/app/trace.py", line 439, in __protected_call__
    return orig(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/celery/app/task.py", line 420, in __call__
    return self.run(*args, **kwargs)
  File "/cumulus/cumulus/ansible/tasks/cluster.py", line 102, in launch_cluster
    master = p.get_master_instance(cluster['_id'])
  File "/cumulus/cumulus/ansible/tasks/providers/ec2.py", line 136, in get_master_instance
    {'Name': 'instance-state-name', 'Values': ['running']}]))
Exception: No master node could be found!
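
For readers hitting the same message: the last frame shows the lookup filtering on `instance-state-name` = `running`, so the exception is consistent with no instance matching all filters at query time (for example, the instance still being `pending`, or never having launched). The cumulus code itself is not shown in this thread beyond that one filter line, so the following is only a minimal sketch of that kind of lookup, run against a hand-written dict in the shape of an EC2 DescribeInstances response. The function name `find_master` and the `cluster_id` tag key are hypothetical and not taken from cumulus.

```python
def find_master(response, cluster_id):
    """Return the first running instance tagged with cluster_id.

    `response` is a dict shaped like an EC2 DescribeInstances result.
    The 'cluster_id' tag key is an assumption made for illustration only.
    """
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            state = instance.get("State", {}).get("Name")
            if tags.get("cluster_id") == cluster_id and state == "running":
                return instance
    # Same message as in the log above; raised when nothing matched.
    raise Exception("No master node could be found!")


# Hand-written sample: the instance exists but is still 'pending',
# so a lookup restricted to 'running' instances does not match it.
pending_only = {
    "Reservations": [
        {
            "Instances": [
                {
                    "InstanceId": "i-0123456789abcdef0",
                    "State": {"Name": "pending"},
                    "Tags": [
                        {"Key": "cluster_id", "Value": "5d1df9ec13a7512240b2d950"}
                    ],
                }
            ]
        }
    ]
}
```

Calling `find_master(pending_only, "5d1df9ec13a7512240b2d950")` raises the exception, while the same call succeeds once the instance state is `running`. Whether this is what happened here cannot be told from the log alone.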

I followed the instructions at https://kitware.github.io/HPCCloud/docs/usage__aws-profiles.html and added the required policies through the IAM Management Console. Please also note that ec2:DescribeRouteTable was unrecognized; should this be ec2:DescribeRouteTables?
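
The EC2 API action is named DescribeRouteTables (plural), which fits the console rejecting the singular form. As a minimal sketch, a policy document can be checked for misspelled action names before it is pasted into the IAM console. The list of known actions below is a small hand-picked subset of real EC2 action names, not the full set and not the list required by HPCCloud; the function name `check_actions` is hypothetical.

```python
import difflib
import json

# Small, hand-picked subset of real EC2 IAM action names (not exhaustive,
# and not the list of permissions HPCCloud requires).
KNOWN_EC2_ACTIONS = [
    "ec2:DescribeInstances",
    "ec2:DescribeRouteTables",
    "ec2:DescribeSubnets",
    "ec2:DescribeVpcs",
    "ec2:DescribeSecurityGroups",
    "ec2:RunInstances",
    "ec2:TerminateInstances",
]


def check_actions(policy_json, known=KNOWN_EC2_ACTIONS):
    """Map each action not in `known` to its closest known name (or None)."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    suggestions = {}
    for statement in statements:
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # "Action" may be a string or a list
            actions = [actions]
        for action in actions:
            if "*" in action:  # wildcards are skipped, not validated
                continue
            if action not in known:
                close = difflib.get_close_matches(action, known, n=1)
                suggestions[action] = close[0] if close else None
    return suggestions


policy = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["ec2:DescribeInstances", "ec2:DescribeRouteTable"],
        "Resource": "*",
    }],
})
print(check_actions(policy))
```

With the sample policy, the singular `ec2:DescribeRouteTable` is flagged and the plural form is suggested. An action missing from the short list would also be flagged, so a hit only means "check this name", not "this name is invalid".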

Thanks, Carlos.

patrickoleary commented 5 years ago

Carlos,

How did you deploy this? I see you're using Python 3.6. I'm working on completing the move of HPCCloud to Python 3.6, but I haven't worked through the AWS tasks yet. I'll let you know when I finish. If you have any more details that might help me (logs, software versions (OS, Ansible, ...)), send them my way.

Thanks, Patrick

carpemonf-zz commented 5 years ago

Thanks Patrick. I'm actually using the containerised solution (https://github.com/Kitware/hpccloud-services) you recently released (which is very convenient for us and works perfectly fine!). I will check whether EC2 works with my AWS profile in the previous VMs with Python 2.x.

Since I'm using the Docker version, do you still need any software details? Please let me know. I will also try to provide the logs.

patrickoleary commented 5 years ago

I haven't worked through the AWS tasks on our containerized version. I'm so glad you find it useful. We will have the Girder 3 upgrade next week, and possibly updates to hpccloud-services. Soon we should have an NVIDIA-based visualization solution in the container stack. So there are going to be a lot of changes in the next month.

We should meet/video-conference soon to see where your development is.

Best regards, Patrick



carpemonf-zz commented 5 years ago

Many thanks for the updates. It sounds really promising.

> We should meet/video-conference soon to see where your development is.

Sure, let's do it.

Best, Carlos.