Kitware / HPCCloud-deploy

VM Deploy for HPC-Cloud
Apache License 2.0

Several issues in the Ansible playbook when deploying #96

Status: Closed (felixveysseyre closed this 7 years ago)

felixveysseyre commented 7 years ago

Hi,

I am trying to deploy HPCCloud on my local computer (Mac OS, 10.11) in order to test it.

When I ran Vagrant to create and start the virtual machine, I got several errors.

Here is the command I used to create the virtual machine: $ DEMO=1 vagrant up

Here are the errors I got:

TASK [master : Add exec hosts] *************************************************
task path: /private/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/roles/master/tasks/main.yml:60
failed: [hpccloud-vm] (item=hpccloud-vm) => {"changed": true, "cmd": "qconf -Me /tmp/exec_host_hpccloud-vm", "delta": "0:00:00.006585", "end": "2017-05-31 15:17:08.252254", "failed": true, "item": "hpccloud-vm", "rc": 1, "start": "2017-05-31 15:17:08.245669", "stderr": "denied: exechost \"vagrant-ubuntu-trusty-64\" does not exist", "stderr_lines": ["denied: exechost \"vagrant-ubuntu-trusty-64\" does not exist"], "stdout": "", "stdout_lines": []}
...ignoring
TASK [master : Check if @allhosts host group exists] ***************************
task path: /private/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/roles/master/tasks/main.yml:76
fatal: [hpccloud-vm]: FAILED! => {"changed": true, "cmd": "qconf -shgrp @allhosts", "delta": "0:00:00.007938", "end": "2017-05-31 15:17:08.781973", "failed": true, "rc": 1, "start": "2017-05-31 15:17:08.774035", "stderr": "Host group \"@allhosts\" does not exist", "stderr_lines": ["Host group \"@allhosts\" does not exist"], "stdout": "", "stdout_lines": []}
...ignoring
TASK [master : Check if master host group exists] ******************************
task path: /private/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/roles/master/tasks/main.yml:105
fatal: [hpccloud-vm]: FAILED! => {"changed": true, "cmd": "qconf -shgrp @master", "delta": "0:00:00.007297", "end": "2017-05-31 15:17:10.298893", "failed": true, "rc": 1, "start": "2017-05-31 15:17:10.291596", "stderr": "Host group \"@master\" does not exist", "stderr_lines": ["Host group \"@master\" does not exist"], "stdout": "", "stdout_lines": []}
...ignoring
TASK [master : Check if parallel environment exists] ***************************
task path: /private/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/roles/master/tasks/main.yml:134
fatal: [hpccloud-vm]: FAILED! => {"changed": true, "cmd": "qconf -sp orte", "delta": "0:00:00.007066", "end": "2017-05-31 15:17:11.781761", "failed": true, "rc": 1, "start": "2017-05-31 15:17:11.774695", "stderr": "orte is not a parallel environment", "stderr_lines": ["orte is not a parallel environment"], "stdout": "", "stdout_lines": []}
...ignoring
TASK [master : Check if all.q exists] ******************************************
task path: /private/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/roles/master/tasks/main.yml:169
fatal: [hpccloud-vm]: FAILED! => {"changed": true, "cmd": "qconf -sq all.q", "delta": "0:00:00.006355", "end": "2017-05-31 15:17:13.733573", "failed": true, "rc": 1, "start": "2017-05-31 15:17:13.727218", "stderr": "No cluster queue or queue instance matches the phrase \"all.q\"", "stderr_lines": ["No cluster queue or queue instance matches the phrase \"all.q\""], "stdout": "", "stdout_lines": []}
...ignoring
TASK [master : Add all host keys to known_hosts] *******************************
task path: /private/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/roles/master/tasks/main.yml:234
failed: [hpccloud-vm] (item=hosts.stdout.split(' ')) => {"changed": true, "cmd": "ssh-keyscan -H hosts.stdout.split(' ') >> ~/.ssh/known_hosts", "delta": "0:00:00.002156", "end": "2017-05-31 15:17:17.397605", "failed": true, "item": "hosts.stdout.split(' ')", "rc": 2, "start": "2017-05-31 15:17:17.395449", "stderr": "/bin/sh: 1: Syntax error: \"(\" unexpected", "stderr_lines": ["/bin/sh: 1: Syntax error: \"(\" unexpected"], "stdout": "", "stdout_lines": []}
    to retry, use: --limit @/var/folders/4y/mqldj0_n2gq1mxwg2z61t5l00000gn/T/d20170531-9446-wdrskt/cumulus/cumulus/ansible/tasks/playbooks/gridengine/site.retry

The final message I got:

PLAY RECAP *********************************************************************
hpccloud-vm                : ok=29   changed=28   unreachable=0    failed=1   

Ansible failed to complete successfully. Any error output should be
visible above. Please fix these errors and try again.

I am not familiar with HPCCloud deployment yet, so any advice would be appreciated.

I am using:

Thanks for your help!

cjh1 commented 7 years ago

The failures marked '...ignoring' are expected; those errors are deliberately being ignored. I will take a look at the last one ...
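For readers unfamiliar with the pattern, here is a minimal sketch of how a "check if it exists" task typically produces a failure followed by '...ignoring'. It is not copied from the cumulus role: the registered variable name and the host group file path are hypothetical, and only the `qconf -shgrp @allhosts` command comes from the log above.

```yaml
# Hypothetical sketch, assuming the role probes first and creates on demand.
- name: Check if @allhosts host group exists
  command: qconf -shgrp @allhosts
  register: allhosts_check        # assumed variable name
  ignore_errors: yes              # a non-zero rc is reported, then "...ignoring"

- name: Create @allhosts host group
  command: qconf -Ahgrp /tmp/allhosts_hgrp   # assumed file path
  when: allhosts_check.rc != 0    # only runs when the probe above failed
```

On a fresh VM the probe fails because nothing has been configured yet, so seeing these messages on a first run is consistent with a healthy deployment.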

cjh1 commented 7 years ago

@felixveysseyre This was an issue with Ansible 2.3 (I haven't really tested too much with it, thanks for testing :smile:). I have merged the following fix: Kitware/cumulus#314
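As a rough illustration of the kind of change involved: the log shows the loop item arriving as the literal text `hosts.stdout.split(' ')`, which suggests the loop expression was not being templated. The sketch below is reconstructed from the logged command only. Whether it matches the actual change in Kitware/cumulus#314 is an assumption, so check that pull request for the real fix.

```yaml
# Assumed "before": a bare expression in with_items. Under Ansible 2.3 the
# log shows it reaching the shell as plain text, hence the "(" syntax error.
- name: Add all host keys to known_hosts
  shell: "ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts"
  with_items: hosts.stdout.split(' ')

# Assumed "after": the same task with the expression wrapped in Jinja2
# delimiters, so each host name becomes its own item.
- name: Add all host keys to known_hosts
  shell: "ssh-keyscan -H {{ item }} >> ~/.ssh/known_hosts"
  with_items: "{{ hosts.stdout.split(' ') }}"
```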

felixveysseyre commented 7 years ago

Thanks @cjh1,

After re-running the virtual machine creation, everything was fine this time.