ARTbio / GalaxyKickStart

Ansible playbooks for Galaxy Server deployment
GNU General Public License v3.0

Reduce artimed extras #172

Closed mvdbeek closed 8 years ago

mvdbeek commented 8 years ago

Summary of changes:

drosofff commented 8 years ago

We should rename the artimed_extras role to galaxykickstart

mvdbeek commented 8 years ago

We should rename the artimed_extras role to galaxykickstart

I think we should remove it completely and move the data managers part to a new data_managers role (if we decide we want to keep this functionality)

drosofff commented 8 years ago

I am OK with removing it completely and creating a data_managers role.

mvdbeek commented 8 years ago

I propose to create a scripts folder at the root, with install_tool_shed_tools.py, generate_tool_list_from_ga_workflow_files.py and other scripts to come (I will work on a script that creates a group_vars file, an inventory, etc., from a workflow and/or a tool list)

Can you outline this in an issue? I am looking at this as well.
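
For reference, the tool lists these scripts consume and produce follow the usual Galaxy tool-list layout; a minimal sketch (the bowtie2 entry is purely illustrative):

tools:
  - name: bowtie2                             # repository name in the Tool Shed
    owner: devteam                            # repository owner
    tool_panel_section_label: "NGS: Mapping"  # panel section to install into
    tool_shed_url: https://toolshed.g2.bx.psu.edu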

mvdbeek commented 8 years ago

@drosofff this is ready for review!

drosofff commented 8 years ago

OK, looks nice 👍 I would like to run a couple of installations with the reorganized roles before merging

drosofff commented 8 years ago

I get this error from the ansible-playbook run on this branch:

TASK [galaxyprojectdotorg.galaxy-tools : Install Tool Shed tools] **************
failed: [localhost] => (item=extra-files/artimed/artimed_tool_list.yml) => {"changed": true, "cmd": ["/tmp/venv/bin/python", "install_tool_shed_tools.py", "-t", "artimed_tool_list.yml", "-a", "admin", "-g", "localhost"], "delta": "0:00:00.182402", "end": "2016-06-19 14:49:10.523772", "failed": true, "item": "extra-files/artimed/artimed_tool_list.yml", "rc": 1, "start": "2016-06-19 14:49:10.341370", "stderr": "Traceback (most recent call last):\n  File \"install_tool_shed_tools.py\", line 590, in <module>\n    install_tools(options)\n  File \"install_tool_shed_tools.py\", line 471, in install_tools\n    itl = installed_tool_revisions(gi)  # installed tools list\n  File \"install_tool_shed_tools.py\", line 170, in installed_tool_revisions\n    itl = tsc.get_repositories()\n  File \"/tmp/venv/local/lib/python2.7/site-packages/bioblend/galaxy/toolshed/__init__.py\", line 36, in get_repositories\n    return Client._get(self)\n  File \"/tmp/venv/local/lib/python2.7/site-packages/bioblend/galaxy/client.py\", line 147, in _get\n    raise ConnectionError(msg)\nbioblend.galaxy.client.ConnectionError: GET: error 403: '{\"err_msg\": \"Provided API key is not valid.\", \"err_code\": 403001}', 0 attempts left: None", "stdout": "", "stdout_lines": [], "warnings": []}

Apparently an API key issue.
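
(For context: the play passes -a admin as the API key, and Galaxy only accepts it if a matching key is configured. A sketch of how a predefined key could be wired up in group_vars; the variable names here are illustrative, check the galaxy-tools role defaults for the real ones:)

galaxy_tools_api_key: admin   # illustrative: the key passed as -a to install_tool_shed_tools.py
galaxy_config:
  "app:main":
    master_api_key: admin     # Galaxy's master API key, accepted as a valid admin key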

However, I had to sync/update the submodules that were changed in this branch, so I cannot guarantee that this is not the problem. From my git session:

From https://github.com/mvdbeek/ansible-galaxy-tools
 * [new branch]      install_individual_tools -> origin/install_individual_tools
 + 036bb22...c259caa master     -> origin/master  (forced update)
 * [new branch]      predefined_api_key -> origin/predefined_api_key
 * [new branch]      timeout    -> origin/timeout
Submodule path 'roles/galaxyprojectdotorg.galaxy-tools': checked out 'c259caa75621dadea8280dfa7d06db9df1c122bd'

mvdbeek commented 8 years ago

Submodule path 'roles/galaxyprojectdotorg.galaxy-tools': checked out 'c259caa75621dadea8280dfa7d06db9df1c122bd'

Yep, this should be 188e7cd136052f1e00efa3d19ffbcd9fe8f29dd5. In the ansible-artimed repo try a git submodule sync && git submodule update.

drosofff commented 8 years ago

OK, I manually fetched and checked out the predefined_api_key branch from your repo and it works. So this is almost OK for merge (I need another install), but double check the submodule updates. To answer your comment that arrived while I was writing this post: I did a git submodule sync && git submodule update, but I had to fetch and checkout manually for your galaxy-tools branch

mvdbeek commented 8 years ago

I did a git submodule sync && git submodule update, but I had to fetch and checkout manually for your galaxy-tools branch

Hmm, yes, this is very annoying. I think the problem is that I'm indeed using a branch, not master, in my fork (https://stackoverflow.com/questions/1777854/git-submodules-specify-a-branch-tag). I guess if things are meant to go into the ansible-artimed master branch, we have to keep the submodule changes in master. :/ I'll make that change (and remove some supervisor variables from the data_managers role).

mvdbeek commented 8 years ago

Okay, I've pushed the predefined_api_key branch to the master branch of my tools role fork; this should now work with git submodule sync && git submodule update.

drosofff commented 8 years ago

I have not tested your master branch yet, but in the meantime:

TASK [data_managers : Run data managers] ***************************************
failed: [localhost] => (item=extra-files/artimed/artimed_data_manager_tasks.yml) => {"changed": true, "cmd": ["/tmp/venv/bin/python", "install_tool_shed_tools.py", "-d", "extra-files/artimed/artimed_data_manager_tasks.yml", "-a", "admin", "-g", "localhost"], "delta": "0:00:00.121882", "end": "2016-06-19 15:58:43.551298", "failed": true, "item": "extra-files/artimed/artimed_data_manager_tasks.yml", "rc": 1, "start": "2016-06-19 15:58:43.429416", "stderr": "Traceback (most recent call last):\n  File \"install_tool_shed_tools.py\", line 654, in <module>\n    run_data_managers(options)\n  File \"install_tool_shed_tools.py\", line 415, in run_data_managers\n    kl = load_input_file(dbkeys_list_file)  # Input file contents\n  File \"install_tool_shed_tools.py\", line 126, in load_input_file\n    with open(tool_list_file, 'r') as f:\nIOError: [Errno 2] No such file or directory: 'extra-files/artimed/artimed_data_manager_tasks.yml'", "stdout": "", "stdout_lines": [], "warnings": []}

An issue with extra-files/artimed/artimed_data_manager_tasks.yml.

drosofff commented 8 years ago

OK, the git submodule sync && git submodule update works.

But there is also this:

TASK [galaxyprojectdotorg.galaxy-tools : Remove tool management script] ********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "gid": 0, "group": "root", "mode": "0644", "msg": "unlinking failed: [Errno 1] Operation not permitted: '/tmp/install_tool_shed_tools.py' ", "owner": "root", "path": "/tmp/install_tool_shed_tools.py", "size": 28603, "state": "file", "uid": 0}

RUNNING HANDLER [galaxyprojectdotorg.galaxy : restart galaxy] ******************

RUNNING HANDLER [galaxyprojectdotorg.galaxy : email administrator with changeset id] ***

PLAY RECAP *********************************************************************
localhost                  : ok=144  changed=32   unreachable=0    failed=1

mvdbeek commented 8 years ago

@drosofff how are you testing those last two things?

TASK [galaxyprojectdotorg.galaxy-tools : Remove tool management script] ********
fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "gid": 0, "group": "root", "mode": "0644", "msg": "unlinking failed: [Errno 1] Operation not permitted: '/tmp/install_tool_shed_tools.py' ", "owner": "root", "path": "/tmp/install_tool_shed_tools.py", "size": 28603, "state": "file", "uid": 0}

Smells a bit like an artefact of the artimed_extras role, which didn't remove that particular script (also, it should never have been created with root as owner!). If you remove this file by hand, does the play finish?
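
(A minimal Ansible sketch of such a cleanup task, assuming become privileges are available:)

- name: Remove stale tool management script left behind by the old artimed_extras role
  file:
    path: /tmp/install_tool_shed_tools.py
    state: absent
  become: yes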

drosofff commented 8 years ago

@mvdbeek please test vagrant up, too.

drosofff commented 8 years ago

'cause:

TASK [galaxyprojectdotorg.galaxy-os : Remove old Docker folder] ****************
task path: /Users/aligre/ansible-artimed/roles/galaxyprojectdotorg.galaxy-os/tasks/docker.yml:26
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/Users/aligre/ansible-artimed/.vagrant/machines/default/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=30 -o ControlPath=/Users/aligre/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 '/bin/sh -c '"'"'( umask 22 && mkdir -p "` echo $HOME/.ansible/tmp/ansible-tmp-1466358355.18-159305381113967 `" && echo "` echo $HOME/.ansible/tmp/ansible-tmp-1466358355.18-159305381113967 `" )'"'"''
<127.0.0.1> PUT /var/folders/4m/m40jj39m8xj3470059b6gg_h0000gp/T/tmpv9JTJv TO /home/vagrant/.ansible/tmp/ansible-tmp-1466358355.18-159305381113967/file
<127.0.0.1> SSH: EXEC sftp -b - -C -vvv -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/Users/aligre/ansible-artimed/.vagrant/machines/default/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=30 -o ControlPath=/Users/aligre/.ansible/cp/ansible-ssh-%h-%p-%r '[127.0.0.1]'
<127.0.0.1> ESTABLISH SSH CONNECTION FOR USER: vagrant
<127.0.0.1> SSH: EXEC ssh -C -vvv -o UserKnownHostsFile=/dev/null -o IdentitiesOnly=yes -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o Port=2222 -o 'IdentityFile="/Users/aligre/ansible-artimed/.vagrant/machines/default/virtualbox/private_key"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=vagrant -o ConnectTimeout=30 -o ControlPath=/Users/aligre/.ansible/cp/ansible-ssh-%h-%p-%r -tt 127.0.0.1 '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-rqcnelonhrlhoewwfpaceizxzzyzbpbd; /bin/sh -c '"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'LANG=fr_FR.UTF-8 LC_ALL=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8 /usr/bin/python /home/vagrant/.ansible/tmp/ansible-tmp-1466358355.18-159305381113967/file; rm -rf "/home/vagrant/.ansible/tmp/ansible-tmp-1466358355.18-159305381113967/" > /dev/null 2>&1'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"'"''"'"'"'"'"'"'"'"''"'"''
fatal: [default]: FAILED! => {"changed": false, "failed": true, "invocation": {"module_args": {"backup": null, "content": null, "delimiter": null, "diff_peek": null, "directory_mode": null, "follow": false, "force": false, "group": null, "mode": null, "original_basename": null, "owner": null, "path": "/var/lib/docker", "recurse": false, "regexp": null, "remote_src": null, "selevel": null, "serole": null, "setype": null, "seuser": null, "src": null, "state": "absent", "validate": null}, "module_name": "file"}, "msg": "rmtree failed: [Errno 16] Device or resource busy: '/var/lib/docker/devicemapper'"}
    to retry, use: --limit @galaxy.retry

PLAY RECAP *********************************************************************
default                    : ok=23   changed=17   unreachable=0    failed=1

under vagrant up

mvdbeek commented 8 years ago

@mvdbeek please test vagrant up, too.

Works for me, but I ran this on a new machine. I suspect the problem comes from updating docker.

"rmtree failed: [Errno 16] Device or resource busy: '/var/lib/docker/devicemapper'"}
    to retry, use: --limit @galaxy.retry

This is a lockup; can you do a supervisorctl stop docker and see if it passes through?
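
(The same could be done from the play itself; a sketch using Ansible's supervisorctl module, assuming docker is registered under that program name:)

- name: Stop the supervised docker process before removing /var/lib/docker
  supervisorctl:
    name: docker
    state: stopped
  become: yes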

drosofff commented 8 years ago

TASK [data_managers : Run data managers] ***************************************
failed: [localhost] => (item=extra-files/artimed/artimed_data_manager_tasks.yml) => {"changed": true, "cmd": ["/tmp/venv/bin/python", "install_tool_shed_tools.py", "-d", "extra-files/artimed/artimed_data_manager_tasks.yml", "-a", "admin", "-g", "localhost"], "delta": "0:00:00.116701", "end": "2016-06-19 19:10:39.312228", "failed": true, "item": "extra-files/artimed/artimed_data_manager_tasks.yml", "rc": 1, "start": "2016-06-19 19:10:39.195527", "stderr": "Traceback (most recent call last):\n  File \"install_tool_shed_tools.py\", line 654, in <module>\n    run_data_managers(options)\n  File \"install_tool_shed_tools.py\", line 415, in run_data_managers\n    kl = load_input_file(dbkeys_list_file)  # Input file contents\n  File \"install_tool_shed_tools.py\", line 126, in load_input_file\n    with open(tool_list_file, 'r') as f:\nIOError: [Errno 2] No such file or directory: 'extra-files/artimed/artimed_data_manager_tasks.yml'", "stdout": "", "stdout_lines": [], "warnings": []}

RUNNING HANDLER [galaxyprojectdotorg.galaxy : restart galaxy] ******************

RUNNING HANDLER [galaxyprojectdotorg.galaxy : email administrator with changeset id] ***

PLAY RECAP *********************************************************************
localhost                  : ok=156  changed=92   unreachable=0    failed=1

mvdbeek commented 8 years ago

@drosofff This should work now (and has been broken ever since we moved these to extra-files, if this has ever worked). The reason we notice this now is that I have removed all these run_* (like run_data_managers) variables.

drosofff commented 8 years ago

It is not:

commit 5822fbc5a645bb0a4310d1051c540955b1dd29e8
Author: Marius van den Beek <m.vandenbeek@gmail.com>
Date:   Mon Jun 20 09:17:31 2016 +0200

    When copying task lists, only pass basename to install_tool_shed_tools.py

Running ansible-playbook -i inventory_files/artimed galaxy.yml on a fresh IFB instance gives:

TASK [data_managers : Remove data manager task file] ***************************
failed: [localhost] => (item=extra-files/artimed/artimed_data_manager_tasks.yml) => {"failed": true, "item": "extra-files/artimed/artimed_data_manager_tasks.yml", "msg": "rmtree failed: [Errno 2] No such file or directory: '/tmp/ccRRi7Mf.s'"}

RUNNING HANDLER [galaxyprojectdotorg.galaxy : restart galaxy] ******************

RUNNING HANDLER [galaxyprojectdotorg.galaxy : email administrator with changeset id] ***

PLAY RECAP *********************************************************************
localhost                  : ok=158  changed=94   unreachable=0    failed=1

Please test your commits yourself, because I have no time to do it anymore this week.

mvdbeek commented 8 years ago

Please test your commits yourself, because I have no time to do it anymore this week.

I have tested this in vagrant, where it works. If you don't have time to test it this week, then don't test it; I will move forward.

drosofff commented 8 years ago

Coming back to this PR after numerous rounds of testing and retesting in vagrant, the IFB cloud, the AWS cloud...

First issue

How does this branch currently diverge from the gcc2016 branch? Are the modifications in the Vagrantfile the only changes? If yes, the question is: shall we merge this branch, or gcc2016?

Second issue

I think that the role ansible-galaxy-tools/tasks/main.yml should contain additional code such as:

- include: restart_galaxy.yml
  when: galaxy_tools_install_tools  # this condition is even optional in my opinion

Otherwise, the new playbook implies that you have to restart Galaxy manually, which is a regression from the current master. I understand that this comes from a notify statement in another role, whose log should also be removed if we restart Galaxy in our playbook. I have tested this additional code and it seems to work.

Third (most important) issue

There is a complex issue (at least for me) with the /tmp directory. It is probably about the permissions of /tmp, but it could also be its deletion... or its non-deletion. The fact is that /tmp is implicated in various errors when you play or replay the playbook with different inventory files.

Here is an example:

TASK [galaxyprojectdotorg.galaxy-tools : Create Galaxy bootstrap user] *********
fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["/home/galaxy/galaxy/.venv/bin/python", "manage_bootstrap_user.py", "-c", "/home/galaxy/galaxy/config/galaxy.ini", "create", "-e", "admin@galaxy.org", "-u", "cloud", "-p", "admin", "-a", "admin"], "delta": "0:00:02.506624", "end": "2016-06-29 16:51:12.106424", "failed": true, "rc": 1, "start": "2016-06-29 16:51:09.599800", "stderr": "Traceback (most recent call last):\n  File \"manage_bootstrap_user.py\", line 230, in <module>\n    log = _setup_global_logger()\n  File \"manage_bootstrap_user.py\", line 86, in _setup_global_logger\n    file_handler = logging.FileHandler('/tmp/galaxy_tools_bootstrap_user.log')\n  File \"/usr/lib/python2.7/logging/__init__.py\", line 903, in __init__\n    StreamHandler.__init__(self, self._open())\n  File \"/usr/lib/python2.7/logging/__init__.py\", line 928, in _open\n    stream = open(self.baseFilename, self.mode)\nIOError: [Errno 13] Permission denied: '/tmp/galaxy_tools_bootstrap_user.log'", "stdout": "", "stdout_lines": [], "warnings": []}
    to retry, use: --limit @galaxy.retry

PLAY RECAP *********************************************************************
localhost                  : ok=120  changed=18   unreachable=0    failed=1

But it can also happen that restarting galaxy:uwsgi through supervisorctl fails due to the absence of /tmp (probably deleted in a previous playbook round). You can recover just by running mkdir /tmp && chmod 777 /tmp.

And last, but not least, I finally figured out why the installation of the deseq2 package systematically fails with the new simplified playbook: it is precisely the absence of /tmp, and/or too-restricted access rights if it already exists. From the Galaxy admin panel, the repair of this tool (which includes reinstalling libxml) won't work until you manually mkdir /tmp && chmod 777 /tmp.

In summary, the behavior of the /tmp directory over the playbook run is not clear to me, because I understand that it can be manipulated by several submodules, including our galaxy-tools submodule. I feel this important piece of /tmp handling is still a bit shaky (not well automated yet).
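
(One way to make this robust would be a task, early in the play, that unconditionally restores /tmp; a sketch, noting that the conventional mode for /tmp is 1777, i.e. world-writable with the sticky bit:)

- name: Ensure /tmp exists with the conventional sticky-bit permissions
  file:
    path: /tmp
    state: directory
    mode: "01777"
  become: yes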

Fourth issue

The data_managers role is not crystal clear to me either. Is it really an important feature, or just a leftover from the previous playbook?


Finally, I would very much like to merge this PR (or a PR from gcc2016 if equivalent) into master, to move forward, but without regressions in the automation. As Bjorn said, we are working for usability, not for geeks.

mvdbeek commented 8 years ago

First issue

How does this branch currently diverge from the gcc2016 branch? Are the modifications in the Vagrantfile the only changes? If yes, the question is: shall we merge this branch, or gcc2016?

Yes, those are the only changes ... I wanted to demo ansible without the automatic provisioning that vagrant up does. I would prefer to merge the gcc2016 branch, but ultimately I don't think this is important.

Second issue

I think that the role ansible-galaxy-tools/tasks/main.yml should contain additional code such as

- include: restart_galaxy.yml
  when: galaxy_tools_install_tools  # this condition is even optional in my opinion

Otherwise, the new playbook implies that you have to restart Galaxy manually, which is a regression from the current master. I understand that this comes from a notify statement in another role, whose log should also be removed if we restart Galaxy in our playbook. I have tested this additional code and it seems to work.

I am intentionally removing these things, as they should be done once and only once when the play has finished, from inside the play, not the role. (The role should only notify of a necessary restart, while the play implements the restart; I'll add this before merging the PR.) All this unnecessary restarting slows down the playbook, which in turn limits the amount of testing we can do in travis.
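
(Roughly what I have in mind; a sketch, assuming Galaxy runs under supervisor as the galaxy: group:)

# in galaxy.yml, after all the roles have run
post_tasks:
  - name: Restart Galaxy once at the end of the play
    supervisorctl:
      name: "galaxy:"
      state: restarted
    become: yes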

For the third issue, it comes down to https://github.com/ARTbio/ansible-artimed/pull/172#discussion_r69392748, which should solve most of these problems. The underlying problem is that the tool installation script is copied and then removed, which doesn't really make sense. It should become part of ephemeris, and then we just install ephemeris.