ARTbio / GalaxyKickStart

Ansible playbooks for Galaxy Server deployment
GNU General Public License v3.0

Installation of tools with conda dependencies #201

Closed drosofff closed 6 years ago

drosofff commented 7 years ago

From what I am currently testing, I think that when you run a playbook for the first time, the installation of tools with conda dependencies ends with "dependency resolution" errors (red crosses in the admin panel). If you then remove the tool and reinstall it manually, it works. Removing the tool and replaying the playbook works too.

Maybe it has something to do with the conda_auto_install setting, or with the miniconda role? It is not clear to me, except that it prevents building the docker image, doesn't it?

mvdbeek commented 7 years ago

Did you specify install_resolver_dependencies: True in the tool list? See https://github.com/ARTbio/GalaxyKickStart/compare/master...mvdbeek:example_PR for an example.

drosofff commented 7 years ago

OK, I am starting to see. This should be a PR, shouldn't it?

I also guess that this affects the script that recovers the tool list from a galaxy instance. Is this script updated in the ephemeris repo? It also has ramifications for my workflow-to-toollist script. In that case, I am sure the code has not been updated!

mvdbeek commented 7 years ago

Yes, I don't think this can be easily automated, since we have to decide on a tool by tool basis if it can be installed through conda. Alternatively we can introduce a switch (or default to) installing both conda and regular tool dependencies (since I think the right thing to do is to remove tool dependencies once conda dependencies are available).

drosofff commented 7 years ago

Let's discuss this in a PR, it would be better. You are proposing to write the tool list differently in order to adapt galaxykickstart to conda (which we must do now), but the reason for these additional lines in toollist.yml is not captured in the master repo, right? I will do it in a PR for the branch https://github.com/ARTbio/GalaxyKickStart/tree/metavisitor_2016-11-10 that I created yesterday.

drosofff commented 7 years ago

Still, I keep thinking about this. What puzzles me now is that when I replay the playbook (having deleted the conda tools from the admin tool panel), the second installation works (still with the incomplete code in the tool list). Maybe this is worth exploring as a way around needing to know whether a tool is conda or not? In a sense, GalaxyKickStart seems to be able to do it the second time it runs.

mvdbeek commented 7 years ago

This is strange and should not happen. What exactly are you doing for this to happen? If you don't set install_resolver_dependencies, and you don't set conda_auto_install, you should not get conda dependencies installed.

drosofff commented 7 years ago

OK, this does not happen when you just rerun the playbook without setting install_resolver_dependencies. I probably confused it with a manual installation on the galaxy instance.

On the other hand, your answer makes me ask something else: why not set conda_auto_install to true?

mvdbeek commented 7 years ago

Because it will try to install the dependency every time the tool is run, which is slow (and if the dependency is not available, it will keep trying every time). Also, only one conda process can run at a time, so you lose concurrency.

drosofff commented 7 years ago

OK, I tested that adding

- install_resolver_dependencies: True
- install_tool_dependencies: False

fixes the issue.

Now @mvdbeek can you help me to clarify the process in various situations:

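What happens

- if install_resolver_dependencies: True && the tool has no conda dependencies ?
- if install_tool_dependencies: True && the tool has only conda dependencies ?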

You can also answer by explaining in more detail how the variables install_resolver_dependencies and install_tool_dependencies control things. I guess it is somewhat related to conda role of the playbook, but maybe you can give some hints here.

Another aspect is that this new layer of specifications for the tools in the tool list makes the script https://github.com/galaxyproject/ephemeris/blob/master/ephemeris/get_tool_list_from_galaxy.py less "productive", since I understand that the list has to be adapted by hand on a tool by tool basis.

Is there a way to use the API in this script to determine whether a tool on a server has been "dependency-resolved" to conda or toolshed dependencies, which would allow the script to set the install_resolver_dependencies and install_tool_dependencies variables automatically?

mvdbeek commented 7 years ago

> if install_resolver_dependencies: True && the tool has no conda dependencies ?

Nothing is installed, meaning you'll be missing a dependency.

> if install_tool_dependencies: True && the tool has only conda dependencies ?

Same.
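A minimal sketch of these two answers, as a toy model rather than Galaxy's actual code: a dependency is only installed when the flag for its system is set and the tool actually ships a definition for that system. The function name and the has_conda / has_toolshed parameters are hypothetical, and each question is read as having the other flag set to False.

```python
def installed_systems(install_resolver_dependencies, install_tool_dependencies,
                      has_conda, has_toolshed):
    """Return the set of dependency systems that end up installed for a tool.

    Toy model only: has_conda and has_toolshed are hypothetical flags saying
    which kinds of dependency definitions the tool ships with.
    """
    systems = set()
    if install_resolver_dependencies and has_conda:
        systems.add("conda")
    if install_tool_dependencies and has_toolshed:
        systems.add("toolshed")
    return systems


# Resolver install requested, but the tool has no conda dependencies.
print(installed_systems(True, False, has_conda=False, has_toolshed=True))
# Tool Shed install requested, but the tool has only conda dependencies.
print(installed_systems(False, True, has_conda=True, has_toolshed=False))
# Both flags set: whichever system the tool supports gets installed.
print(installed_systems(True, True, has_conda=True, has_toolshed=False))
```

In this model the first two calls return an empty set, which corresponds to the "missing dependency" outcome; setting both flags covers either kind of tool.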

> You can also answer by explaining in more detail how the variables install_resolver_dependencies and install_tool_dependencies control things.

This maps 1:1 to galaxy's API and, by extension, to bioblend. install_resolver_dependencies controls whether the new resolver system should attempt to install tool dependencies. install_tool_dependencies has always been there (along with install_repository_dependencies), and defaults to true.
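To illustrate the 1:1 mapping, here is a rough sketch of how one tool list entry could be turned into keyword arguments for bioblend's install_repository_revision. The helper name install_kwargs and the example entry are hypothetical, and the fallback values are assumptions read off this thread (install_tool_dependencies on, resolver dependencies off unless requested), not a statement of what ephemeris does.

```python
def install_kwargs(entry):
    """Translate one tool list entry (a dict) into install keyword arguments.

    The defaults below are assumptions taken from this discussion, not from
    the bioblend or ephemeris source.
    """
    return {
        "tool_shed_url": entry["tool_shed_url"],
        "name": entry["name"],
        "owner": entry["owner"],
        "changeset_revision": entry["changeset_revision"],
        "install_tool_dependencies": entry.get("install_tool_dependencies", True),
        "install_repository_dependencies": entry.get(
            "install_repository_dependencies", True),
        "install_resolver_dependencies": entry.get(
            "install_resolver_dependencies", False),
    }


# Hypothetical entry: the name, owner and revision are placeholders.
entry = {
    "tool_shed_url": "https://toolshed.g2.bx.psu.edu",
    "name": "example_tool",
    "owner": "example_owner",
    "changeset_revision": "0123456789ab",
    "install_resolver_dependencies": True,
    "install_tool_dependencies": False,
}
kwargs = install_kwargs(entry)
print(kwargs["install_resolver_dependencies"], kwargs["install_tool_dependencies"])
```

With a bioblend GalaxyInstance gi, the result could then be passed on as gi.toolshed.install_repository_revision(**kwargs).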

> Another aspect is that this new layer of specifications for the tools in the tool list makes the script https://github.com/galaxyproject/ephemeris/blob/master/ephemeris/get_tool_list_from_galaxy.py less "productive", since I understand that the list has to be adapted by hand on a tool by tool basis.

Yes, but I think we can default to installing everything (by adapting ephemeris' shed_install; this is also what galaxy will be doing soon). Then, if you know you want conda for a tool (for example to save disk space and time), you can set install_tool_dependencies to false for that tool.
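One way to picture this default is the sketch below. It is a hypothetical helper, not ephemeris' shed_install: every entry gets both flags switched on unless it already sets them, and tools named in conda_only get install_tool_dependencies switched off. The tool names are placeholders.

```python
def default_to_everything(tools, conda_only=()):
    """Return copies of the tool entries with both install flags filled in.

    Flags already present in an entry are kept as they are. Tools whose name
    is listed in conda_only skip the Tool Shed dependencies.
    """
    planned = []
    for tool in tools:
        entry = dict(tool)  # copy, so the input list is left untouched
        entry.setdefault("install_resolver_dependencies", True)
        entry.setdefault("install_tool_dependencies",
                         tool["name"] not in conda_only)
        planned.append(entry)
    return planned


tools = [{"name": "example_conda_tool"}, {"name": "example_other_tool"}]
for entry in default_to_everything(tools, conda_only={"example_conda_tool"}):
    print(entry)
```

A list exported from a running server would then only need hand editing for the tools where conda alone is wanted.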

> Is there a way to use the API in this script to determine whether a tool on a server has been "dependency-resolved" to conda or toolshed dependencies, which would allow the script to set the install_resolver_dependencies and install_tool_dependencies variables automatically?

I don't think we should go in that direction. A tool is a tool, that's the important thing. We shouldn't tell people what system to use to satisfy the dependency, since there is also docker, environment modules, brew etc. I think we should just default to installing everything.

drosofff commented 7 years ago

I would just default to installing everything.

So install_resolver_dependencies True and install_tool_dependencies not specified (default True) in the tool list would be OK ? (it is more or less the solution by @afgane for cloud_setup, right ?)

Is shed_install used by GalaxyKickStart roles ?

mvdbeek commented 7 years ago

> So install_resolver_dependencies True and install_tool_dependencies not specified (default True) in the tool list would be OK ? (it is more or less the solution by @afgane for cloud_setup, right ?)

Yes.

> Is shed_install used by GalaxyKickStart roles ?

Yes, this is used by ansible-galaxy-tools (and replaces the script that used to come with ansible-galaxy-tools).

The one thing I forgot to mention is that the miniconda role exclusively installs conda, so that galaxy doesn't need to do this on first startup. It is not related to tool installations, except that it provides conda for galaxy.