galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

Enable Conda Install-on-Demand by Default #2446

Closed jmchilton closed 8 years ago

jmchilton commented 8 years ago

I was thinking the other day that, with a few people rushing forward so heavily with conda, maybe we should make install-on-demand on by default - then if the tool shed is enabled, for instance, conda packages should just work out of the box without extra configuration. This lessens the need to add it to the GUI - which I don't think I have seen any progress on.

This commit in particular (https://github.com/bgruening/galaxytools/pull/350#discussion_r65540533) crystallizes the utility of this. @bgruening shouldn't have to maintain shed dependencies, but if he doesn't it is an inconvenience (and in this case a potential security problem?) to the end user.

I'd rather these actions be requested in the abstract - ideally, the tool running code shouldn't require an Internet connection and shouldn't modify dependencies without being prompted... but pragmatically speaking I think the tide has shifted, and the partially-ready-to-go nature of conda dependency resolution is causing too much confusion and inconvenience.

Any objections or assents?

bgruening commented 8 years ago

:+1: from me, obviously. I wanted to advertise this move at GCC, but the earlier we discuss this the better :) Moving usegalaxy.org to enable conda would be one requirement, I guess, and having some initial UI support would be great. For example, with https://github.com/galaxyproject/galaxy/pull/2395 it could be easy to check whether all dependencies are available (in conda) during tool installation or inspection. Also, we should warn users if there is no tool_dependencies.xml in a repository and point them to documentation about enabling other dependency resolution systems.

martenson commented 8 years ago

@jmchilton Could you please explain a bit more, for us non-Conda folk, what making install-on-demand the new default option means?

jmchilton commented 8 years ago

Currently, conda will use recipes that have been installed using the dependency resolution code in Galaxy - there is an API endpoint and library functionality, reused by planemo, to perform these installations. The install-on-demand stuff is just the functionality in Galaxy that will do this installation right before the tool is executed - this is how @bgruening and others have been using conda, I believe. From a usability perspective, the problem with this is that if something goes wrong, I don't think it is displayed anywhere in the UI - not in an admin panel or anything like that.
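
To make the install-on-demand idea concrete, here is a minimal sketch of the control flow. It is not Galaxy's actual code: the function name, the `install` callback, and the report structure are all hypothetical, chosen only to show "resolve, and install what is missing right before the job runs", with failures collected so they could be surfaced somewhere.

```python
from typing import Callable, Dict, List, Set


def run_tool_with_install_on_demand(
    requirements: List[str],
    installed: Set[str],
    install: Callable[[str], bool],
    auto_install: bool = True,
) -> Dict[str, List[str]]:
    """Hypothetical sketch: resolve each requirement just before a job runs.

    `install` stands in for whatever performs the conda installation and
    returns True on success. Nothing here is a real Galaxy API.
    """
    report: Dict[str, List[str]] = {"resolved": [], "installed": [], "failed": []}
    for req in requirements:
        if req in installed:
            # Already available: nothing to do.
            report["resolved"].append(req)
        elif auto_install and install(req):
            # Missing, but installed on demand right before execution.
            installed.add(req)
            report["installed"].append(req)
        else:
            # Missing and not installable (or auto-install is disabled).
            report["failed"].append(req)
    return report
```

In this sketch the `failed` list is the part that would need to reach the user or admin, which is the usability gap described above.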

martenson commented 8 years ago

So you hope Conda installation on demand does not fail, and if it does, the error would be part of the dataset bug report so the user can give us feedback?

natefoo commented 8 years ago

Can we at least have something at tool-install-time that says "these conda packages (and their dependencies) will be installed from conda the first time the tool is run"?
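
A notice like this could be produced from the tool's requirement list at install time. The following is only an illustrative sketch; the function name and its inputs are hypothetical and do not correspond to an existing Galaxy or Tool Shed API.

```python
from typing import Iterable, Set


def install_time_notice(requirements: Iterable[str], already_installed: Set[str]) -> str:
    """Hypothetical helper: build the tool-install-time message about pending conda packages."""
    # Packages the tool needs that are not yet present in the conda prefix.
    pending = sorted(set(requirements) - already_installed)
    if not pending:
        return "All conda dependencies are already installed."
    return (
        "These conda packages (and their dependencies) will be installed "
        "from conda the first time the tool is run: " + ", ".join(pending)
    )
```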

natefoo commented 8 years ago

Also, one benefit of dependency installation this way is that finally you can install dependencies correctly for differing target destinations (e.g. in our case Stampede vs. local cluster vs. Jetstream).

One downside is that the dependencies have to be writable on the cluster, by the user running on the cluster. I have recently moved all of usegalaxy.org's dependencies into CVMFS, which is read-only except on one node, and on that one node you have to start a transaction on the filesystem before you can write into it (I make it possible to install tools/deps under this system by starting a transaction and then starting a docker container that runs the Test or Main "Installer" server).

bgruening commented 8 years ago

I would like to second @natefoo's comment. TS install hints would be great. At some point it would be nice if Pulsar installed dependencies with conda. This would even enable installations on Windows nodes for proteomics :)

pvanheus commented 8 years ago

One issue with install-on-demand is that the job remains in a grey state for an extended period of time, which is surprising to those expecting a quick submit -> grey -> yellow transition.

bgruening commented 8 years ago

@pvanheus this happens only once, doesn't it?

jmchilton commented 8 years ago

Closing this. https://github.com/galaxyproject/galaxy/pull/2554 is a much better remedy to the unease I felt over this topic in June - kudos to @mvdbeek and @bwlang and their marvelous work at the hackathon.