galaxyproject / planemo

Command-line utilities to assist in developing Galaxy and Common Workflow Language artifacts - including tools, workflows, and training materials.
https://planemo.readthedocs.io/
MIT License

Command to convert tool_dependencies.xml recipes into shell scripts #303

Closed peterjc closed 9 years ago

peterjc commented 9 years ago

Related to #19 (testing tool_dependencies.xml without a tool shed), I would like to be able to run an install recipe from a tool_dependencies.xml file locally and/or turn it into a simple shell script for the current platform.

(The platform specific actions could be turned into bash if statements if preferred)

This seems to overlap with https://github.com/jmchilton/shed2tap

e.g. https://github.com/peterjc/pico_galaxy/blob/master/tools/effectiveT3/tool_dependencies.xml

<?xml version="1.0"?>
<tool_dependency>
    <package name="effectiveT3" version="1.0.1">
        <install version="1.0">
            <actions>
                <!-- Set environment variable so Python script knows where to look -->
                <action type="set_environment">
                    <environment_variable name="EFFECTIVET3" action="set_to">$INSTALL_DIR</environment_variable>
                </action>
                <!-- Main JAR file -->
                <action type="shell_command">wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_GUI-1.0.1.jar</action>
                <!-- If using action type download_file will need to move the file,
                <action type="move_file"><source>TTSS_GUI-1.0.1.jar</source><destination>$INSTALL_DIR/</destination></action>
                -->
                <!-- Three model JAR files -->
                <action type="make_directory">$INSTALL_DIR/module</action>
                <action type="shell_command">wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_ANIMAL-1.0.1.jar</action>
                <action type="move_file"><source>TTSS_ANIMAL-1.0.1.jar</source><destination>$INSTALL_DIR/module/</destination></action>        
                <action type="shell_command">wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_PLANT-1.0.1.jar</action>
                <action type="move_file"><source>TTSS_PLANT-1.0.1.jar</source><destination>$INSTALL_DIR/module/</destination></action>
                <action type="shell_command">wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_STD-1.0.1.jar</action>
                <action type="move_file"><source>TTSS_STD-1.0.1.jar</source><destination>$INSTALL_DIR/module/</destination></action>
                <action type="shell_command">wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_STD-2.0.1.jar</action>
                <action type="move_file"><source>TTSS_STD-2.0.1.jar</source><destination>$INSTALL_DIR/module/</destination></action>
            </actions>
        </install>
        <readme>
Downloads effectiveT3 v1.0.1 and the three models from http://effectors.org/ aka http://effectors.csb.univie.ac.at/
        </readme>
    </package>
</tool_dependency>

Would become something like this (assuming already in install directory as per XML convention):

#!/bin/bash
#House keeping: strict bash mode, etc
set -euo pipefail
export INSTALL_DIR=$PWD
#Start of conversion from XML recipe:
echo "Installing effectiveT3 version 1.0.1"
export EFFECTIVET3=$INSTALL_DIR
wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_GUI-1.0.1.jar
mkdir $INSTALL_DIR/module
wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_ANIMAL-1.0.1.jar
mv TTSS_ANIMAL-1.0.1.jar $INSTALL_DIR/module/
wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_PLANT-1.0.1.jar
mv TTSS_PLANT-1.0.1.jar $INSTALL_DIR/module/
wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_STD-1.0.1.jar
mv TTSS_STD-1.0.1.jar $INSTALL_DIR/module/
wget http://effectors.csb.univie.ac.at/sites/eff/files/others/TTSS_STD-2.0.1.jar
mv TTSS_STD-2.0.1.jar $INSTALL_DIR/module/

I would then be able to run this within TravisCI with the advantages that the install recipe is not repeated (tool_dependencies.xml and .travis.yml) and moreover I would actually be able to test tool_dependencies.xml, e.g. https://github.com/peterjc/pico_galaxy/commit/243311cc50ab2675c5e6aa42524841e60e6602e8

hexylena commented 9 years ago

+1

(Sent by email in reply to Peter Cock's message of Fri 18 Sep 2015, 13:39, which quoted the original post above in full.)

hexylena commented 9 years ago

Alternatively there is the shed2tap code if installing from a brew recipe would be acceptable

bgruening commented 9 years ago

ping @davebx; as far as I know he was looking at this already. We had this idea some months ago to make migration to brew, or whatever we will use, easier.

peterjc commented 9 years ago

I'm out of time now, but having spent some time this afternoon hacking https://github.com/jmchilton/shed2tap I think I can turn @jmchilton's Action.to_ruby() method into something to produce a bash script.

That might be enough for a standalone tool, or a new planemo command - but I am waiting to hear from @davebx etc. about how best to proceed to avoid duplication of effort.

jmchilton commented 9 years ago

Just a heads up (maybe way too late), the newest shed2tap code is actually in planemo itself: https://github.com/galaxyproject/planemo/blob/master/planemo/shed2tap/base.py

peterjc commented 9 years ago

Thanks @jmchilton. https://github.com/galaxyproject/planemo/blob/master/planemo/shed2tap/base.py has an extensive to_ruby() method on the base Action class (essentially a large switch statement), but there is nothing similar on https://github.com/galaxyproject/planemo/blob/master/planemo/shed2tap/base.py which instead has a far more complete hierarchy of Action subclasses. I would think adding small to_ruby() or to_bash() methods to each Action subclass would make sense here?

e.g. https://github.com/peterjc/planemo/tree/shed2bash

jmchilton commented 9 years ago

This might be the most updated thing I was working on... https://github.com/jmchilton/planemo/commit/52f78665d7c2eece73fdcce60a9294638856bf86. https://github.com/jmchilton/planemo/commits/shed2tap

Whatever you get working is fine. My code is sprawled all over it seems and that is my own fault so I will adapt it to whatever you get into planemo :).

jmchilton commented 9 years ago

I was thinking of implementing a visitor pattern for ruby/bash conversion - but to_bash or to_ruby will be fine also.

peterjc commented 9 years ago

My first attempt is using to_bash on the action classes...
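As a rough illustration of the to_bash idea, here is a minimal Python sketch. The class names, constructors and the recipe_to_bash helper are hypothetical and are not taken from planemo's shed2tap code; using mkdir -p and quoting the paths are also assumptions made for this sketch.

```python
class Action:
    """Hypothetical base class; planemo's real Action classes may differ."""

    action_type = None

    def to_bash(self):
        # Fail loudly for action types that have no bash translation yet.
        raise NotImplementedError(
            "No to_bash defined for action type %s" % self.action_type
        )


class ShellCommandAction(Action):
    action_type = "shell_command"

    def __init__(self, command):
        self.command = command

    def to_bash(self):
        # Shell commands are already bash, so pass them through unchanged.
        return [self.command]


class MakeDirectoryAction(Action):
    action_type = "make_directory"

    def __init__(self, directory):
        self.directory = directory

    def to_bash(self):
        # Double quotes keep $INSTALL_DIR expandable while tolerating spaces.
        return ['mkdir -p "%s"' % self.directory]


class MoveFileAction(Action):
    action_type = "move_file"

    def __init__(self, source, destination):
        self.source = source
        self.destination = destination

    def to_bash(self):
        return ['mv "%s" "%s"' % (self.source, self.destination)]


def recipe_to_bash(actions):
    """Concatenate each action's bash lines under a strict-mode header."""
    lines = ["#!/bin/bash", "set -euo pipefail"]
    for action in actions:
        lines.extend(action.to_bash())
    return "\n".join(lines) + "\n"
```

Each subclass only needs to know how to render itself, which keeps the per-action logic small; a visitor would move the same logic into one class per output language instead.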

peterjc commented 9 years ago

My first example for effectiveT3 seems to work - but that is a simple tool_dependencies.xml file, perhaps unusually simple.

The action type download_by_url and friends is proving tricky. The problem is the Galaxy magic in lib/tool_shed/galaxy_install/tool_dependencies/recipe/step_handler.py class CompressedFile, where the .extract method will work out the common prefix of a tarball's contents in order to change into that directory. e.g.

<action type="download_by_url">ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.30/ncbi-blast-2.2.30+-x64-linux.tar.gz</action>

should become:

$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.30/ncbi-blast-2.2.30+-x64-linux.tar.gz
$ tar -zxvf ncbi-blast-2.2.30+-x64-linux.tar.gz
$ cd ncbi-blast-2.2.30+
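The "which folder to change into" question can be approximated with the standard library. The sketch below is not Galaxy's CompressedFile logic; it is a simplified stand-in that only handles tar archives, and the function names are hypothetical.

```python
import tarfile


def common_top_level_dir(archive):
    """Return the single top-level directory shared by all members, or None."""
    top_levels = set()
    for member in archive.getmembers():
        # Normalise "./foo/bar" style names before taking the first component.
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if not parts:
            continue
        # A non-directory entry at the top level means there is no folder to enter.
        if len(parts) == 1 and not member.isdir():
            return None
        top_levels.add(parts[0])
    return top_levels.pop() if len(top_levels) == 1 else None


def archive_top_level_dir(path):
    """Open a (possibly compressed) tar file and report its common folder."""
    with tarfile.open(path) as archive:
        return common_top_level_dir(archive)
```

If the function returns a name, the generated script would emit a cd into it after the tar command; if it returns None, the script would stay where it is.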

I'm almost wondering if something like this would be simplest (which can call the same Galaxy code):

$ planemo download_by_url ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.30/ncbi-blast-2.2.30+-x64-linux.tar.gz

peterjc commented 9 years ago

Another niggle, consider a recipe which boils down to something like this:

#!/bin/bash
#downloads stuff, then sets a new environment variable like NEW_TOOL,
#or edits an existing environment variable like PATH
export NEW_TOOL=/some/path

If executed directly like ./example.sh or bash example.sh then we lose access to the environment variable $NEW_TOOL. Alternatively, source example.sh or . example.sh does expose the new/changed environment variables, but then exit (or a failure under strict bash mode with set -euo pipefail or similar) terminates the user's shell session.

I think this means we need to turn the tool_dependencies.xml file(s) into an install.sh file (or similarly named file) to be run once, plus a second shell script which only sets the environment variables, to be run via source prior to running the tool tests via the dependency mechanisms. Galaxy calls those env.sh, doesn't it?
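A minimal sketch of that split, assuming the recipe has already been parsed into simple (kind, payload) steps. The step kinds and the split_recipe name are hypothetical stand-ins for parsed XML actions.

```python
def split_recipe(steps):
    """Split parsed steps into install-script text and env-script text.

    kind "command" carries one shell line; kind "set_env" carries (name, value).
    """
    # The install script runs once as a child process, so strict mode is safe.
    install_lines = ["#!/bin/bash", "set -euo pipefail"]
    # The env script is sourced, so it gets no strict mode and no exit calls.
    env_lines = []
    for kind, payload in steps:
        if kind == "command":
            install_lines.append(payload)
        elif kind == "set_env":
            name, value = payload
            # The value is written as given; a real tool would need to resolve
            # $INSTALL_DIR to an absolute path before writing this line.
            env_lines.append("export %s=%s" % (name, value))
        else:
            raise ValueError("Unsupported step kind: %r" % kind)
    return "\n".join(install_lines) + "\n", "\n".join(env_lines) + "\n"
```

The first string would be written to the install script and run with bash; the second would be written to env.sh and loaded with source before running the tool tests.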

bgruening commented 9 years ago

@peterjc I like your second idea. And yes, Galaxy calls the env.sh files before executing a tool. Thanks for working on this. I think this will make so many things easier.

peterjc commented 9 years ago

I'm increasingly finding I am reimplementing things already in the Galaxy Tool Shed code (with the risk of potentially interpreting the XML recipe slightly differently, which on the plus side could highlight some ambiguities in the recipe format).

e.g. turning the environment variable actions into env.sh entries is done in https://github.com/galaxyproject/galaxy/blob/dev/lib/tool_shed/galaxy_install/tool_dependencies/recipe/env_file_builder.py

Planemo already bundles part of the Galaxy python library under planemo_ext/, so adding planemo_ext/tool_shed/galaxy_install/tool_dependencies/recipe/env_file_builder.py etc. might be a practical way forward?

jmchilton commented 9 years ago

@peterjc That directory aims to be a subset of Galaxy's code base, so feel free to bring stuff over. The stuff should be sufficiently isolated though. galaxy.util also isn't yet a true subset, so be careful about that as well.

peterjc commented 9 years ago

I have a plan for download_by_url and the related download actions that keeps the generated bash script as simple as possible.

While generating the bash script, I will download the file (to a temp directory by default) where I can examine it to determine how to decompress it and what folder (if any) Galaxy would automatically change into. This might use a bundled copy of lib/tool_shed/galaxy_install/tool_dependencies/recipe/step_handler.py.

To avoid the overheads and waste of repeated downloads, the key information (decompression method and folder to change into) can be cached. I am planning to use the MD5 hash of the URL as the key. e.g. "~/.planemo/dependency_downloads/%s.json" % md5(url)
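A small sketch of that cache, using only the standard library. The function names and the shape of the stored dictionary are assumptions; only the directory layout and the MD5-of-URL key come from the plan above. MD5 is used here purely as a filename key, not for integrity checking.

```python
import hashlib
import json
import os

DEFAULT_CACHE_DIR = "~/.planemo/dependency_downloads"


def cache_path_for(url, cache_dir=DEFAULT_CACHE_DIR):
    """Map a download URL to the JSON file holding its cached metadata."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return os.path.join(os.path.expanduser(cache_dir), "%s.json" % digest)


def save_info(url, info, cache_dir=DEFAULT_CACHE_DIR):
    """Store metadata such as the decompression method and folder to enter."""
    path = cache_path_for(url, cache_dir)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        json.dump(info, handle)


def load_info(url, cache_dir=DEFAULT_CACHE_DIR):
    """Return the cached metadata for a URL, or None if nothing is cached."""
    path = cache_path_for(url, cache_dir)
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)
```

A caller would try load_info(url) first and only download and inspect the archive when it returns None.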

In the context of continuous integration with TravisCI, I plan to re-use the cached downloaded files. i.e. include an if statement to link to the cached file if present.

peterjc commented 9 years ago

I have a working prototype planemo depbash command here: https://github.com/peterjc/planemo/tree/depbash

This is hard coded to produce a single file dep_install.sh and matching env.sh combining all the tool_dependencies.xml files processed (you can recurse over a folder) which all will use $INSTALL_DIR as their destination (which is a problem if you have name clashes between tool binaries, e.g. multiple versions of BLAST+).

Right now it uses a single flat folder $DOWNLOAD_CACHE (defaulting to ./download_cache) to cache downloads (nothing clever with checksums), so that the decompression and folder structure can be determined while generating the shell script. The dep_install.sh will also use this cache so that in a continuous integration setup the file is only fetched once.
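To illustrate how the generated script might reuse the cache, here is a hedged sketch of a helper that emits the bash lines for one download. It is not the output of the depbash prototype; the function name is hypothetical, and copying the file (rather than linking it) is a choice made here to avoid broken relative symlinks when $DOWNLOAD_CACHE is a relative path.

```python
import posixpath
from urllib.parse import urlparse


def cached_download_lines(url):
    """Return bash lines that fetch url into $DOWNLOAD_CACHE only if missing."""
    filename = posixpath.basename(urlparse(url).path)
    cached = '"$DOWNLOAD_CACHE/%s"' % filename
    return [
        "if [[ ! -f %s ]]; then" % cached,
        # Note: a failed wget can leave a partial file behind in the cache.
        '    wget -O %s "%s"' % (cached, url),
        "fi",
        # Copy into the working directory so later steps see the usual filename.
        'cp %s "%s"' % (cached, filename),
    ]
```

On a continuous integration service the cache directory could then be preserved between builds, so that the wget branch only runs on the first build.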

Example usage assuming you don't have to worry about multiple tools clashing:

$ planemo depbash -r ~/my_tools/
$ bash dep_install.sh
$ source env.sh
$ planemo test -r ~/my_tools

Note this does nothing about resolving dependencies!

This is able to parse all my tool_dependencies.xml in https://github.com/peterjc/galaxy_blast , https://github.com/peterjc/pico_galaxy and https://github.com/peterjc/galaxy_mira

Not all the actions are supported yet, e.g.

$ planemo depbash --fail_fast -r ../tools-iuc ../tools-devteam/ ; echo "Returned $?"
...
Processing requirements from /mnt/galaxy/repositories/tools-iuc/packages/package_abyss_1_9_0/tool_dependencies.xml
Downloading https://github.com/bcgsc/abyss/releases/download/1.9.0/abyss-1.9.0.tar.gz
Error processing /mnt/galaxy/repositories/tools-iuc/packages/package_abyss_1_9_0/tool_dependencies.xml - No to_bash defined for Action[type=set_environment_for_install]
...
Error processing one or more tool_dependencies.xml files.
Returned 1

hexylena commented 9 years ago

@peterjc awesome!

peterjc commented 9 years ago

Successful TravisCI usage with galaxy_mira to install MIRA 3.4.1.1, 4.0.2 and 4.9.5 via planemo depbash rather than a manual install recipe:

https://github.com/peterjc/galaxy_mira/commit/b71e8a49cf06c173916f408f2077a59ad7b003c5 https://travis-ci.org/peterjc/galaxy_mira/builds/82117820

In the above, TravisCI ran no tests as nothing had changed compared to the Test Tool Shed. Here's the following test run, where I requested that all the tests be run (magic keyword in the git commit):

https://github.com/peterjc/galaxy_mira/commit/70cad4e64a7f38bf6bfc3177bedafd504719b083 https://travis-ci.org/peterjc/galaxy_mira/builds/82123804

See also #7 where I described the planemo + TravisCI approach I'm trying on this galaxy_mira branch.

peterjc commented 9 years ago

Should we leave this open for finishing some of the missing functionality as of https://github.com/galaxyproject/planemo/commit/f798c7e29b2276ce68b828e72fc6a6460c73792b or file separate issues?

jmchilton commented 9 years ago

Whichever you'd prefer, but my vote is for new issues, I like churn :).

peterjc commented 9 years ago

OK. I've filed issues for what I consider to be the top priorities.

peterjc commented 9 years ago

Quoting myself from earlier in this discussion: I'm increasingly finding I am reimplementing things already in the Galaxy Tool Shed code (with the risk of potentially interpreting the XML recipe slightly differently, which on the plus side could highlight some ambiguities in the recipe format).

Here's an example of the kind of ambiguity I was expecting: https://github.com/galaxyproject/planemo/pull/321 and https://github.com/galaxyproject/galaxy/issues/896