natefoo opened 6 years ago
Thanks for the cc @natefoo. I really like this idea and it's what we've been pushing for. We already have a CVMFS tier 1 server running with the ref data, so it wouldn't be too difficult to get this running too. As for synchronisation of the integrated_tool_panel.xml file, could the one-time manual work on this happen at GCC this year (when we're all in the same room)?
> We already have a CVMFS tier 1 server running with the ref data

This is great! Is this just for internal consumption or should we publicize it on https://galaxyproject.org/admin/reference-data-repo/ and http://datacache.galaxyproject.org/ ?
> As for synchronisation of the integrated_tool_panel.xml file, could the one-time manual work on this happen at GCC this year (when we're all in the same room)?

You mean the section ID and name changes? I think this is a great idea. Maybe we can also figure out how to keep it in sync while we're there. Job conf changes for new tools (most importantly, running multi-slot tools with multiple slots, or increasing memory for tools that use a lot) will need to be coordinated in some way as well.
I've made some good progress so far today. With the following modifications I am able to load a tool from a shed_tool_conf.xml in CVMFS without the corresponding install database:
diff --git a/lib/galaxy/tools/toolbox/base.py b/lib/galaxy/tools/toolbox/base.py
index 2ce4439..2f968d1 100644
--- a/lib/galaxy/tools/toolbox/base.py
+++ b/lib/galaxy/tools/toolbox/base.py
@@ -569,7 +569,7 @@ class AbstractToolBox(Dictifiable, ManagesIntegratedToolPanelMixin):
tool.hidden = True
key = 'tool_%s' % str(tool.id)
if can_load_into_panel_dict:
- if guid and not from_cache:
+ if guid and tool_shed_repository and not from_cache:
tool.tool_shed = tool_shed_repository.tool_shed
tool.repository_name = tool_shed_repository.name
tool.repository_owner = tool_shed_repository.owner
@@ -623,7 +623,7 @@ class AbstractToolBox(Dictifiable, ManagesIntegratedToolPanelMixin):
installed_changeset_revision=installed_changeset_revision)
if not repository:
msg = "Attempted to load tool shed tool, but the repository with name '%s' from owner '%s' was not found in database" % (repository_name, repository_owner)
- raise Exception(msg)
+ log.warning(msg)
return repository
def _get_tool_shed_repository(self, tool_shed, name, owner, installed_changeset_revision):
What I did:

- Precreated install.sqlite using create_db.sh (a consolidated sketch of these setup steps follows this list)
- Opened a transaction on sandbox.galaxyproject.org
- Created /cvmfs/sandbox.galaxyproject.org/tools and /cvmfs/sandbox.galaxyproject.org/config, and copied in install.sqlite and shed_tool_conf.xml:
<?xml version="1.0"?>
<toolbox tool_path="/cvmfs/sandbox.galaxyproject.org/tools">
<section id="usegalaxy_common_tools_test" name="usegalaxy.* common tools test">
</section>
</toolbox>
- chowned everything above to 1450:1450 (the UID and GID of the galaxy user in docker-galaxy-stable)
- Started docker-galaxy-stable:
sandbox@cvmfs0-psu0$ docker run -d -p 8080:80 -e GALAXY_CONFIG_INSTALL_DATABASE_CONNECTION=sqlite:////cvmfs/sandbox.galaxyproject.org/config/install.sqlite -e GALAXY_CONFIG_TOOL_CONFIG_FILE=/cvmfs/sandbox.galaxyproject.org/config/shed_tool_conf.xml -e GALAXY_CONFIG_MASTER_API_KEY=a60913da2ea2177d89e33884f0326f7d3bcdd901 -v /cvmfs/sandbox.galaxyproject.org:/cvmfs/sandbox.galaxyproject.org bgruening/galaxy-stable
ccb51acdcd43992c8d7c735108ade9e714e7b31de7f9a7383e55232f6b74b1ea
- Installed the jq tool from IUC into /cvmfs/sandbox.galaxyproject.org/tools using ephemeris:
---
api_key: a60913da2ea2177d89e33884f0326f7d3bcdd901
galaxy_instance: http://cvmfs0-psu0.galaxyproject.org:8080
install_tool_dependencies: false
install_resolver_dependencies: false
tools:
- name: jq
owner: iuc
tool_panel_section_id: usegalaxy_common_tools_test
(ephemeris)nate@weyerbacher% shed-tools install -v -g http://cvmfs0-psu0.galaxyproject.org:8080/ -t tools.yaml
(1/1) Installing repository jq from iuc to section "usegalaxy_common_tools_test" at revision 5ff75eb1a893 (TRT: 0:00:00.130621)
repository jq installed successfully (in 0:00:09.244931) at revision 5ff75eb1a893
Installed repositories (1): [('jq', None)]
Skipped repositories (0): []
Errored repositories (0): []
All repositories have been processed.
Total run time: 0:00:09.376930
- Fetched jq:1.5--0 to /cvmfs/sandbox.galaxyproject.org/singularity/mulled/
- Published the CVMFS transaction
- Added /cvmfs/sandbox.galaxyproject.org/shed_tool_conf.xml to Test's tool_config_file
- Applied the patch above to Test (in the test.galaxyproject.org CVMFS repo) and restarted
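For reference, the CVMFS-side steps above look roughly like the following. This is only a sketch: the create_db.sh invocation and the environment override are assumptions based on the description above, not commands copied from the session.

# precreate the install database from a Galaxy checkout (assumed invocation;
# the connection may instead need to be set via install_database_connection
# in Galaxy's config file)
GALAXY_CONFIG_INSTALL_DATABASE_CONNECTION=sqlite:///$PWD/install.sqlite sh create_db.sh install

# on the stratum 0 (cvmfs0-psu0), open a transaction on the sandbox repo
cvmfs_server transaction sandbox.galaxyproject.org

# lay out the tool and config directories and copy in the files
mkdir -p /cvmfs/sandbox.galaxyproject.org/tools /cvmfs/sandbox.galaxyproject.org/config
cp install.sqlite shed_tool_conf.xml /cvmfs/sandbox.galaxyproject.org/config/

# ownership for the galaxy user in docker-galaxy-stable
chown -R 1450:1450 /cvmfs/sandbox.galaxyproject.org/tools /cvmfs/sandbox.galaxyproject.org/config

# ... start the container and install tools as shown above, then ...
cvmfs_server publish sandbox.galaxyproject.org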
This patch is just a quick hack obviously, and there are going to be some issues. It is not recognized as a TS tool, so the tool ID is simply its short id, there is no link to the TS on the tool form, it probably breaks versioning/lineage, etc. Hopefully we can fix much of this just using the data already available in the XML.
> it probably breaks versioning/lineage

That should work anyway. Also xref https://github.com/galaxyproject/galaxy/issues/5284
@mvdbeek thanks! That'll make things much easier.
Working on this in natefoo/galaxy@installdbless-shed-tools for anyone interested in following along.
> if the needed images are not available, it will try to build them

This behavior can be customized by setting up a container resolvers file with the container resolver configurations you wish to use.

@jmchilton I sorta noticed that might be possible in the code but hadn't figured out the syntax of the file. Thanks!
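For anyone following along, here is a minimal sketch of what such a file might look like if the goal is to resolve only explicit and cached Singularity images and never build anything at runtime. The resolver names, the containers_resolvers_config_file option, and the GALAXY_CONFIG_ override are my best guess at the syntax, not something confirmed in this thread.

# hypothetical container_resolvers_conf.xml: list only the resolvers that read
# from the pre-populated cache, omitting the build_* resolvers entirely
cat > /cvmfs/sandbox.galaxyproject.org/config/container_resolvers_conf.xml <<'EOF'
<containers_resolvers>
  <explicit_singularity />
  <cached_mulled_singularity />
</containers_resolvers>
EOF

# then point Galaxy at it, e.g. in docker-galaxy-stable:
#   -e GALAXY_CONFIG_CONTAINERS_RESOLVERS_CONFIG_FILE=/cvmfs/sandbox.galaxyproject.org/config/container_resolvers_conf.xml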
Some progress today.
Install DB-less tool loading is in galaxyproject/galaxy#7316.
@erasche suggested using OverlayFS in Travis to perform the installations, which I think should work, assuming Travis VMs have overlayfs in the kernel (a quick check for that follows the command sketch below). It'll be relatively simple since we don't need to worry about deletions. Roughly: mount the CVMFS repo (normally mounted at /cvmfs) at, say, /lower, then:
mkdir /upper /work /cvmfs
mount -t overlay overlay -o lowerdir=/lower,upperdir=/upper,workdir=/work /cvmfs
docker run -d -p 8080:80 -e GALAXY_CONFIG_INSTALL_DATABASE_CONNECTION=sqlite:////cvmfs/usegalaxy.galaxyproject.org/config/install.sqlite -e GALAXY_CONFIG_TOOL_CONFIG_FILE=/cvmfs/usegalaxy.galaxyproject.org/config/shed_tool_conf.xml -e GALAXY_CONFIG_MASTER_API_KEY=deadbeef -e GALAXY_CONFIG_CONDA_PREFIX=/cvmfs/usegalaxy.galaxyproject.org/dependencies/conda -v /cvmfs/usegalaxy.galaxyproject.org:/cvmfs/usegalaxy.galaxyproject.org bgruening/galaxy-stable
galaxy-wait ...
shed-tools -g http://localhost:8080/ -a deadbeef ...
planemo test ...
ssh usegalaxy@cvmfs0-psu0.galaxyproject.org cvmfs_server transaction usegalaxy.galaxyproject.org
rsync -av /upper/ usegalaxy@cvmfs0-psu0.galaxyproject.org:/cvmfs/usegalaxy.galaxyproject.org || { ssh usegalaxy@cvmfs0-psu0.galaxyproject.org cvmfs_server abort -f usegalaxy.galaxyproject.org; travis_terminate 1; }
ssh usegalaxy@cvmfs0-psu0.galaxyproject.org cvmfs_server publish ... usegalaxy.galaxyproject.org
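As for the overlayfs assumption mentioned above, a cheap guard at the top of the build could look like this (just a sketch, not part of the original plan):

# bail out early if the Travis kernel can't provide overlayfs
grep -qw overlay /proc/filesystems || sudo modprobe overlay || travis_terminate 1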
Proof o' concept: https://travis-ci.org/natefoo/usegalaxy-tools/builds/489823965
@nekrut has tasked me with getting tools and data on usegalaxy.org unified with those on usegalaxy.eu and updated regularly, as is done on usegalaxy.eu. In a meeting with @erasche and @bgruening in Freiburg this week we've come up with the following. This could also be of use to usegalaxy.org.au (cc: @Slugger70).
Tool synchronization
It's not desirable for us to support all of usegalaxy.eu's legacy tools, nor for them to support all of ours. As a result, we propose the creation of a new CVMFS repository (for now, we'll call it usegalaxy-tools.galaxyproject.org) that can be shared by both instances, containing a common set of tools to be available on both.

Dependencies will be provided via Singularity, and the necessary images are already being built and uploaded to depot. In addition to nice containerized dependency management, this also gives us better security through execution sandboxing. The images will be mirrored to a second new CVMFS repository, say, singularity.galaxyproject.org (and ultimately, that repository will become the canonical source, rather than depot), which can be mounted directly on a cluster running Galaxy jobs, so that all of the images are automatically available to instances without consuming local storage space.

Thus, we should be able to have a CVMFS stratum 0 where, much like we do for Test/Main now, we open a transaction, install tools (via a dedicated Galaxy instance and ephemeris), and publish.
Unlike Test/Main, this instance will be entirely unconnected to Test/Main's database. Other than the install database (sqlite) and shed_*.xml, this instance should be able to be entirely ephemeral.

Issues
There can only be one install database per instance, but I suspect the only time the install database matters as far as running tools is concerned is when tool shed dependencies are used, so that is not a concern here. It will have implications for the admin UI (e.g. these tools probably won't show up in Manage Tools) and maybe tool help text, but I think we can live with the former and fix the latter at some point. But as usual, when I think something will be trivially easy, this is probably not going to work like I think.
Tool sections/ordering is an issue. Galaxy needs to be able to write to integrated_tool_panel.xml, so it cannot live in CVMFS, but without a unified integrated_tool_panel.xml, sections and tools will be ordered very differently across instances. Additionally, we will have to unify our section IDs, since the section ID into which a tool is installed is stored in shed_tool_conf.xml (which will live in CVMFS).

The section unification should be a one-time process that we can undertake by hand together. The integrated_tool_panel.xml synchronization I don't have a good solution for at the moment. Likewise, new tool installs that need changes from the default destination - things like the number of slots, amount of memory, etc. - will need to be coordinated when these tools are installed to the common repo.
For the time being, we will need to map the common tools to the singularity destination(s) one-by-one. This is easier for usegalaxy.eu due to their handy job-config-as-a-service (JCaaS). Although I'm not necessarily sold on the idea of a web service, I am not at all happy with the way job config works for usegalaxy.org and I do think we can utilize the JCaaS idea to get a similar dynamic config. However, with new tool versions living in the common repo and old versions living in the old repo, some tools will have to be mapped by version, not just by versionless ID, so this will get ugly. A better solution is needed here.
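To make the mapping problem concrete, this is roughly the kind of per-tool (and, where old versions remain in a site-local repo, per-version) entry that would have to be maintained in job_conf.xml today. It's wrapped in a heredoc only so it sits alongside the other shell snippets; the tool GUIDs, versions, and destination IDs are made-up examples.

# illustrative only: each common-repo tool needs its own mapping, and tools
# with older versions still installed locally must be mapped by full GUID
cat <<'EOF'
<tools>
  <!-- new version installed from the shared usegalaxy-tools repo: run under Singularity -->
  <tool id="toolshed.g2.bx.psu.edu/repos/iuc/jq/jq/1.0" destination="slurm_singularity"/>
  <!-- older version still in the site-local shed_tool_conf.xml: unchanged destination -->
  <tool id="toolshed.g2.bx.psu.edu/repos/iuc/jq/jq/0.9" destination="slurm_default"/>
</tools>
EOF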
@bgruening discovered that the Singularity image path is configurable through the container_image_cache_path option, but this is a bit of a hack. Setting it causes Galaxy to check <container_image_cache_path>/singularity/mulled for images matching the requirements, but if the needed images are not available, it will try to build them, which of course is not possible at runtime with CVMFS (nor is it desirable; we want them to always be preinstalled). It would be good if we could control whether building should be attempted, and also have a destination param to control where to find Singularity images. This path should probably also be subdirectoried, since the list of images is likely to grow larger than the CVMFS preferred catalog size. Something like /cvmfs/singularity.galaxyproject.org/c/o/coreutils:8.25--0 would probably be sufficient.
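A quick sketch of that layout, just to make it concrete (the sharding scheme, nesting by the first two characters of the image name, is my reading of the example path above):

# shard images under the first two characters of their name so that no single
# CVMFS directory (and catalog) grows too large
img="coreutils:8.25--0"
echo "/cvmfs/singularity.galaxyproject.org/${img:0:1}/${img:1:1}/${img}"
# -> /cvmfs/singularity.galaxyproject.org/c/o/coreutils:8.25--0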
Action items

Testing

- Load a tool from a shed_tool_conf.xml in CVMFS, test the tool, and see what (if anything) is broken by not having the install db

Development
- Per-destination outputs_to_working_directory, so that Singularity can ro-mount input datasets without breaking other destinations like Pulsar (xref: galaxyproject/galaxy#6087)
- A way to synchronize integrated_tool_panel.xml
- Map the usegalaxy-tools repo to a destination rather than requiring usegalaxy.* admins to update their job_conf.xml with every new tool install
Data synchronization

To be written...
Galaxy synchronization
We'd also like to have usegalaxy.org and usegalaxy.eu run off a single copy of Galaxy living in CVMFS. This presents a few challenges, such as non-upstreamed datatypes on usegalaxy.eu and synchronization of updates, especially where database migrations are concerned.
More to be written about this as well...
Singularity all the things
extracted to https://github.com/galaxyproject/usegalaxy-playbook/issues/262