Building a Galaxy container image for Kubernetes - a meta issue

afgane commented 5 years ago

We are starting to work on an optimized container image for Galaxy that will allow K8S built-in features to be exercised (e.g., dynamically changing the number of container replicas, upgrades and rollbacks) and bring down the application startup time. The idea is to build one image and allow it to be parameterized to run either as a job handler(s) or web handler(s), as stateless containers.

The goals of this image are:

Provide a barebones image for Galaxy containing a consensus-driven minimal set of features
Highly optimised and size reduced to the maximum possible for fast download and startup
Will serve as a base image for more specialised images

To build this image, the plan is to leverage existing Ansible roles and work on optimizing the Dockerfile to minimize the image size. This container is intended to run only Galaxy while all other services will be run in their own respective containers and linked together (e.g., wrapped as a Helm chart for automated deployment of a complete, production-ready Galaxy). With that, the following is a list of current Ansible roles we believe are needed to create a minimal, functional Galaxy image:

ansible-galaxy-os
ansible-galaxy
ansible-galaxy-cvmfs
ansible-trackster

Ideally, this image (i.e., Dockerfile) would be placed into the Galaxy source tree and represent the default Galaxy image. Other existing container approaches could derive from this image to provide the specialized services that they need, while leveraging the optimizations from this image. Meanwhile the playbook and the initial file are being developed in this repo: https://github.com/CloudVE/galaxy-kube-playbook

It is not entirely clear yet how this deployment will operate in the context of K8S with respect to Galaxy's static content. Ideally, the K8S service will load balance across multiple replicas and route the server requests to one of the Galaxy replicas while also serving Galaxy static content separately (e.g., from a CDN). This may require some changes to Galaxy... Ideas welcome.

bgruening commented 5 years ago

xref: https://github.com/bgruening/docker-galaxy-stable/tree/master/compose

Sounds like a lot of overlap.

afgane commented 5 years ago

Of course we are aware of the docker-galaxy-stable repo and, as stated above, the goal is to eventually offer integration points for it. The focus here is to create a single, minimal Dockerfile for the Galaxy project that other solutions can inherit from and build on top of, be it for Compose, Kubernetes, or anything else. As it stands, galaxy-stable has an existing user base and hence compatibility concerns re. major architectural changes. It has a stateful nature, it assumes the inheritance structure of the images (e.g., galaxy-init), and the management of configuration and persistent data are not suited for Kubernetes in particular. So we’d like to rethink what the absolute minimal Galaxy container is, refactor what exists, and create a flexible, base Galaxy container that can then be integrated and customized for specific solutions, including galaxy-stable. This has the additional advantage that all tests and other infrastructure could also potentially use this image and the Dockerfile itself can reside in the Galaxy repo.

hexylena commented 5 years ago

galaxy-stable has an existing user base

The compose branch is less used as it has been more of an experiment for us to be more ready for distributed container engine setups. We made it pretty clear that we still experiment with the container layout and try to optimize things for common workflows (like swarm, k8s, etc.)

You could easily make changes to that branch without affecting the vast majority of users, and the users who are affected are already looking to use it on k8s and would be happy for the changes.

Would you be open to investigating that alternative? Or are you already decided that there is no way it could work?

it assumes the inheritance structure of the images

This could be changed. We used that in order to line up with what Galaxy Project was doing and to save time. If it is necessary to switch that, great no problem! Let's deprecate the galaxy-init image.

the management of configuration and persistent data are not suited for Kubernetes

So let's make it suitable. Let's do that rather than throwing away the work put into the existing one, into the ansible roles which have seen many new features added by galaxy-stable contributors, and into the compose branch of docker-galaxy-stable, where we've explicitly tried to make it more suitable for running on container engines.

Right now this reads like "well, the community wrote something but we don't want what they've done, they should have to rewrite based on ours" (re: "offer integration points") Is that correct, is that what you intend?

It gives us, as community members, who built a project to fill a space where Galaxy wasn't actively working in the past couple years, a really negative feeling. Myself specifically since I've done some work on the compose branch, I have the feeling that it was all a waste now because you will throw it all away and we'll have to rewrite to be compliant with your solution.

Would you be open to at least taking a look at the compose branch?

absolute minimal Galaxy container

We both want to achieve a smaller container. And if we work together, use the existing container and give you more free time to work on associated issues, I think we can achieve that with fewer wasted person hours and effort.

Some of the issues we've noted but not yet been able to dedicate time towards fixing:

Recently Azure was added as a non-conditional dependency, and the container size increased by 205 MB. Fixing this would benefit everyone, not just container users. The same goes for boto and openstack. If these are made conditional, dependent on the configuration of Galaxy, things could really improve for the better.

If we could put effort towards galaxy running in conda, and removing all wheels, this would significantly minimize the container since the wheels sometimes include multiple platform .so files and other things. Even if conda isn't used for dependency management, at least it avoids this and really has platform-specific install files which avoid the issues of the generic wheels.

Ansible: I guess what is behind all of this is that we tried to use the Ansible roles, with all of the huge amount of work that was put into deploying and configuring Galaxy, and that we have so many different roles. Ansible for container deployment may not yield a completely minimal container; for that we can use different tricks, like the new build/run container separation in Dockerfiles.
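The conditional-dependency idea raised above (Azure, boto, openstack) can be illustrated with a minimal sketch. It assumes a hypothetical `object_store_type` configuration key and a hypothetical mapping from its values to importable modules; neither reflects Galaxy's actual configuration or dependency handling. The point is only that the set of heavy packages is derived from the configuration rather than installed unconditionally.

```python
import importlib.util

# Hypothetical mapping from a configured object store type to the optional
# top-level module it would need. The keys are illustrative only.
OPTIONAL_MODULES = {
    "azure_blob": "azure",
    "s3": "boto",
    "swift": "openstack",
}


def modules_needed(config):
    """Return the optional modules implied by the given configuration dict."""
    store_type = config.get("object_store_type", "disk")
    module = OPTIONAL_MODULES.get(store_type)
    return [module] if module else []


def missing_modules(modules):
    """Return the modules from the list that cannot be imported here."""
    missing = []
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when the parent package of a dotted name is absent.
            found = False
        if not found:
            missing.append(name)
    return missing
```

With a default (disk-only) configuration, `modules_needed({})` is empty, so an image built for that configuration would not need any of the cloud SDKs.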

We're open to all of the things you want to do. We have similar goals.

additional advantage that all tests and other infrastructure could also potentially use this image

All of the tests and other infrastructure could just as easily potentially use docker-galaxy-stable (+ your additions). In some places they already do. I'm not sure how a container you write would be any better of a fit? Maybe you could clarify?

can reside in the Galaxy repo.

I talked to @bgruening, he's open to moving it under galaxyproject.

We're really open to new ideas, radical changes, whatever is interesting.

afgane commented 5 years ago

There may be some misunderstanding about what this issue is about because we’re happy to work on common pieces. This issue is about figuring out what the minimal Galaxy image is, as that is something that has not been defined or implemented yet. The current galaxy-stable is 3.7GB uncompressed (1.3GB compressed) and the Compose setup images are larger. What can be done about that is the point of this issue. This new minimal image can be thought of as a first step in refactoring the Galaxy container deployments. It is not meant for use by end-users, and will not replace docker-galaxy-stable as the means of obtaining a reasonably-featured Galaxy container.

In fact, our initial approach was to naively reuse the already existing images, as proposed by me and discussed at length here. However, the reason we are proposing this minimal image is that we came to realize that the current solution is (1) a Compose first approach, and (2) it is based on the existing monolithic approach to installing Galaxy that was then broken down into something that will work with containers. This was an issue that was raised by @pcm32 as well. The critical problem with that is that it goes from the maximal to minimal direction, and therefore, we cannot simply refactor into it; we must define what the minimal barebones solution looks like in the first place.

To do this, we’re suggesting a low level approach as this first stage where the goal is to end up with a Galaxy container that runs basic Galaxy well, not one that can be configured for X different scenarios right in the container and hence has to incorporate all those combinations. Those pieces should probably be explored at the next stage and in coordination with container orchestration. So the outcome of this first stage is just a Galaxy container, not much more. As an example of the effectiveness of this approach, the current minimal image stands at ~500MB uncompressed (<200MB compressed)--that’s a 7-fold reduction in size over the current image!
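As a quick check of the size comparison, the figures quoted in this thread can be plugged into a few lines of arithmetic. The sketch assumes 1 GB = 1000 MB and treats "~500MB" and "<200MB" as 500 and 200 respectively, so the compressed ratio is a lower bound.

```python
# Image sizes quoted in this thread, in MB (assuming 1 GB = 1000 MB).
full_uncompressed = 3700     # galaxy-stable, uncompressed
full_compressed = 1300       # galaxy-stable, compressed
minimal_uncompressed = 500   # minimal image, "~500MB"
minimal_compressed = 200     # minimal image, "<200MB" (upper bound)

uncompressed_ratio = full_uncompressed / minimal_uncompressed
compressed_ratio = full_compressed / minimal_compressed

print(f"uncompressed: about {uncompressed_ratio:.1f}x smaller")
print(f"compressed: at least {compressed_ratio:.1f}x smaller")
```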

One reason the bottom-up approach seems sensible is that the case for adopting a multi-stage build to minimize size becomes more evident when starting from a minimal image, whereas such experiments are difficult to perform with complex dependencies. This is driven home further by the fact that galaxy-stable and the Compose setup use two different base images, signifying the need for a common base image. Devising this minimal solution doesn’t replace or preclude other solutions from existing; it creates the basis for optimizing them. At no point did we suggest that the existing Compose or galaxy-stable effort be replaced or thrown away. This approach is not exclusionary but compositional. Going forward, the hope is that this minimal Galaxy image can be built and used as the FROM statement in the galaxy-stable image, and elsewhere (that’s what I meant by the integration points).

What we are proposing as a radical idea, is that we all help define this minimal image in such a way that everyone can continue to build from it.

pcm32 commented 5 years ago

I also think that we need to re-think the base-web-init strategy for composition. I would rather see the following roles in the orchestration (eliminating init):

web container: serve web content, can be independently scaled. This might be even broken into smaller pieces inside a pod if needed.
handler container: for running job/workflows handlers, can be independently scaled
db container (as we have it now)
proftpd container (as we have it now)
monitoring container
other components

The web and handler containers could use the same container image with different binaries being run. They could stem from the base image proposed by @afgane. Users would put their flavour (branding, tools, etc.) on top, extending it. Currently there is a lot of copying from init, which in some environments can delay startup by some minutes. All of that content could live inside the same container, with only the bare minimum (like tool wrappers) copied to a shared file system.

I would also suggest that we move away from the use of an internal supervisor; in an orchestrated setup, the orchestrator should act as your process manager.
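The two points above (one image run as either web or handler, and no internal supervisor) can be sketched as a small entrypoint. Everything named here is an assumption for illustration: the `GALAXY_ROLE` variable and the placeholder commands are hypothetical and are not Galaxy's real start-up commands. The sketch selects a command from the role and then replaces its own process with it, so the orchestrator supervises the Galaxy process directly.

```python
import os

# Hypothetical role-to-command table. The commands are placeholders and do
# not correspond to real Galaxy executables.
ROLE_COMMANDS = {
    "web": ["galaxy-web-placeholder", "--port", "8080"],
    "job-handler": ["galaxy-handler-placeholder"],
}


def command_for_role(environ):
    """Pick the command for the role named in GALAXY_ROLE (default: web)."""
    role = environ.get("GALAXY_ROLE", "web")
    try:
        return ROLE_COMMANDS[role]
    except KeyError:
        raise SystemExit(f"unknown GALAXY_ROLE: {role!r}") from None


def main():
    """Entrypoint: replace this process with the selected command."""
    command = command_for_role(os.environ)
    # execvp does not return on success; the selected command takes over this
    # process, so no supervisor sits between it and the orchestrator.
    os.execvp(command[0], command)
```

In such a setup the same image would be deployed twice, once with `GALAXY_ROLE=web` and once with `GALAXY_ROLE=job-handler`, and each deployment could be scaled independently.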

We should have the policy in the base container that anything optional should be installable via Ansible roles (or another clean mechanism), and that its installation be turned off by default (allowing changes at build time via injection of env vars). There should be very strong reasons to allow any package to go from optionally installed to installed by default.
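A minimal sketch of the "off by default, enabled via env vars" policy follows. The feature names and the `INSTALL_<NAME>` variable convention are hypothetical, chosen only for illustration; a build step could use such a list to decide which optional roles to apply.

```python
# Hypothetical optional features; the names are illustrative only.
OPTIONAL_FEATURES = ("cvmfs", "trackster", "proftpd")

# Values that count as "enabled"; anything else leaves the feature off.
TRUTHY = {"1", "true", "yes"}


def enabled_features(environ):
    """Return the optional features switched on via INSTALL_<NAME> variables.

    Every feature is off unless its variable is explicitly set to a truthy
    value, so an empty environment yields a bare-bones build.
    """
    return [
        name
        for name in OPTIONAL_FEATURES
        if environ.get(f"INSTALL_{name.upper()}", "false").strip().lower() in TRUTHY
    ]
```

For example, `enabled_features({"INSTALL_CVMFS": "true"})` selects only `cvmfs`, while an empty environment selects nothing.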

I agree that it is a good idea to start from scratch, instead of trying to modify the compose part (which I use actively, but which is quite complex and has plenty of inherited assumptions).

natefoo commented 5 years ago

I spun the issue of conditional dependencies out into its own issue, #7320.

The idea of a base image seems like a good idea, but I hope both the k8s work and bgruening/docker-galaxy-stable can be coordinated and leverage the same work so that there's mutual benefit for both goals.

mhabsaoui commented 5 years ago

Hi, just wondering what the official galaxy/galaxy:19.05 image is for.

Is it a web-based app image or just a base image? It would be nice to have (on DockerHub) at least the associated Dockerfile and some minimal instructions to run/use it...

BTW, from my point of view as a user, it would be better to have a docker-compose setup orchestrating the official base Galaxy image, an external database image, and other additional/optional services...

I quote @afgane

not one that can be configured for X different scenarios right in the container and hence has to incorporate all those combinations.

Cheers.

bgruening commented 5 years ago

@mhabsaoui the image you are referring to is an all-in-one image. You can find the Dockerfile and the project here: https://github.com/bgruening/docker-galaxy-stable/

A composed version is also available in the same project in the compose folder here: https://github.com/bgruening/docker-galaxy-stable/tree/master/compose

Please note that 19.05 is not released yet and those images are currently in beta state.

mhabsaoui commented 5 years ago

@mhabsaoui the image you are referring is an all-in-one image. You can find the Dockerfile and the project here: https://github.com/bgruening/docker-galaxy-stable/

A composed version is also available in the same project in the compose folder here: https://github.com/bgruening/docker-galaxy-stable/tree/master/compose

Please note that 19.05 is not released yet and those images are currently in beta state.

In fact, the image I was referring to is galaxy/galaxy:19.05, which seems to belong to the official Galaxy project. It needs a Dockerfile/git repo to be added.

It does not seem to be the same image as yours...

bgruening commented 5 years ago

@mhabsaoui Oh, if it's on Docker Hub, then no. It is not the project I linked. No clue what that is. It's largely confusing, but I guess @afgane can answer your question.

afgane commented 5 years ago

@mhabsaoui the galaxy/galaxy:19.05 image is the minimal Galaxy image (as described higher up in this issue) for the pending 19.05 release. With that, the image is not intended to be used in isolation but as part of a container orchestration solution, such as this Helm chart: https://github.com/CloudVE/galaxy-kubernetes/tree/v3/galaxy Please keep in mind that the chart (and the image) are still work-in-progress and, for the time being, I can only recommend using this for development purposes. If you want to use Galaxy as an out-of-the-box system, the link Bjoern provided for the galaxy-stable is the way to go.

mhabsaoui commented 5 years ago

As feedback from a user's experience: please keep it simple (KISS) when dockerising Galaxy and exposing its options. Avoid very heavy containers (proxy, clusters...) and, at first, just provide a very basic Galaxy container with a minimal stack (requirements, tools, configs...).

galaxyproject / galaxy

Building a Galaxy container image for Kubernetes - a meta issue #7225