galaxyproject / galaxy-helm

Minimal setup required to run Galaxy under Kubernetes
MIT License

Syncing with galaxy-docker-project #2

Closed afgane closed 6 years ago

afgane commented 6 years ago

I'd like to start/document a discussion about what it will take to make it possible to integrate and/or interchange resources from this repo with the resources from the https://github.com/bgruening/docker-galaxy-stable/tree/master/compose repo.

A few things come to mind and please add others:

I'm sure there's more, but those seem like the minimal set given my current familiarity with the two efforts. Please comment and let's see if and how this can be accomplished.

pcm32 commented 6 years ago

Hi guys!

I gave this a first try today (well, actually, this would be like my third try so far; I have attempted this in the past but have been a bit overwhelmed by the amount of detail needed relative to my availability and other priorities), but it is going to require some work.

Basically what we are doing currently in the phenomenal helm chart setup is:

These conditional variables are currently taken care of by logic inside the phenomenal container: ansible/run_galaxy_config.sh, ansible/set-galaxy-config-values.yaml, ansible/configure_galaxy.py and ansible/remove-api-key.yaml (the last three being executed from the first one).

ansible/run_galaxy_config.sh expects to be run from the directory where the config directory is present, and it is called before Galaxy is run (which will present an issue since, on docker-galaxy-stable, I think the /export directory links only get their contents when Galaxy is run, and our script needs them beforehand). It takes care of:

On the build side of our container (so not in the runtime of the container in k8s), it currently does:

I think this covers most of it. I have started efforts to duplicate the env variables injected in helm to cover most of the needed GALAXY_CONFIG_*, but for deciding when to trigger most of the functionality that ansible/run_galaxy_config.sh takes care of, I need a better understanding of the running process on docker-galaxy-stable.

I need to be able to use a particular git revision of a defined galaxy git repo (fork). Sometimes my PRs for k8s don't make it in time for the galaxy releases, so I need to use certain releases with some of my commits on top, to deliver functionality for our own releases in time. This is easy in the current scenario where I control our galaxy container, but would be more complex (or I fail to see if it is possible) if moving to the docker-galaxy-stable ones.

I have started a flavour inside compose for a galaxy-k8s which is derived from galaxy-base and has most of the galaxy-web functionality, but gets rid of slurm and other schedulers that we (in our project) don't need. Maybe we should discuss where in the hierarchy of compose images this should go, and maybe some containers down the line could later add the required things for other schedulers. confd sounds like a good idea, but I would go by parts I guess, to have something functional soon, and later introduce more sophistication.

Orchestrating other containers (like the ftp part) shouldn't be a problem, but I would first aim to sort all the issues above. The main complexity is that there is some loose coupling between the helm chart version and galaxy container to be used, and I need to keep maintaining the working ones as well (happily both objects are versioned, so this shouldn't be an issue).

While I'm eager to integrate more to the docker-galaxy-stable compose containers as discussed with @bgruening, moving away from helm is a no-no for me (for the reasons I showed to @bgruening yesterday in Paris) and would seriously hamper my ability to pursue this further integration (as my main responsibility is to have something working on PhenoMeNal).

I hope that this is useful!

nuwang commented 6 years ago

@pcm32 Thanks for this excellent write-up Pablo, it's very detailed and really helpful!

I tend to agree that we should start with something simple and functional, and refactor as necessary to introduce more complexity as required. I'm hoping that the Helm chart will completely insulate us from those issues, and allow us to refactor the container arrangement as necessary.

One issue that comes to mind is how to handle different flavours of Galaxy. This is particularly important for the GVL/Galaxy-on-the-cloud, to achieve the desired level of feature parity with existing CloudMan deployments.

In the current incarnation of GVL/Galaxy-on-the-cloud, this is achieved by using a VM image, which is matched at runtime to a user-selected tarball containing Galaxy+the tool database, resulting in the "flavour". In addition, it could also have prepopulated datasets, and whatever else that's desired.

This also has the added advantage that the database schema does not have to be created at runtime, making startup much faster. Although we've been using a single compressed postgres database so far, @jmchilton suggested that a different approach would be to have the tool database in sqlite, which could simply be mounted into the Galaxy container. I think that sounds like a much better option. A drawback is that we won't be able to have any other pre-populated artefacts, like workflows, shared histories etc.

So far, the current phnmnl container loads the workflows into the database at runtime, and does not use any toolshed tools, is that correct? My previous experience has been that it's not very practical to install tools from the toolshed at runtime - it takes a really long time to install a considerable number of tools - although I'm not sure whether this can be significantly cut down by using the Dockerized tool versions. @bgruening What approach are you using?

What would be a good way to achieve an effect similar to the above? How can we dynamically extract a tarball containing the database and link it up to the container? Or is there a more desirable option?

bgruening commented 6 years ago

I will try to answer all points but will skip the GALAXY_CONFIG_* points. I think this is quite clear, and I hope it saves @pcm32 a lot of hacky workarounds.

ansible/run_galaxy_config.sh expects to be run from the directory where the config directory is present, and it is called before Galaxy is run (which will present an issue since, on docker-galaxy-stable, I think the /export directory links only get their contents when Galaxy is run, and our script needs them beforehand). It takes care of:

I guess this can be fixed by changing the configs in /galaxy-central/ and not in /export/. During startup everything is linked/copied into /export/, so you should change the original source during build. But I hope this is not needed at all with GALAXY_CONFIG_*. Hopefully you don't need to change files.

  • Set up which database engine will be used, based on conditional env variables.

Solved, with GALAXY_CONFIG_*.
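For reference, a minimal sketch (values and the postgres service name are only examples, not part of the chart) of how a Helm template could surface these as GALAXY_CONFIG_* environment variables on the Galaxy container, which docker-galaxy-stable maps onto the corresponding Galaxy config options:

```yaml
# Sketch only: env entries a chart might render into the Galaxy container spec.
# Each GALAXY_CONFIG_<OPTION> overrides the matching Galaxy config option;
# the values and the postgres service name below are illustrative.
env:
  - name: GALAXY_CONFIG_DATABASE_CONNECTION
    value: "postgresql://galaxy:galaxy@galaxy-postgres/galaxy"
  - name: GALAXY_CONFIG_ADMIN_USERS
    value: "admin@example.org"
  - name: GALAXY_CONFIG_BRAND
    value: "PhenoMeNal"
```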

  • If available, copy a user's galaxy.ini.injected into the location of galaxy.ini (used for dev purposes).

This is already possible. If you mount files into the /export/ directory they will be picked up. For example this way: https://github.com/bgruening/docker-galaxy-stable#Personalize-your-Galaxy
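As a rough illustration of that pattern on the Kubernetes side (ConfigMap name, key and target path are assumptions, not part of any existing chart), a hand-edited galaxy.ini could be projected into /export/ like this:

```yaml
# Hypothetical sketch: projecting a hand-edited galaxy.ini into /export/
# so the startup logic picks it up. Adjust the mountPath to wherever the
# image actually expects the file; nothing here is defined by the chart.
volumes:
  - name: injected-galaxy-ini
    configMap:
      name: galaxy-ini-override
containers:
  - name: galaxy-web
    volumeMounts:
      - name: injected-galaxy-ini
        mountPath: /export/galaxy-central/config/galaxy.ini
        subPath: galaxy.ini
```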

  • Installs ansible (no problem to replace this)

Or we should use this. The startup script is really complex nowadays ... on the other hand, ansible + python is quite big. Not sure about this point. I have worked on this here: https://github.com/galaxyproject/ansible-galaxy-extras/pull/144 Not as powerful as ansible, but lightweight.

  • setup k8s_supplemental_group_id in the config/job_conf.xml (this I think is currently missing from the jinja template for job_conf.xml in ansible galaxy-extras)
  • setup k8s_fs_group_id in the config/job_conf.xml, same issue as above.
  • setup k8s_persistent_volume_claim_name in the config/job_conf.xml
  • setup k8s_pull_policy in the config/job_conf.xml

Jupp, let's add this.
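A possible shape for this in the chart's values (purely illustrative key names; the job_conf.xml template in ansible-galaxy-extras would need to consume them):

```yaml
# Illustrative values.yaml fragment mirroring the four destination
# parameters listed above; none of these keys exist in the chart yet.
k8s:
  supplemental_group_id: 0
  fs_group_id: 0
  persistent_volume_claim_name: galaxy-pvc
  pull_policy: IfNotPresent
```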

  • setup the admin user's username, email and password, and add workflows for sharing if available in the workflows directory (inside the working directory) through the master API key. For this it starts galaxy to be able to access it through the API, runs python ansible/configure_galaxy.py in a venv that includes a newer version of bioblend, then stops the lifted galaxy and removes the master API key. This initial run of galaxy, made through paster, also takes care of setting up the database schema if there is no such schema in the configured db engine (which by then must exist; in the case of a separate postgres container, this is enforced by the galaxy replication controller, which only inits the galaxy container once the postgres one is running).

On the build side of our container (so not in the runtime of the container in k8s), it currently does:

  • Starts from ubuntu:14.04

Is this a big problem? I'm happy to update our images; the only reason I have not done this, or am conservative with this, is that users need to migrate their database to a potentially new postgresql version.

  • Set some labels for versioning purposes (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

We can add those during runtime with --label, right?

  • Adds the minimal dependencies that we have found we need, in terms of apt packages, for a from-scratch Galaxy install.

Would be nice to use this to also slim down our images. I don't consider this critical. I would rather see a big running setup under testing first and then slim it down as long as it does not break.

  • clone Galaxy from our fork, as sometimes I need to have some additional k8s commits that didn't make it to a galaxy release on top of that release version.

You can do this with: https://github.com/bgruening/docker-galaxy-stable/blob/master/compose/buildlocal.sh#L5 or with Docker ARGs: https://github.com/bgruening/docker-galaxy-stable/blob/master/galaxy/Dockerfile#L14
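For example, a compose override along these lines could pin the fork and revision at build time (the ARG names follow the linked Dockerfile but should be verified against it; the context path is an assumption):

```yaml
# Hypothetical docker-compose override pinning a Galaxy fork/revision via
# build args; check the ARG names against the Dockerfile linked above.
services:
  galaxy-web:
    build:
      context: ./galaxy-web
      args:
        GALAXY_REPO: "https://github.com/phnmnl/galaxy"
        GALAXY_RELEASE: "release_17.09"
```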

  • Set pykube==0.15 to be installed as part of requirements.txt (currently the ansible galaxy-extras installs pykube through pip with no version set, it would be good to set that to 0.15 to control for future deprecation of Kubernetes API objects).

Oh yes, this should be fixed. Ideally in Galaxy main.

  • Copy all the config files that we currently rely on to the config directory or the galaxy home folder (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

Do they need to be in the container, or can we mount them into /export/, assuming we still need manually edited config files? To make it clear: everything that we need to manually configure is in my eyes broken, and we should fix this in Galaxy.

  • Copy our tool wrappers (tools/phenomenal), in sync with the job_conf copied in the previous step (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

We have solved this here: https://github.com/bgruening/docker-galaxy-stable#integrating-non-tool-shed-tools-into-the-container--toc

  • Set up the virtualenv to be used by galaxy, at the container build time.

I'm wondering why this is needed. https://github.com/bgruening/docker-galaxy-stable/blob/master/galaxy/Dockerfile#L28

  • Set up the PYKUBE_KUBERNETES_SERVICE_HOST env (I think you do this through ansible on docker-galaxy-stable compose containers).
  • Copy HTMLs for the welcome page (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

This could be used: https://github.com/bgruening/docker-galaxy-stable#Personalize-your-Galaxy

  • Copy our ansible/bash scripts for setup to ansible in the main galaxy directory.
  • Copy our example workflows (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

Yeah, this should be part of the Phenomenal flavor.

  • Copy some testing logic inside (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

What is this? Is this general testing? Can we move this to a separate testing container, as here for example: https://github.com/bgruening/docker-galaxy-stable/tree/master/test

  • Add some missing datatypes (I can easily move that to an external final container derived later on from the one that we generate within the compose ones).

This should be upstreamed into Galaxy. You can also mount this in via /export/.

I think this covers most of it. I have started efforts to duplicate the env variables injected in helm to cover most of the needed GALAXY_CONFIG_*, but for deciding when to trigger most of the functionality that ansible/run_galaxy_config.sh takes care of, I need a better understanding of the running process on docker-galaxy-stable.

Let me know if I can help here. In general everything should be configurable via ENV.

I need to be able to use a particular git revision of a defined galaxy git repo (fork). Sometimes my PRs for k8s don't make it in time for the galaxy releases, so I need to use certain releases with some of my commits on top, to deliver functionality for our own releases in time. This is easy in the current scenario where I control our galaxy container, but would be more complex (or I fail to see if it is possible) if moving to the docker-galaxy-stable ones.

Easy to do with: https://github.com/bgruening/docker-galaxy-stable/blob/master/compose/buildlocal.sh#L5 or with Docker ARGs: https://github.com/bgruening/docker-galaxy-stable/blob/master/galaxy/Dockerfile#L14

I have started a flavour inside compose for a galaxy-k8s which is derived from galaxy-base and has most of the galaxy-web functionality, but gets rid of slurm and other schedulers that we (in our project) don't need. Maybe we should discuss where in the hierarchy of compose images this should go, and maybe some containers down the line could later add the required things for other schedulers.

Please make a proposal, this should be easy to sort out I hope.

confd sounds like a good idea, but I would go by parts I guess, to have something functional soon, and later introduce more sophistication.

Fully agree.

While I'm eager to integrate more to the docker-galaxy-stable compose containers as discussed with @bgruening, moving away from helm is a no-no for me (for the reasons I showed to @bgruening yesterday in Paris) and would seriously hamper my ability to pursue this further integration (as my main responsibility is to have something working on PhenoMeNal).

I'm wondering if we could generate a helm chart out of the yaml and the metadata which we already have, so that we provide some small tool ... galaxy deploy kube --helm that runs the helm conversion ... not sure if feasible.

bgruening commented 6 years ago

One issue that comes to mind is how to handle different flavours of Galaxy. This is particularly important for the GVL/Galaxy-on-the-cloud, to achieve the desired level of feature parity with existing CloudMan deployments.

I don't see why this is different from the current state. Flavors are just adding tools/workflows to a bare high-quality instance.

In the current incarnation of GVL/Galaxy-on-the-cloud, this is achieved by using a VM image, which is matched at runtime to a user-selected tarball containing Galaxy+the tool database, resulting in the "flavour". In addition, it could also have prepopulated datasets, and whatever else that's desired.

Have a look at how a flavor is created here. https://github.com/bgruening/docker-galaxy-ngs-preprocessing

This also has the added advantage that the database schema does not have to be created at runtime, making startup much faster.

It never gets created at startup time.

Although we've been using a single compressed postgres database so far, @jmchilton suggested that a different approach would be to have the tool database in sqlite, which could simply be mounted into the Galaxy container. I think that sounds like a much better option. A drawback is that we won't be able to have any other pre-populated artefacts, like workflows, shared histories etc.

I guess I'm missing something here. Why is this so complicated? We create predefined flavors and pull these containers down if needed, no?

So far, the current phnmnl container loads the workflows into the database at runtime, and does not use any toolshed tools, is that correct? My previous experience has been that it's not very practical to install tools from the toolshed at runtime - it takes a really long time to install a considerable number of tools - although I'm not sure whether this can be significantly cut down by using the Dockerized tool versions. @bgruening What approach are you using?

We install tools during build time. This is fast, as it is just downloading the conda packages. However, it makes the image big/huge. If size matters we can put the conda-envs into CVMFS and share them, or install them simply via Conda at tool-runtime.
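For context, a flavour's tool list is typically a small YAML file consumed at build time (e.g. by ephemeris); the tools and section labels below are just examples, not the PhenoMeNal set:

```yaml
# Example tool list (tools and section labels are illustrative). Installing
# from such a list during the image build pulls the wrappers and, optionally,
# the conda packages for their dependencies.
tools:
  - name: fastqc
    owner: devteam
    tool_panel_section_label: "Quality Control"
  - name: bowtie2
    owner: devteam
    tool_panel_section_label: "Mapping"
```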

What would be a good way to achieve an effect similar to the above? How can we dynamically extract a tarball containing the database and link it up to the container? Or is there a more desirable option?

I think this is way too complicated. Let's create prebuilt images and pull them down as needed.

pcm32 commented 6 years ago

Thanks @bgruening and @nuwang for the excellent feedback... I'll try to advance based on the suggestions that @bgruening added in-line. I think I have most of the helm changes needed by now.

pcm32 commented 6 years ago

@bgruening I don't mind about ubuntu:14.04, I was just listing all the steps for the sake of completion.

nuwang commented 6 years ago

Is this a big problem? I'm happy to update our images; the only reason I have not done this, or am conservative with this, is that users need to migrate their database to a potentially new postgresql version.

We are going to have to think of ways to handle database upgrades transparently, and while I don't think we need to solve that problem during the first iteration, it would be good to keep a discussion going on strategies for handling this. This is a discussion on the official Postgres docker image: https://github.com/docker-library/postgres/issues/37 Can't say I'm particularly encouraged, any Helm options/experiences here @pcm32?

Have a look at how a flavor is created here. https://github.com/bgruening/docker-galaxy-ngs-preprocessing

Ok, so you are using a bundled sqlite database as the tool database, and installing all tools into the Galaxy container? If this scales well, I don't see an insurmountable problem. For comparison, how big is the tool database on Galaxy main?

I guess I'm missing something here. Why is this so complicated? We create predefined flavors and pull these containers down if needed, no?

A few reasons - my understanding was that these container sizes were very large, on the order of 10GB+ when there are a sizeable number of tools. That means we are looking at a 20 minute+ install time, vs < 7 minutes, which has generally been our performance target (up from the original 2 minutes that CloudMan took in the days of using volume snapshots). The CVMFS option sounds good.

The second is that it doesn't address how we can have a pre-populated user database, say workflows. Again, importing at startup may be ok, but it impacts startup time.

Which is why I was wondering whether there was a good solution for handling these kinds of pre-populated databases that would generalise well to any database-connected app, in addition to Galaxy. Also, this relates to the upgrade problem above.

I think this is way to complicated. Let's create prebuild images and pull them down as needed.

Agree, this seems like the sane approach to start with.

bgruening commented 6 years ago

Is this a big problem? I'm happy to update our images; the only reason I have not done this, or am conservative with this, is that users need to migrate their database to a potentially new postgresql version.

We are going to have to think of ways to handle database upgrades transparently, and while I don't think we need to solve that problem during the first iteration, it would be good to keep a discussion going on strategies for handling this. This is a discussion on the official Postgres docker image: https://github.com/docker-library/postgres/issues/37 Can't say I'm particularly encouraged, any Helm options/experiences here @pcm32?

If we are talking about upgrading postgresql databases here, this is only relevant for Galaxy instances that persist over many years. And in this case I assume that an admin knows how to upgrade a postgresql database from one version to the other. I don't think this is a problem we need to solve, especially not now.

Have a look at how a flavor is created here. https://github.com/bgruening/docker-galaxy-ngs-preprocessing

Ok, so you are using a bundled sqlite database as the tool database, and installing all tools into the Galaxy container? If this scales well, I don't see an insurmountable problem. For comparison, how big is the tool database on Galaxy main?

No sqlite. Everything is in postgresql. And these are just entries in a DB, some rows, maybe a few thousand. This is imho negligible in size.

I guess I'm missing something here. Why is this so complicated? We create predefined flavors and pull these containers down if needed, no?

A few reasons - my understanding was that these container sizes were very large, on the order of 10GB+ when there are a sizeable number of tools. That means we are looking at a 20 minute+ install time, vs < 7 minutes, which has generally been our performance target (up from the original 2 minutes that CloudMan took in the days of using volume snapshots). The CVMFS option sounds good.

The container size is only large if you include "tool-dependencies". This is not a must anymore.

The second is that it doesn't address how we can have a pre-populated user database, say workflows. Again, importing at startup may be ok, but it impacts startup time.

Correct me if I'm wrong, but at some point you need to pull some data down or set something up. If you store a pre-calculated tarball somewhere that does magic things, you can also store a pre-compiled Docker image somewhere.

Which is why I was wondering whether there was a good solution for handling these kinds of pre-populated databases that would generalise well to any database-connected app, in addition to Galaxy. Also, this relates to the upgrade problem above.

pre-populated databases that would generalise well to any database connected app, ?

jmchilton commented 6 years ago

Correct me if I'm wrong, but at some point you need to pull some data down or set something up. If you store a pre-calculated tarball somewhere that does magic things, you can also store a pre-compiled Docker image somewhere.

Distributing bundles (shed_tool_conf entries, tool directories, sqlite for the tool shed install database, and potentially conda dependencies) as tarballs would allow reuse of these flavors outside of Docker / containerized Galaxies. It would also potentially allow combining several flavors together, which I don't think can be done with Docker.

Maybe think of it as a "compiled" version of the ephemeris tool lists?

I like this idea a lot, but I'm not sure if resources would be better spent on this or on improving ephemeris or setting up CVMFS for all tool dependencies.

nuwang commented 6 years ago

If we are talking about upgrading postgresql databases here, this is only relevant for Galaxy instances that persist over many years. And in this case I assume that an admin knows how to upgrade a postgresql database from one version to the other. I don't think this is a problem we need to solve, especially not now.

Wouldn't this mean that we would need to keep the postgres database container version frozen in the helm chart, because the moment we upgrade to a newer postgres container, the data will no longer be usable? We should aim for a minimal admin system. "helm upgrade galaxy" and you are done, dependencies and all - no need to know anything about systems admin, CloudMan can even run "helm upgrade" on behalf of the user.

I think it would be good to plan a pathway to make upgrades possible, even if we don't implement upgrades in the first iteration, or we risk locking people into specific versions.

No sqlite. Everything is in postgresql. And these are just entries in a DB, some rows, maybe a few thousand. This is imho negligible in size.

I'm probably missing something here, is the postgres database bundled with the Galaxy container? If it's separate, do you have a separate build of postgres for each flavour?

The container size is only large if you include "tool-dependencies". This is not a must anymore.

Ok, in that case, I guess we can start off with this minimal approach, where dependencies are pulled in at runtime, and then transition to having a globally shared CVMFS with dependencies. Sound ok?

pre-populated databases that would generalise well to any database connected app, ?

I was thinking of any app with a database, say lovd: http://www.lovd.nl/3.0/home If we preload some data into a database, we would either need to build a separate database container for each app, or we would need to be able to connect the data to the container at runtime. For the latter, a tarball of the database is one approach, which is extracted at runtime onto a volume and mapped to a container folder. Loading data at container startup is another. Are there any other general patterns/ideas for handling this?

bgruening commented 6 years ago

Correct me if I'm wrong, but at some point you need to pull some data down or set something up. If you store a pre-calculated tarball somewhere that does magic things, you can also store a pre-compiled Docker image somewhere.

Distributing bundles (shed_tool_conf entries, tool directories, sqlite for the tool shed install database, and potentially conda dependencies) as tarballs would allow reuse of these flavors outside of Docker / containerized Galaxies.

Sure. But this does not exist yet, and it solves a problem which we don't have at the moment. @nuwang is this the problem you were referring to? I understood that it is a pure disk-space concern?

It would also potentially allow combining several flavors together, which I don't think can be done with Docker.

This is partially working with Docker. You can base one bundle on top of the other, but you cannot freely mix them. Freely mixing them means you allow replication. I doubt the tarball approach will be that easy to implement. You will end up with multiple conda envs, and Galaxy needs to decide at some point which to take.

Maybe think of it as a "compiled" version of the ephemeris tool lists?

I like this idea a lot, but I'm not sure if resources would be better spent on this or on improving ephemeris or setting up CVMFS for all tool dependencies.

I think the CVMFS approach could work today.

If we are talking about upgrading postgresql databases here, this is only relevant for Galaxy instances that persist over many years. And in this case I assume that an admin knows how to upgrade a postgresql database from one version to the other. I don't think this is a problem we need to solve, especially not now.

Wouldn't this mean that we would need to keep the postgres database container version frozen in the helm chart, because the moment we upgrade to a newer postgres container, the data will no longer be usable?

Yes. But it is not as bad as it sounds. You usually keep your postgresql database stable for many years. You need to migrate data, no matter what.

We should aim for a minimal admin system. "helm upgrade galaxy" and you are done, dependencies and all - no need to know anything about systems admin, CloudMan can even run "helm upgrade" on behalf of the user.

And this will work. It will upgrade Galaxy. It will not upgrade postgresql from 9.3 to 10.0. You want at least to do a backup before this. @nuwang how many database upgrades have you done with GVL until now?

I think it would be good to plan a pathway to make upgrades possible, even if we don't implement upgrades in the first iteration, or we risk locking people into specific versions.

Here is the plan to upgrade an image: https://github.com/bgruening/docker-galaxy-stable#upgrading-images--toc, it's all documented. You can make this automatic, but I'm not sure you want to. It's important data, and you do this once every 3 years or so.

No sqlite. Everything is in postgresql. And these are just entries in a DB, some rows, maybe a few thousand. This is imho negligible in size.

I'm probably missing something here, is the postgres database bundled with the Galaxy container? If it's separate, do you have a separate build of postgres for each flavour?

It's separate: https://github.com/bgruening/docker-galaxy-stable/tree/master/compose/galaxy-proftpd You can have a separate build for each flavor, yes. You can also tar the shared volume if you like this more.

The container size is only large if you include "tool-dependencies". This is not a must anymore.

Ok, in that case, I guess we can start off with this minimal approach, where dependencies are pulled in at runtime, and then transition to having a globally shared CVMFS with dependencies. Sound ok?

Jupp!

pre-populated databases that would generalise well to any database connected app, ?

I was thinking of any app with a database, say lovd: http://www.lovd.nl/3.0/home If we preload some data into a database, we would either need to build a separate database container for each app, or we would need to be able to connect the data to the container at runtime. For the latter, a tarball of the database is one approach, which is extracted at runtime onto a volume and mapped to a container folder. Loading data at container startup is another. Are there any other general patterns/ideas for handling this?

I can only think of these two: ship a tarball of the pg database, or do an SQL import at the beginning.
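A rough sketch of the tarball variant on Kubernetes (the image, URL, PVC and paths are all placeholders): an initContainer seeds the database volume before postgres starts.

```yaml
# Hypothetical initContainer that unpacks a pre-populated database dump
# onto the postgres data volume if it is still empty. Every name here
# (image, URL, PVC, paths) is a placeholder.
initContainers:
  - name: seed-database
    image: busybox:1.28
    command:
      - sh
      - -c
      - |
        if [ -z "$(ls -A /var/lib/postgresql/data 2>/dev/null)" ]; then
          wget -O /tmp/db.tar.gz http://example.org/galaxy-db.tar.gz
          tar -xzf /tmp/db.tar.gz -C /var/lib/postgresql/data
        fi
    volumeMounts:
      - name: pgdata
        mountPath: /var/lib/postgresql/data
```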

afgane commented 6 years ago

You guys have really taken this into the depths, so I'll just try to summarize for anyone coming in late.

On the topic of bundles and flavors wrt the dependencies, I feel the CVMFS path will represent a solution for any internet-connected Galaxy so I think that's the most desirable solution we should put effort toward. There won't be any initial download penalty in that case either. We'll still have a question of what tools are added/visible in the tool panel to accommodate different use cases so that'll probably require some changes in Galaxy. Until that gets sorted out though, it sounds like runtime dependency resolution should work to pull in the binaries from Conda so all we need to install into the images now should be the tool definitions, which will help in reducing the image size. As long as we have a small base image to build from, it sounds like creating derived images to represent the flavors has been successful from the build and usage standpoints.

The database migration/update remains an open issue. Even though it doesn't happen very often in the lifespan of an instance, there are people that started their instances 5+ years ago and would like to keep them running. Given we're working on making this a solution that works for non-technical users also, it's important that we can offer a solution in the future that allows for seamless upgrades rather than definitively locking them in. I would think that largely means we run dedicated database containers so that an out-of-band process can be created for the update while leaving the rest of the environment intact/independent.

Re. configuration, it seems ENV vars are the right path forward, so whatever is not configurable via ENV vars should be changed in the upstream code. We can then run a K/V store + confd to update state. For the initial iteration, the invocation command/helm spec can be used to declare the values.

How does this look visually? Each box represents a container, grouped by stages based on the discussion here. [diagram attached]

nuwang commented 6 years ago

I think this summarises things really well. We probably have enough content to start merging the helm chart with the common docker container. I think there are some possible solutions to the database upgrade issue too.

But this does not exist yet, and it solves a problem which we don't have at the moment. @nuwang is this the problem you were referring to? I understood that it is a pure disk-space concern?

The concern was more about how the initial data population was done. I tried running the compose setup, and I have a better understanding of how things work, but it still isn't 100% clear how/why things are working the way they are.

It looks like the /export directory on the host is being populated with the contents of the docker image, which means that we can propagate default data from the container to the host. However, I don't really understand why this works, because the docs seem to suggest that this should only happen for volume mounts, not bind mounts: https://docs.docker.com/engine/admin/volumes/bind-mounts/ and you are using a bind mount, correct? In fact, I even tried modifying a single file in /export on the host and deleting everything else. Docker restored all files from the container except the modified one. This is excellent, but I must be missing something, since the docs state that only empty volumes will be populated this way.

Also, it still doesn't quite answer how pre-populated data in the database will be handled. I was under the impression that any tools installed from the tool shed will be stored in the tool database - i.e. Postgres. If the database is empty, the tools will not function, is that correct?

If so, since the database and galaxy are being built separately, who is installing tool data into the database and how will we ensure that, when we recreate the compose setup elsewhere, the pre-installed data is restored without a build from scratch? The same question arises if we want pre-populated data like a workflow.

I also noticed that you have only mentioned a procedure for building the containers and then running them. Can't we just run from pre-built containers, without having to build them? If so, can we provide a pre-populated database with workflows, histories etc.?

Here is the plan to upgrade an image: https://github.com/bgruening/docker-galaxy-stable#upgrading-images--toc,

This is super useful. I think that helm should be able to transparently handle the upgrade process, including taking a backup of the database, upgrading it, and rolling everything back if things go south: https://docs.helm.sh/developing_charts/#hooks
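As a sketch of that idea (all names, the image tag and the credentials are placeholders, not part of any existing chart), a pre-upgrade hook could dump the database before Helm touches the release:

```yaml
# Hypothetical Helm pre-upgrade hook: a Job that dumps the database to a
# backup volume before the release is upgraded. Use a Secret for the
# password in practice; everything here is illustrative.
apiVersion: batch/v1
kind: Job
metadata:
  name: galaxy-db-backup
  annotations:
    "helm.sh/hook": pre-upgrade
    "helm.sh/hook-weight": "0"
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: pg-dump
          image: postgres:9.6
          env:
            - name: PGPASSWORD
              value: "change-me"
          command:
            - sh
            - -c
            - pg_dump -h galaxy-postgres -U galaxy galaxy > /backup/galaxy-$(date +%s).sql
          volumeMounts:
            - name: backup
              mountPath: /backup
      volumes:
        - name: backup
          persistentVolumeClaim:
            claimName: galaxy-backup-pvc
```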

abhi2cool commented 6 years ago

Here is an attempt to replicate the Galaxy docker-swarm/compose implementation on a Kubernetes (k8s) cluster: https://github.com/galaxyproject/galaxy-kubernetes/tree/v02-1/v02

pcm32 commented 6 years ago

I'm pleased to mention that I have a first working version of the helm charts with compose containers. Still some bits to go, but mostly there. I'm managing to avoid the config file completely, but I think that, at least for my use case, avoiding injecting the job_conf file will be very difficult, since dynamic destinations for resource usage limits are needed.

rc-ms commented 6 years ago

Hello there. Jumping in as part of work to get galaxy-kubernetes working on Azure. We ("we" being @abhi2cool ) are having an issue scaling HTCondor jobs, getting an error 'Job has not been considered by the matchmaker'. Which is an interesting message, to be sure. Do you have any suggestions about how we might interrogate/debug such an issue? ( @bgruening you were suggested as being wise in the ways of these things).

Thank you!

pcm32 commented 6 years ago

@rc-ms would it serve your final purpose to simply use the job dispatching/scheduling of Kubernetes instead of using condor on top of Kubernetes?

pcm32 commented 6 years ago

After a few more days of work on this, I have examples of usage for general Galaxy deployments and for our PhenoMeNal deployment with compose-based images available here. The non-PhenoMeNal one is failing due to this issue; the PhenoMeNal one works fine (because I inject our job_conf as part of making my derived init container). I would say, though, that until this PR on the Galaxy side is done, I wouldn't use this for heavy analysis loads, as clusters can get choked.

rc-ms commented 6 years ago

Thank you @pcm32, I think it would help, since HTCondor is what is blocking us right now. Oh, and filesystems :). Let me check with the team and get back.

bgruening commented 6 years ago

@rc-ms can you give me more information on how you run the containers and how you submit jobs? Our travis testing can currently run Condor jobs. Any way I can reproduce this would be fantastic!

pcm32 commented 6 years ago

@rc-ms:

1. Do you have your own flavour of the galaxy-init container with the tool wrappers that you want there?
2. Would all the tools that you need to use be mappable to containers automatically by Galaxy?
3. Are you provisioning a POSIX shared file system that is accessible to your Kubernetes cluster within Azure?

If we need to discuss specific aspects of your deployment, it's probably best to email me directly (find my email here).

pcm32 commented 6 years ago

@rc-ms maybe open a new issue for your use case of Galaxy with k8s; this issue is being overused for too many parallel discussions.

rc-ms commented 6 years ago

Will do, @pcm32. @abhi2cool will you start the thread? @bgruening, Abhik will share his configuration and issues there.

rc-ms commented 6 years ago

Hello @pcm32 and @bgruening, I created new issue #4 to discuss our configuration / operational issues. @abhi2cool will upload logs and configuration info forthwith. Thanks!

pcm32 commented 6 years ago

I think that this has been working for a while as well, so I will close this. Let me know if you lack documentation (and where) to make it work locally.