AtlasOfLivingAustralia / ala-install

Ansible playbooks for installing the ALA components
https://www.ala.org.au
Apache License 2.0

Module containerization and kubernetes deployment #306

Open andrevtg opened 4 years ago

andrevtg commented 4 years ago

docs on this effort at https://vertigoala.github.io/docs/

We have gone a different path in Brazil regarding ALA setup. Inspired by Sweden's containerized approach, we have created Docker images for the modules we use and Helm charts to deploy them to Kubernetes. We have found that this leads to a more predictable result than the VM-based Ansible setup. It could greatly reduce the problem surface and make the maintainers' lives a lot easier.

Our current work is very specific to our installation, but we would like to gradually make it generic and merge with ALA upstream. Having an official docker/helm chart release could make any ALA instance behave the same regarding distribution, upgrading, load balancing, failover, monitoring and observability.

In order for such a move to have zero impact on current setups and preserve current behaviour untouched, we'd like to propose:

Our current team is able to contribute to this work (not sure for how long), but the benefits we have experienced here are worth the shot.

If this proposal gets accepted, we'd like to know how to proceed.

Cheers.

ansell commented 4 years ago

@vjrj is the Technical Coordinator for the Living Atlases, and I would like to get his feedback on this.

I personally like this idea and the ALA, including @nickdos , have brought it up internally in the past with a view to pushing it through gradually for our applications as soon as practical.

The ALA currently use Zabbix and ELK for monitoring system statistics and log files, respectively. However, Prometheus is also very popular and maintained so I have no objections to adding scripts to set that up as part of this project.

vjrj commented 4 years ago

Thanks indeed @andrevtg for this proposal and @ansell for /cc me.

We talked about this here last week: https://atlaslivingaustralia.slack.com/archives/CCTFGEU1G/p1562095522086500 (sorry for the length of my response).

I don't have an "inventory" of current nodes using Docker right now (beyond Sweden and Brazil), but I hope that we can converge at some point, and that this and other similar efforts benefit all of us (including the ALA).

As I mentioned in the thread, I'm thinking of an additional, fully complementary step that can help this Docker install method, but also others like Ansible.

I think that ala-install currently does some tasks that are the responsibility of a packaging system like Debian's (fetch a war, remove the previous one, copy it into place, configure it, chown/chmod it, etc.). And I can imagine that this "install" logic is duplicated somehow in the different Dockerfiles of each country portal (copy this war, blah, blah, blah).

And you know what happens when you have duplicated code (in this case, install code): you fix a bug in one place (for instance a permissions issue or a new dependency in ala-install), and the bug is still present in the duplicated code (in this case, the different Dockerfiles). We are in this situation right now, and it would be nice to avoid it.

So this ecosystem is quite difficult to keep healthy, IMHO.

Just to give an example of this, look at this install procedure for CAS2, based on a mixture of ala-install and bioatlas.se code, documented by .ca: https://github.com/AtlasOfLivingAustralia/documentation/wiki/ALA-AUTH-with-CAS-2 Laugh Now (installing CAS2), Cry Later (maintaining this).

So the additional step I want to propose is to move this "install" logic into a Debian packaging system or similar.

In other words: we install tomcat7, mysql, mongo, postgresql, etc. via deb packages, and all of us know the benefits of that. My proposal is to do something similar with the ALA software, so that Docker and Ansible use the same packages and "install code", and we avoid this duplicated install code.

I did this in the past (packaging a war in a non-official Debian package and generating Docker images using that package) and I couldn't be more satisfied with the result.

Inconveniences:

I was talking about this informally with @djtfmartin over the past weeks, and I proposed making an ALA deb package as a sample and discussing the benefits and inconveniences of this approach with the rest of the ALA developers. This is totally complementary, IMHO, to this Docker proposal.

cc @timrobertson100 @shahmanash to follow the thread discussion.

andrevtg commented 4 years ago

This is a great discussion.

A Docker image can itself be considered a reusable package. It may actually be a lot easier to create an official ALA module image than a deb package, and the result is a full runtime environment, meaning more predictable results wherever it is used. The art is to produce configurable images relying on env vars, secrets, configs and smarter start scripts.

My point is that maybe we should accept the risk of having two install paths (the current Ansible playbooks and a dockerized one) so the docker/helm work can progress faster as a proof of concept and quickly explore some possibilities. We could pick a simple module, publish a proposal in a public repo, and discuss something more tangible. What do you people think?
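The "configurable images relying on env vars" idea can be sketched as a minimal entrypoint script. This is only an illustration: the variable names, paths and config keys are hypothetical, not taken from any ALA module.

```shell
#!/bin/sh
# Hypothetical sketch: render a config file from environment variables at
# container start, so one image works across environments/domains/countries.
set -eu

# Defaults, overridable per deployment via `docker run -e ...`
: "${ALA_BASE_URL:=https://example.org}"
: "${ALA_DB_HOST:=db}"

# In a real container this would be a fixed path like /data/config;
# a variable is used here so the sketch runs anywhere.
CONFIG_DIR="${ALA_CONFIG_DIR:-/tmp/ala-demo}"
mkdir -p "$CONFIG_DIR"

cat > "$CONFIG_DIR/app-config.properties" <<EOF
app.baseUrl=$ALA_BASE_URL
app.db.host=$ALA_DB_HOST
EOF

# Hand over PID 1 to the real application, if a command was given.
if [ "$#" -gt 0 ]; then
    exec "$@"
fi
```

The same pattern (env vars with defaults, config rendered at start-up, then `exec` the app) is what the official mysql and nginx images use in their entrypoints.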

Motivations:

vjrj commented 4 years ago

Yes, it sounds to me like a typical old emacs/vim or gnome/kde discussion... :-)

But let me pick an example of an official Docker image: https://github.com/docker-library/mysql/blob/master/8.0/Dockerfile just to show that it is based on Debian repositories and deb packages. The Dockerfile is quite minimal and quite maintainable, containing just deb/apt info and Docker-related info [1].

So I install the mysql (or mariadb) packages via Docker when I need them, or the plain deb in LXC containers, or in KVM, or Vagrant, or directly on bare metal, or on my desktop [2], etc.
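The deb-based pattern those official images use could look roughly like this for an ALA module. This is a sketch only: the apt repository and package name are hypothetical, since no ALA deb packages exist yet.

```dockerfile
FROM debian:buster-slim
# Hypothetical: install a (not yet existing) ALA module from a deb package,
# so all install logic lives in the package, not in the Dockerfile.
RUN apt-get update \
    && apt-get install -y --no-install-recommends some-ala-module \
    && rm -rf /var/lib/apt/lists/*
EXPOSE 8080
CMD ["some-ala-module"]
```

With this shape, upgrading the image mostly means rebuilding against a newer package version; the Dockerfile itself rarely changes.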

I'm not an experienced Docker developer, but permit me a joke:

Dockerfiles \ && development \ && was \ && a \ && pain \ && for \ && me

:-) so minimizing this code sounds logical to me.

I forgot to send earlier a simple link about what duplicating [install] code involves: https://en.wikipedia.org/wiki/Duplicate_code. We are already suffering this in our LA community, and it scares me.

I don't suggest debianizing ALA now; it can be done progressively if we think it is an interesting solution. Other install methods (ansible, docker.se, docker.br) can converge to use the packages, when available, to avoid bugs and duplicated code.

[1] nginx seems to do something similar: https://github.com/nginxinc/docker-nginx/blob/master/stable/buster/Dockerfile
[2] Some years ago I used https://blog.jessfraz.com/post/docker-containers-on-the-desktop/ but not any more.

andrevtg commented 4 years ago

Lol, I hate that "&& \" thing too; maybe they'll come up with checkpoints or something like that to sanitize layer definitions.

I do like the notion of deb packages for ALA modules a lot, but even if they were already available there would still be great benefit in creating Docker images and Helm charts. We should also note that there will always be a different upgrade path between the two approaches: one does not upgrade nginx within a container; you destroy it and create a new container from a new image instead (mounting the same volumes, configs and secrets). Any upgrade logic bundled in a deb package is never triggered, so yes, a different install path is inherent to this deployment model. GitLab's official images are a fine example.
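The "destroy and recreate" upgrade path can be illustrated with a compose fragment (image name, tag and volume name are hypothetical): to upgrade, you bump the tag and re-run `docker-compose up -d`; the container is replaced, while the named volume keeps the data.

```yaml
version: "3"
services:
  image-service:
    image: example/image-service:1.2.0   # upgrade = bump this tag, recreate
    volumes:
      - image-data:/data/images          # data survives container replacement
volumes:
  image-data:
```

Nothing inside the old container is mutated; only what lives in the volume persists across versions.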

I don't have the resources/skills to help with deb packaging, but I do have a window of opportunity to help with a proposal for reusable/generic Docker images and Helm charts. I'll keep these things in separate repos, then, but I may keep using this thread for further discussion if you are all ok with that.

vjrj commented 4 years ago

To upgrade a Docker image generated from a deb package, you only say, "now use that version of the package". The deb package, plus the packaged software itself, has the responsibility for data migration, etc. And yes, you destroy and create/run your new image.

So Docker development and maintenance is reduced to maintaining the base image (Ubuntu or whatever), the repo where the software package resides (nginx inc or whatever), and the Debian package version to use (whatever version). Docker development is minimized, and with it the risks and bugs.

From my limited knowledge of the build process of the Swedish images, they use makefiles (which I imagine were generated based on the ala-install tasks).

The main file of a Debian package, from my point of view, is the rules file, which is also a kind of makefile where you say "copy this here, put some perms there", etc. You have your init scripts and your post-install scripts (reload the nginx service, for instance).
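A minimal debhelper-style rules file along those lines might look like this. Package name, war file and install paths are made up for illustration; this is a sketch, not a real ALA package.

```makefile
#!/usr/bin/make -f
# debian/rules -- hypothetical minimal debhelper rules file
%:
	dh $@

override_dh_auto_install:
	# "copy this here, put some perms there"
	install -D -m 644 target/some-module.war \
	    debian/some-ala-module/opt/ala/some-module/some-module.war
```

A `debian/postinst` script would then handle the "reload the service" step at install time, and both Ansible and Dockerfiles could consume the resulting package instead of repeating these steps.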

My "vaporware" proposal is: let's create some deb packages in the future, little by little, so the different Docker images and other install methods can benefit from them. In this manner, we have a single point of install logic that relies on a robust packaging system.

I'm trying to think of the different countries and their different and common needs, resources, knowledge, infrastructure, history, etc.

By the way, this packaging system, which condenses the knowledge of many free software developers since the 90s, is quite difficult to imitate via Dockerfiles or Ansible tasks, IMHO.

We could explore other, more modern packaging systems (snap? [1]), but I don't have experience with or an opinion on them, and they have received some criticism.

[1] https://en.wikipedia.org/wiki/Snappy_(package_manager)

Update: cc @shahmanash

andrevtg commented 4 years ago

You are totally right on the maturity of deb packaging vs Dockerfiles, I couldn't agree more: Dockerfiles based on deb packages would be best-of-breed for sure. The problem is that the package installation typically occurs at Dockerfile build time - volumes and therefore data migration are not available or even possible at this moment. There is no database, no Solr, nothing is online - you are alone in a CI/CD pipeline.

A Docker image is already-installed, ready-to-run software; the deb package is long gone by the time real data is mounted in volumes at container creation. Even with deb packaging at build time, a different approach may eventually be required. Maybe @shahmanash could shed some light here.

If, on the other hand, we consider the typical changes we had to make to the apps themselves when creating our home-made images (patching JS files, console logging, defining entrypoints, defining vars/arguments to make an image reusable across environments/domains/countries, etc.), it becomes somewhat clearer that the deb package can only cover a small subset of the problem surface. Regardless of how a module is installed in the Dockerfile, making images reusable with external configs is an entirely different effort and craft.

But, please, I do not want to push this to the point of inconvenience. I'll set a separate repo with a working example and keep the discussion there.

Thanks!

vjrj commented 4 years ago

Discussing this is no inconvenience at all for me.

Let me explain better about data migrations etc., from my experience using Docker, debs, etc.

The responsibility for data migrations is minimal in a deb package (it only takes care of installing the software, including, for instance, database migration scripts). Many times the packaged software itself takes care of data migrations at run time (via things like Liquibase, Flyway, internal procedures or whatever).
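That build-time/run-time split can be sketched as an entrypoint guard: the image ships a schema version, the data volume records the current one, and migration only runs at container start, when the volume actually exists. Paths, versions and the migration step are all hypothetical here.

```shell
#!/bin/sh
# Hypothetical sketch: run migrations at container start, not at image
# build time, because the data volume only exists at run time.
set -eu

# In a real container this would be the mounted volume, e.g. /data;
# a temp path is used so the sketch runs anywhere.
DATA_DIR="${DATA_DIR:-/tmp/ala-migration-demo}"
SHIPPED_VERSION=3   # schema version this image's software expects

mkdir -p "$DATA_DIR"
CURRENT_VERSION=$(cat "$DATA_DIR/schema_version" 2>/dev/null || echo 0)

if [ "$CURRENT_VERSION" -lt "$SHIPPED_VERSION" ]; then
    echo "migrating schema $CURRENT_VERSION -> $SHIPPED_VERSION"
    # ...real migration commands (Liquibase/Flyway/etc.) would go here...
    echo "$SHIPPED_VERSION" > "$DATA_DIR/schema_version"
fi
```

Tools like Liquibase and Flyway effectively implement this same check-and-apply loop against the database itself.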

A Docker image is not only a Dockerfile that builds the image; it also consists of start-up scripts, process supervisors, etc.

I see that our current ala-install has some migration and db-initialization logic that you have to duplicate in your images somehow. See the references to ALA db schemas in the .se repo: https://github.com/AtlasOfLivingAustralia/documentation/wiki/ALA-AUTH-with-CAS-2 In other cases, the ALA software itself takes care of db initialization and migration.

Deb packages are not a static thing; they can be preseeded for automatic builds, customizations without manual intervention, etc.: https://wiki.debian.org/DebianInstaller/Preseed

About using a separate repository: in general I try to send small PRs that are easy to review, accept and maintain. When my proposal is something bigger, difficult to accept, experimental, or just a sample or proof of concept, I tend to use an additional repository, gists/snippets, or similar, so the upstream project and developers can review and test it. Eventually, if the proposal is accepted, popular or whatever, I can transfer ownership of that repository to upstream, or they can copy/paste the gist snippet, etc.

Among other things, I tend to think in terms of maintenance of my proposal by upstream. The same goes for accepting PRs from others (can we maintain this in the future?).

And yes, it will be nice to read other opinions and points of view. Thanks!

andrevtg commented 4 years ago

Ok, I have set up a few repos that anyone can clone, build and test, either locally or in play-with-docker. No Kubernetes Helm charts yet, just docker-compose. Assuming you have Docker Desktop, these are the things you can do, repo by repo:

IN CASE YOU WANT TO BUILD IT

A good order to build/test repos:

  1. commonui-sample (custom commonui)
  2. image-service (reusable image service image)
  3. image-sample (example of image-service reuse in an imaginary custom ALA deployment)

IN CASE YOU JUST WANT TO RUN IT

docker stack deploy -c docker-compose-swarm.yml image-service

vjrj commented 4 years ago

Thanks @andrevtg for this, I'll play with it ASAP.

vjrj commented 4 years ago

A quick look, and I really like that image-service differs from upstream by only a few commits, and then you have image-sample, which uses it. A Debian package is something similar (it's just a new directory).

andrevtg commented 4 years ago

Docs on https://vertigoala.github.io/docs/

andrevtg commented 4 years ago

> A quick look, and I really like that image-service differs from upstream by only a few commits, and then you have image-sample, which uses it. A Debian package is something similar (it's just a new directory).

Yes, I have just added some files (Dockerfile, docker-compose.yml) to support a dockerized workflow and to apply a consistent default behaviour for the proposed image. Original files remain untouched.

vjrj commented 4 years ago

I played a little bit more with the different Docker images, and in general I like your proposal a lot, @andrevtg: how it is generalized and can be used by other nodes, the changes to each ALA repository, etc.

I only have minor and general comments about how to minimize code duplication and facilitate maintenance.

When I talk about duplicated code, I mean, for instance:

1. properties/yaml files
2. db schemas, like here
3. general install/deploy procedures (like in Dockerfiles)

Which strategies can we use to avoid this?

Good work, thank you!