Explore options for deploying CKAN on Kubernetes

davidread commented 7 years ago

Current Status (last updated June 2020):

People including core maintainers such as @datopian have had CKAN running on Kubernetes for several years
A public set of Helm charts is here: https://github.com/datopian/ckan-cloud-helm
Worth discussing if we create an "official" CKAN Kubernetes setup

Original

Let's start using Kubernetes to deploy CKAN.

Here's the dream: someone creates a Helm chart (package) for CKAN. Now a single command deploys containers to a cluster, including CKAN, CKAN DataStore, SOLR and Postgres, all configured to talk to each other.

This is how I see it:

'Orchestration' for containers abstracts away lots of ops tasks, automating provisioning machines, deploying containers, scaling etc.
Kubernetes has emerged as the leader in container orchestration in the past 12 months. It's been embraced by the open source community and is compatible with all the main cloud providers.
Running CKAN in a Docker container has been pushed forward by @florianm recently and has now gained review and momentum across the community, on its way to becoming an official way to install CKAN.

I'd encourage people to share what they are doing in this area and hopefully we can work together on the successes.

mbocevski commented 7 years ago

We have been running hundreds of CKAN instances in Kubernetes for over a year and a half now. Our setup includes deploying all components in Kubernetes including an HA Postgres setup which is shared across all instances. Our setup runs Deis in order to streamline deployment and management of applications using a simple git workflow.

The full setup that we run includes: HA (master-slave) postgres, minio (as object store for CKAN filestore), traefik (for load balancing and automatic Let's Encrypt certificate management), Zookeeper multi-master cluster, Solr Cloud cluster, Deis. We run our own docker images of CKAN and CKAN Datapusher in a stateless way so we can scale.

All of the components are deployed to kubernetes and we run a CEPH cluster for providing persistent volumes in kubernetes. We use ansible to automate the deployment of this entire setup end-to-end, and we have a custom tool built for easily deploying CKAN portals: create database, enable/disable postgis, create datastore, create solr collections, and use deis to deploy the CKAN portal, with extensions easily enabled or disabled.

Of course the above setup is for large scale deployments of CKAN, where one would deploy lots of CKAN portals side-by-side.

It would be straight-forward to build a helm chart as you describe and we can have someone on our end create that. Probably without high-availability and clustering of components. The only thing to mind is the persistent components which can be by default set to emptyDir and configurable with the helm chart. If this is a good approach we can get this out quite quickly. Comments?
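
As a rough illustration of the "emptyDir by default, configurable otherwise" idea above, here is a minimal Python sketch that builds the volume entry for a pod spec. It is not taken from any chart in this thread; the function name, component names and claim name are all hypothetical.

```python
def data_volume(name, claim_name=None):
    """Return a Kubernetes pod volume entry for a persistent component.

    With no claim name the volume is an emptyDir (data is lost when the
    pod is removed); with a claim name it refers to an existing
    PersistentVolumeClaim.
    """
    if claim_name is None:
        return {"name": name, "emptyDir": {}}
    return {"name": name, "persistentVolumeClaim": {"claimName": claim_name}}


# Hypothetical component names; a real chart would read these from its values.
volumes = [
    data_volume("postgres-data"),                     # default: emptyDir
    data_volume("solr-data", claim_name="solr-pvc"),  # opt in to persistence
]
```

In a Helm chart the same choice would normally be a conditional in the template driven by a value, so the default install needs no storage setup at all.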

Vanuan commented 7 years ago

What's wrong with compose files?

mbocevski commented 7 years ago

@Vanuan there is nothing wrong with compose files, they are great for deploying to docker Swarm and as Docker announced yesterday you will also be able to use compose files with their enterprise offering Docker EE, which will support kubernetes.

This is about giving those who run, or want to run, kubernetes a way to deploy CKAN easily on a kubernetes cluster, because you can't use compose files to deploy to kubernetes.

davidread commented 7 years ago

It would be good for this to appeal to the broadest set of people, hence Kubernetes over other orchestrations.

@mbocevski thanks for bringing your experience in on this and offering to do a chart.

I've played about a bit with Helm charts and see that as useful for configuring the containers to talk to each other, but it would be good to get other views - maybe Helm is a bit niche and there are other better ways? Or would you and others suggest that a chart is going to be useful for the average person deploying to Kubernetes?

mbocevski commented 7 years ago

Helm was developed by the Deis team and it's a pretty decent implementation of a package manager for Kubernetes, in fact the kubernetes project accepted it as the default package manager and made it part of Kubernetes. For the average person deploying CKAN to Kubernetes, I would recommend helm due to the fact that it's simple to use and we can have specific documentation about CKAN on Kubernetes, because it would make sense that we use k8s things like configMap for the CKAN configuration and similar.

So in summary, a configurable Helm chart with additions to the CKAN docs about running CKAN on kubernetes will give the simplest and smoothest experience.

davidread commented 7 years ago

That's great. What's not clear to me is how someone can use a chart to install & configure CKAN extensions, but we can see.

@mbocevski I'd love to take you up on your offer - if you're willing to contribute a chart for us to play with, that'd be great to get us all going, for everyone's benefit.

mbocevski commented 7 years ago

Installing and configuring CKAN extensions will not be straightforward, because the best would be that they would be in the image. Our case is that we always create a Dockerfile which starts FROM a base CKAN image and then just install extensions there. I can give an example and also try to think whether we can do something better.
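
The "Dockerfile which starts FROM a base CKAN image" approach could look roughly like the following Python sketch, which only generates the Dockerfile text. The image name and extension name are hypothetical, and it assumes that `pip` inside the base image installs into the environment CKAN runs from, which depends on how that image is built.

```python
def extension_dockerfile(base_image, extensions):
    """Build Dockerfile text that layers CKAN extensions on a base image.

    `extensions` is a list of pip requirement strings, one per extension.
    """
    lines = [f"FROM {base_image}"]
    for requirement in extensions:
        # One RUN per extension keeps each install in its own image layer.
        lines.append(f"RUN pip install {requirement}")
    return "\n".join(lines) + "\n"


# Hypothetical image and extension names, for illustration only.
print(extension_dockerfile("example/ckan-base:latest", ["ckanext-example"]))
```

The resulting image still needs the extension enabled in the CKAN configuration; baking it in only makes it available.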

I'll create a chart and contribute it, also write up a way to easily get it up and going with minikube.

davidread commented 7 years ago

Awesome!!

Vanuan commented 7 years ago

I think the title is misleading. If it's specific to Helm, it should be named "Build Helm CKAN package". If it's generally about deploying CKAN and using Kubernetes orchestrator then there are at least three ways to do that.

you can't use compose files to deploy to kubernetes

Can't you? What about this: https://github.com/kubernetes/kompose ?

you will also be able to use compose files with their enterprise offering Docker EE, which will support kubernetes.

AFAIK, that's not specific to enterprise. The goal here is to have the same developer experience while benefiting from k8s production robustness. So both Docker CE and EE will have k8s OOtB.

Vanuan commented 7 years ago

So I think the "Kubernetes" term needs clarification here. What is kubernetes for you (as a developer or an ops guy)?

a developer/op facing deploy format (Helm, ConfigMap)
a set of tools (kubectl, minikube)
an orchestrator (the way your containers/images are deployed to nodes, better resource utilization, technical superiority)

If it's the last, then using a tool to convert between docker/rkt/mesos/k8s/etc deployment formats would be viable. If it's the first, we as developers will be forced to continue maintaining multiple configuration files and use cases.

Vanuan commented 7 years ago

It just struck me that maybe for you Kubernetes is a Web UI. Well, in that case recommending Docker CE makes no sense. And we'll be doomed to have a discrepancy between development and production.

mbocevski commented 7 years ago

@Vanuan I don't appreciate your behaviour, I think it's completely inappropriate. I would encourage you to be respectful and encourage you to go through and read the CKAN Code of Conduct.

What about this: https://github.com/kubernetes/kompose ?

Kompose translates docker-compose.yml files into kubernetes resources, thus locking in to only the types that are supported in docker compose. Personally I think tools like this are helpful but not appropriate for maintenance.

So both Docker CE and EE will have k8s OOtB.

Docker CE for Mac and Windows will have k8s OOtB, i.e. basically it will spawn a vanilla kubernetes that is tightly integrated with docker. On Linux, though, users would have to instantiate kubernetes by themselves and likely add some additional config to docker to reach the same OOtB experience as on Mac and Windows.

we as developers will be forced to continue maintaining multiple configuration files and use cases.

Not if implemented properly, thus package managers come in handy. I don't see a big overhead to maintenance if the helm package is built correctly, integrated with the development flow and has helpers to extract all the relevant configuration in an automated way.

A lot of open source software is packaged in different flavours in order to support all the different distros and platforms. For those that want to deploy CKAN to Kubernetes, I think that a helm chart will give the smoothest experience from any perspective. If you think that there is a better alternative, please suggest and contribute so that the community can look into and collaboratively decide on which approach is best.

amercader commented 7 years ago

Thanks all for your inputs and thanks of course @mbocevski for your offer.

@Vanuan Just to be clear, and please correct me @davidread if I didn't catch your original intention, this is about exploring a generic k8s based option that can work as a basis for the community to deploy their CKAN instances. We are not at this stage looking at making it an official way of deploying CKAN; this will only happen if we can ensure it works for most people and can be supported. But different people deploy CKAN in many different ways and we want to help people share their experiences and discuss the best approaches, especially with an emerging player like k8s.

Vanuan commented 7 years ago

Thanks. I wanted to clarify that. If it's a generic k8s option there's more than one way to do it. I felt it to be prejudicial to reject Kompose straight away. But if the intention of this issue was to provide a Helm package, I beg to rename it.

I wanted this issue to be broader, and abstract away the k8s specifics. If it's not the right place to discuss this, I can open another issue.

davidread commented 7 years ago

I'm new to all this, so can you help me understand your proposals? I'll try and summarize but will probably get this wrong, so please correct me:

@Vanuan perhaps you are suggesting we collaborate on a compose file, because that is native to deploying to Docker Swarm/CE/EE and can be converted using kompose to work sufficiently well with a Kubernetes cluster?

@mbocevski you're saying that a Helm chart is native to Kubernetes and also deploys to Docker CE/EE. But if someone chose Docker Swarm, the chart would offer them some value as a basis for conversion? (i.e. is that what you mean by "helpers to extract all the relevant configuration in an automated way")

How important is it to support Docker Swarm as well as a Kubernetes cluster in the long term? Since this is a 'lively' topic right now, let's ensure answers to this are thoughtful professional opinions.

gerbyzation commented 7 years ago

From my brief experience with Kompose it's a tool to give you a starting point to move from Compose to Kubernetes. When I used it on a project, it didn't provide the kubernetes configs that I could straight away deploy. If this is from a lack of insight on my side please correct me, but the first 2 paragraphs of the kompose repo seem to confirm that it's a tool to use when migrating, not necessarily to produce a complete & exact manifest, or for production use for that matter (note the 2nd paragraph):

kompose is a tool to help users who are familiar with docker-compose move to Kubernetes. kompose takes a Docker Compose file and translates it into Kubernetes resources.

kompose is a convenience tool to go from local Docker development to managing your application with Kubernetes. Transformation of the Docker Compose format to Kubernetes resources manifest may not be exact, but it helps tremendously when first deploying an application on Kubernetes.

I think a Helm chart would be a great addition, giving an easy deployment route for CKAN that's scalable, and a stepping stone for people interested in creating a more advanced k8s setup.

Vanuan commented 7 years ago

@davidread Yeap, I think that would be great. Container technology is still relatively new. I think some interoperable format is inevitable. And since you can't convert Helm/ConfigMaps to compose, but it's possible vice versa, I think compose could be a great sharing point.

According to the documentation one could use compose v3 files to deploy to kubernetes without intermediate conversion step: https://github.com/kubernetes/kompose/blob/master/docs/user-guide.md#kompose-up

Helm chart output is also supported with kompose convert -c.

Here's a compose file I use: https://github.com/Vanuan/ckan-base

Side notes:

There are the following issues I found that are not specific to swarm or kubernetes but shared:

Persistence. There's no agreement on how containers should persist data. The closest thing I found is CSI which will make volume plugins interoperable. Somebody will use NFS + bind mount, somebody will use CEPH cluster, somebody will use volume plugins like Portworx, Infinit.sh, Longhorn or something entirely different. AFAIK, there's no way to abstract away those differences yet.
Customization. It looks like the only way to customize is to store a modified ckan.ini somewhere in a persistent volume.
Plugins and theming. Installing plugins in a microservice environment is tricky. The simplest way is to dump everything imaginable into a single image and enable it through settings. That's the way suggested in udata. But there are other approaches possible, like downloading resources at runtime from some theme/plugin store.
Initialization. It looks like the first admin user has to be created manually. There should be some automated way to do that.

Analect commented 7 years ago

@mbocevski ... thanks for your suggestion to add a helm chart for CKAN based on your experiences on working with CKAN in kubernetes.

You mentioned above:

Installing and configuring CKAN extensions will not be straightforward, because the best would be that they would be in the image. Our case is that we always create a Dockerfile which starts FROM a base CKAN image and then just install extensions there. I can give an example and also try to think whether we can do something better.

It could be that you could use a series of init-containers to install a specified list of extensions. This example of a jenkins-config that I came across, used to install jenkins plugins, is maybe an approach worth emulating ... where from a list of extensions to install, an init-container for each could clone the repo for that extension and install it .. with added extensions incrementally mounted to the end CKAN installation. Just a thought.
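
To make the init-container idea a bit more concrete, here is a minimal Python sketch that builds a pod spec with one init container per extension, all writing into a shared emptyDir volume that the CKAN container then mounts. It is not based on the jenkins example mentioned above. The installer image, the `/extensions` path and the use of `PYTHONPATH` are assumptions, and whether CKAN's plugin loading picks up extensions installed this way is untested.

```python
def extension_init_containers(repo_urls, image="example/git-pip:latest"):
    """One init container per extension repo, installing into a shared volume.

    `image` is a hypothetical image that ships both git and pip.
    """
    containers = []
    for index, repo_url in enumerate(repo_urls):
        containers.append({
            "name": f"install-ext-{index}",
            "image": image,
            # `pip install --target DIR` installs into DIR instead of site-packages.
            "command": ["pip", "install", "--target", "/extensions", f"git+{repo_url}"],
            "volumeMounts": [{"name": "extensions", "mountPath": "/extensions"}],
        })
    return containers


def ckan_pod_spec(ckan_image, repo_urls):
    """Pod spec fragment: init containers fill the volume, CKAN reads from it."""
    return {
        "initContainers": extension_init_containers(repo_urls),
        "containers": [{
            "name": "ckan",
            "image": ckan_image,
            # Assumption: adding the shared directory to PYTHONPATH is enough
            # for the extensions to be importable by CKAN.
            "env": [{"name": "PYTHONPATH", "value": "/extensions"}],
            "volumeMounts": [{"name": "extensions", "mountPath": "/extensions"}],
        }],
        "volumes": [{"name": "extensions", "emptyDir": {}}],
    }
```

Init containers run one after another before the main container starts, so the extensions accumulate in the volume in list order. The trade-off is slower pod start-up and a dependency on the git hosts being reachable at deploy time.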

When you have something preliminary up, could you point us there from here ... and maybe I can pitch-in in some way. Thanks.

Analect commented 7 years ago

@mbocevski ... has there been any progress on this?

jqnatividad commented 6 years ago

Hi folks! As 2018 is just around the corner, was wondering how "we can work together" as per @davidread's original post at the top of the thread.

We'd also like to actively contribute to the effort (cc @jhinds) and collaborate with the group in a general Containerization initiative.

On the issue of configurability and CKAN extensions - when I originally created the CKAN Discourse extension, I was intrigued by how they pulled Discourse plugins from github and created a "Configurator" interface for plugin settings. The group may want to look into that technique when considering Plugins/Customization per @mbocevski's and @Vanuan's posts.

Vanuan commented 6 years ago

@jqnatividad so the technique is similar to udata's: pull every plugin into a single container and enable only those you need.

Vanuan commented 6 years ago

I was thinking of some kind of RPC. For example, datapusher requires both a plugin and a service. Maybe it's possible to externalize the plugin part too.

philtweir commented 6 years ago

Our team will be working on a basic CKAN Helm chart (public on Github) over the next couple weeks - while we'll be focused on a basic use-case, it seems logical to try and work in (especially if the community is going down that route). Will keep an eye on this issue and keen to be dragged into discussions :)

jhinds commented 6 years ago

Hi All,

I’d like to chime in since we have an interest in this discussion and where it is headed. I wanted to share our experiences and hope to bounce some ideas off one another.

We are in the process of migrating our ckan instances to kubernetes and have been leveraging helm for deployments. The chart includes ckan, postgres, redis, solr, and datapusher. We will likely move postgres and redis off of k8s for certain environments so we have flags in the template to not deploy those containers when we don’t want to.
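
The flag mechanism described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the actual chart: the `enabled` key and the shape of the values dictionary are assumptions, chosen to resemble a typical Helm values file.

```python
ALL_COMPONENTS = ["ckan", "postgres", "redis", "solr", "datapusher"]


def components_to_deploy(values):
    """Return the components whose `enabled` flag is not set to False.

    A component that is missing from `values` is deployed by default.
    """
    return [
        name for name in ALL_COMPONENTS
        if values.get(name, {}).get("enabled", True)
    ]


# Example: an environment where postgres and redis run outside the cluster.
values = {"postgres": {"enabled": False}, "redis": {"enabled": False}}
print(components_to_deploy(values))  # ['ckan', 'solr', 'datapusher']
```

In a chart the same check would wrap each component's templates in a conditional, with CKAN's connection settings pointing at the external services instead.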

In regards to extensions we are doing something similar to @mbocevski where we have a base ckan image, an intermediate docker image that has some basic extensions, and a final docker image with specific extensions installed for that portal. With helm we just template out the image we are using and pass in what specific image we want at deploy time. I've found this approach works well but we are still in flux on where we stand in regards to installing some plugins at runtime for flexibility or keeping it like above for immutability purposes. I'm trying to avoid having images with too many extensions that aren’t needed and won’t be active if we don’t have to in order to keep images small(ish). We're still trying to get to that sweet spot.

For configuration we have our .ini files as ConfigMaps and they are stored as jinja2 templates that ansible will populate with values, secrets, and potentially a list of plugins before creating the ConfigMap in k8s. Deactivating an extension is as easy as rerunning the script. New extensions and upgrades will require a new image build and deployment.
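
A minimal Python sketch of the "template the .ini, then create a ConfigMap" flow follows. It uses the standard library's `string.Template` as a stand-in for jinja2 and ansible, so it shows the idea rather than the setup described above. The ConfigMap name, the `production.ini` key and the three settings in the template are assumptions chosen for illustration.

```python
import json
from string import Template

# A deliberately tiny .ini template; a real CKAN config has many more settings.
INI_TEMPLATE = Template(
    "[app:main]\n"
    "ckan.site_url = $site_url\n"
    "sqlalchemy.url = $database_url\n"
    "ckan.plugins = $plugins\n"
)


def ckan_configmap(name, site_url, database_url, plugins):
    """Render the .ini template and wrap it in a ConfigMap manifest."""
    ini = INI_TEMPLATE.substitute(
        site_url=site_url,
        database_url=database_url,
        plugins=" ".join(plugins),  # plugin names are space separated
    )
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"production.ini": ini},
    }


# Hypothetical values; the JSON output can be piped to `kubectl apply -f -`.
manifest = ckan_configmap(
    "ckan-config",
    "http://ckan.example.org",
    "postgresql://ckan@db/ckan",
    ["datastore", "datapusher"],
)
print(json.dumps(manifest, indent=2))
```

Deactivating an extension then means re-rendering with a shorter plugin list and re-applying the manifest. A database URL that contains a password is better kept in a Secret than in a ConfigMap.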

We try to keep everything that we can as stateless as possible and leverage cloud storage where we can, and depending on the component we use one of the recommended Persistent Volume options. Additionally I’ve recently had to write up some related docs and an internal tutorial for deploying ckan to k8s via helm so if there is a need for some documentation wherever this goes I’m happy to assist on that front as well.

davidread commented 6 years ago

@jhinds this all sounds great! Please do share this work so that we can build on it as a community.

amercader commented 6 years ago

@jqnatividad thanks for rekindling this conversation! @jhinds thanks for the thorough update, this looks very exciting and hopefully with some input from @mbocevski, @philtweir and other approaches we can come up with a generic recommendation.

This is only tangentially related but I'm working on a Docker Compose setup that will install "local" extensions on a host folder so you can develop on them with the development server. Maybe there is something there that can be reused for the runtime-enabling of extensions.

waterponey commented 6 years ago

Hi, I see there is a lot of talk about creating a ckan helm chart. I'm also interested, is there someone currently working on that? I'm not a huge helm pro but I'd be glad to help the effort since I need that anyway. @philtweir or @jhinds do you have a public repo somewhere?

gerbyzation commented 6 years ago

For The Dataplace I've been working on a CKAN chart to be able to easily create new CKAN instances on kubernetes, mostly based off the images from https://github.com/okfn/docker-ckan. I've asked and they're alright to publish what we have so far. I will try to find some time next week to put it in a public repo.

philtweir commented 6 years ago

We've been held up by a few other issues, but have been working on this over the last week. Have a rough set of scripts getting us from cloning fresh CKAN to a plugins-included Helm deployment, alongside our application, but it's still got issues - will update here as we progress, but keen to join in if there's a more stable one good to go :)

gerbyzation commented 6 years ago

Just a quick update, didn't manage to get it ready last week, but I'm still actively working on this. There's a few things specific to our setup that I need to pull out of the chart, once that is done I'll be able to put it up.

OriHoch commented 6 years ago

hey everyone, I have a ckan kubernetes environment here - https://github.com/OriHoch/data4dappl/tree/master/k8s - I've been running it in production for a while now, serving https://www.odata.org.il/

davidread commented 6 years ago

@OriHoch thanks for sharing that. It's a start and has lots of useful examples. Hopefully future iterations or shares from other people will be templated into a chart. And no doubt everyone's ambition is not to be limited to running on one node.

philtweir commented 6 years ago

With apologies for the delay, we had been focusing on some docker-compose work, but have a basic chart here: https://github.com/lintol/helm-ckan . This does need some work still, but would be helpful to have feedback. The approach to the plugin problem is an initial build for your own CKAN, which gets pushed to a public/private repo, and variable/manual modifications to the production.ini template. There is no model here for filesystem persistence (so no Ceph, etc.), only via stateful services (e.g. PG), but should allow for independent CKAN process scaling. Anyhow, it's a step!

davidread commented 6 years ago

@philtweir nice one. Let's all give it a try and talk about how we can improve the rough edges!

philtweir commented 6 years ago

Brief update, I have got a script to run paster commands (e.g. ./paster.sh --plugin=valid...) against the installed chart by creating new jobs based on the retrieved post-install job's YAML, but if anyone has another suggestion, open to ideas!
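
The "new job based on the retrieved job's YAML" approach might look something like this Python sketch, which works on the manifest after it has been parsed into a dictionary. It is not the script mentioned above; the function name is hypothetical, and it assumes the paster command runs in the first container of the job's pod.

```python
import copy


def paster_job(template_job, name, paster_args):
    """Derive a new Job manifest from an existing one, swapping the command."""
    job = copy.deepcopy(template_job)  # leave the retrieved manifest untouched

    # Keep only the metadata a new object needs; uid, resourceVersion and
    # similar fields belong to the original Job.
    old_meta = template_job.get("metadata", {})
    job["metadata"] = {"name": name}
    if "namespace" in old_meta:
        job["metadata"]["namespace"] = old_meta["namespace"]
    job.pop("status", None)

    # The cluster generates the selector and matching pod labels for each
    # Job, so they must not be copied into a new one.
    job["spec"].pop("selector", None)
    job["spec"]["template"].get("metadata", {}).pop("labels", None)

    container = job["spec"]["template"]["spec"]["containers"][0]
    container["command"] = ["paster"] + list(paster_args)
    container.pop("args", None)
    return job
```

The result can be serialised and submitted with `kubectl create -f -`. Reusing the post-install job as the template means the image, environment and mounted config stay identical to what the chart deployed.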

OriHoch commented 6 years ago

hey everyone, the odata.org.il chart has been improved significantly lately, check it out - https://github.com/hasadna/hasadna-k8s/tree/master/charts-external/odata

philtweir commented 5 years ago

The odata chart looks great - full featured and really helpful to see how that all fits together. Our current chart is pretty basic, which suits our particular need - after some thought, it still seems logical for us to polish the lintol/helm-ckan repo to have as a concise, generic, barebones CKAN deploy (with some simple plugin support).

We will likely use that repo as a common base for forking and expanding to more complex derived charts, as we tend to link CKAN with other complementary services in different settings (such as Lintol). If that still sounds useful for others, keen to take feedback on improvement priorities via the issues there. Nonetheless, I would recommend anyone looking for a more end-to-end approach - or setting up a scalable public sector open data portal - to start with OriHoch's excellent work above.

OriHoch commented 5 years ago

thanks a lot @philtweir

the latest version of my work is now here -

philtweir commented 5 years ago

Thanks @OriHoch - will take a look through!

rufuspollock commented 4 years ago

We at @datopian (@videruminc) have a full Kubernetes set up for CKAN - including deployment to multiple clouds. You can find the helm charts here https://github.com/datopian/ckan-cloud-helm

Happy to work on more detailed documentation and seeing if we can get something "official" if people are interested.
