Open davidread opened 7 years ago
We have been running hundreds of CKAN instances in Kubernetes for over a year and a half now. Our setup deploys all components in Kubernetes, including an HA Postgres setup which is shared across all instances. We run Deis to streamline deployment and management of applications using a simple git workflow.
The full setup that we run includes: HA (master-slave) Postgres, minio (as object store for the CKAN filestore), traefik (for load balancing and automatic Let's Encrypt certificate management), a Zookeeper multi-master cluster, a Solr Cloud cluster, and Deis. We run our own docker images of CKAN and CKAN Datapusher in a stateless way so we can scale.
All of the components are deployed to kubernetes and we run a CEPH cluster for providing persistent volumes in kubernetes. We use ansible to automate the deployment of this entire setup end-to-end, and we have a custom tool built for easily deploying CKAN portals: create database, enable/disable postgis, create datastore, create solr collections, use deis to deploy CKAN portal with easily enabling/disabling extensions.
Of course the above setup is for large scale deployments of CKAN, where one would deploy lots of CKAN portals side-by-side.
It would be straight-forward to build a helm chart as you describe and we can have someone on our end create that. Probably without high-availability and clustering of components. The only thing to mind is the persistent components which can be by default set to emptyDir and configurable with the helm chart. If this is a good approach we can get this out quite quickly. Comments?
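To make the "emptyDir by default, configurable through the chart" idea concrete, here is a minimal Python sketch that builds the volume entry of a pod spec. The function name, the claim_name parameter and the volume names are hypothetical; only the emptyDir and persistentVolumeClaim fields are taken from the Kubernetes pod spec. A real chart would express the same switch in its templates and values file.

```python
from typing import Optional


def storage_volume(name: str, claim_name: Optional[str] = None) -> dict:
    """Build one entry for a pod's `volumes` list.

    Hypothetical helper: with no claim configured the data lives in an
    emptyDir (it is lost when the pod goes away); otherwise the volume is
    backed by an existing PersistentVolumeClaim.
    """
    if claim_name is None:
        return {"name": name, "emptyDir": {}}
    return {"name": name, "persistentVolumeClaim": {"claimName": claim_name}}


if __name__ == "__main__":
    # Default: throwaway storage, good enough for a first try-out.
    print(storage_volume("ckan-storage"))
    # Configured: point the same volume at a claim (placeholder name).
    print(storage_volume("ckan-storage", claim_name="ckan-storage-pvc"))
```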
What's wrong with compose files?
@Vanuan there is nothing wrong with compose files, they are great for deploying to docker Swarm and as Docker announced yesterday you will also be able to use compose files with their enterprise offering Docker EE, which will support kubernetes.
This is about giving those who run, or want to run, kubernetes an easy way to deploy CKAN on a kubernetes cluster, because you can't use compose files to deploy to kubernetes.
It would be good for this to appeal to the broadest set of people, hence Kubernetes over other orchestrations.
@mbocevski thanks for bringing your experience in on this and offering to do a chart.
I've played about a bit with Helm charts and see that as useful for configuring the containers to talk to each other, but it would be good to get other views - maybe Helm is a bit niche and there are other better ways? Or would you and others suggest that a chart is going to be useful for the average person deploying to Kubernetes?
Helm was developed by the Deis team and it's a pretty decent implementation of a package manager for Kubernetes. In fact, the Kubernetes project accepted it as the default package manager and made it part of Kubernetes. For the average person deploying CKAN to Kubernetes, I would recommend Helm because it's simple to use, and we can have specific documentation about CKAN on Kubernetes, since it would make sense to use k8s features like ConfigMap for the CKAN configuration and similar.
So in summary, a configurable Helm chart with additions to the CKAN docs about running CKAN on kubernetes will give the simplest and smoothest experience.
That's great. What's not clear to me is how someone can use a chart to install & configure CKAN extensions, but we can see.
@mbocevski I'd love to take you up on your offer - if you're willing to contribute a chart for us to play with, that'd be great to get us all going, for everyone's benefit.
Installing and configuring CKAN extensions will not be straightforward, because the best would be that they would be in the image. Our case is that we always create a Dockerfile which starts FROM a base CKAN image and then just install extensions there. I can give an example and also try to think whether we can do something better.
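As a rough sketch of the "Dockerfile which starts FROM a base CKAN image" approach, the Python below renders such a Dockerfile from a list of extensions. This is an illustration under assumptions, not our actual tooling: the function name, the image name and the extension name are placeholders, and it assumes every entry is something pip install accepts.

```python
from typing import List


def render_dockerfile(base_image: str, extensions: List[str]) -> str:
    """Return Dockerfile text that layers extensions onto a base image.

    Assumption: each entry in `extensions` is a pip requirement (a package
    name or a VCS URL). One RUN line per extension keeps the layers easy
    to read; a real image might combine them.
    """
    lines = [f"FROM {base_image}"]
    for spec in extensions:
        lines.append(f"RUN pip install {spec}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Both names below are placeholders, not real published artifacts.
    print(render_dockerfile("example/ckan-base:latest", ["ckanext-example"]))
```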
I'll create a chart and contribute it, also write up a way to easily get it up and going with minikube.
Awesome!!
I think the title is misleading. If it's specific to Helm, it should be named "Build Helm CKAN package". If it's generally about deploying CKAN and using Kubernetes orchestrator then there are at least three ways to do that.
you can't use compose files to deploy to kubernetes
Can't you? What about this: https://github.com/kubernetes/kompose ?
you will also be able to use compose files with their enterprise offering Docker EE, which will support kubernetes.
AFAIK, that's not specific to enterprise. The goal here is to have the same developer experience while benefiting from k8s production robustness. So both Docker CE and EE will have k8s OOtB.
So I think the "Kubernetes" term needs clarification here. What is kubernetes for you (as a developer or an ops guy)?
If it's the last, then using a tool to convert between docker/rkt/mesos/k8s/etc deployment formats would be viable. If it's the first, we as developers will be forced to continue maintaining multiple configuration files and use cases.
It just struck me that maybe for you Kubernetes is a Web UI. Well, in that case recommending Docker CE makes no sense. And we'll be doomed to have a discrepancy between development and production.
@Vanuan I don't appreciate your behaviour, I think it's completely inappropriate. I would encourage you to be respectful and encourage you to go through and read the CKAN Code of Conduct.
What about this: https://github.com/kubernetes/kompose ?
Kompose translates docker-compose.yml files into kubernetes resources, thus locking in to only the types that are supported in docker compose. Personally I think tools like this are helpful but not appropriate for maintenance.
So both Docker CE and EE will have k8s OOtB.
Docker CE for Mac and Windows will have k8s OOtB i.e. basically it will spawn a vanilla kubernetes that is tightly integrated with docker. On Linux though users would have to instantiate a kubernetes by themselves and likely some additional config to docker to reach the same OOtB experience as on Mac and Windows.
we as developers will be forced to continue maintaining multiple configuration files and use cases.
Not if implemented properly, thus package managers come in handy. I don't see a big overhead to maintenance if the helm package is built correctly, integrated with the development flow and has helpers to extract all the relevant configuration in an automated way.
A lot of open source software is packaged in different flavours in order to support all the different distros and platforms. For those that want to deploy CKAN to Kubernetes, I think that a helm chart will give the smoothest experience from any perspective. If you think that there is a better alternative, please suggest and contribute so that the community can look into and collaboratively decide on which approach is best.
Thanks all for your inputs and thanks of course @mbocevski for your offer.
@Vanuan Just to be clear, and please correct me @davidread if I didn't catch your original intention, this is about exploring a generic k8s-based option that can work as a basis for the community to deploy their CKAN instances. We are not at this stage looking at making it an official way of deploying CKAN; this will only happen if we can ensure it works for most people and can be supported. But different people deploy CKAN in many different ways and we want to help people share their experiences and discuss the best approaches, especially with an emerging player like k8s.
Thanks. I wanted to clarify that. If it's a generic k8s option there's more than one way to do it. I felt it to be prejudicial to reject Kompose straight away. But if the intention of this issue was to provide a Helm package, I'd ask that it be renamed.
I wanted this issue to be broader, and abstract away the k8s specifics. If it's not the right place to discuss this, I can open another issue.
I'm new to all this, so can you help me understand your proposals? I'll try and summarize but will probably get this wrong, so please correct me:
@Vanuan perhaps you are suggesting we collaborate on a compose file, because that is native to deploying to Docker Swarm/CE/EE and can be converted using kompose to work sufficiently well with a Kubernetes cluster?
@mbocevski you're saying that a Helm chart is native to Kubernetes and also deploys to Docker CE/EE. But if someone chose Docker Swarm, the chart would offer them some value as a basis for conversion? (i.e. is that what you mean by "helpers to extract all the relevant configuration in an automated way")
How important is it to support Docker Swarm as well as a Kubernetes cluster in the long term? Since this is a 'lively' topic right now, let's ensure answers to this are thoughtful professional opinions.
From my brief experience with Kompose it's a tool to give you a starting point to move from Compose to Kubernetes. When I used it on a project, it didn't provide the kubernetes configs that I could straight away deploy. If this is from a lack of insight on my side please correct me, but the first 2 paragraphs of the kompose repo seem to confirm that it's a tool to use when migrating, not necessarily to produce a complete & exact manifest, or for production use for that matter (note the 2nd paragraph):
kompose is a tool to help users who are familiar with docker-compose move to Kubernetes. kompose takes a Docker Compose file and translates it into Kubernetes resources.
kompose is a convenience tool to go from local Docker development to managing your application with Kubernetes. Transformation of the Docker Compose format to Kubernetes resources manifest may not be exact, but it helps tremendously when first deploying an application on Kubernetes.
I think a Helm chart would be a great addition, giving an easy deployment route for CKAN that's scalable, and a stepping stone for people interested in creating a more advanced k8s setup.
@davidread Yep, I think that would be great. Container technology is still relatively new. I think some interoperable format is inevitable. And since you can't convert Helm/ConfigMaps to compose, but it's possible vice versa, I think compose could be a great sharing point.
According to the documentation one could use compose v3 files to deploy to kubernetes without intermediate conversion step: https://github.com/kubernetes/kompose/blob/master/docs/user-guide.md#kompose-up
Helm chart output is also supported with kompose convert -c.
Here's a compose file I use: https://github.com/Vanuan/ckan-base
Side notes:
There are the following issues I found that are not specific to swarm or kubernetes but shared:
@mbocevski ... thanks for your suggestion to add a helm chart for CKAN based on your experiences on working with CKAN in kubernetes.
You mentioned above:
Installing and configuring CKAN extensions will not be straightforward, because the best would be that they would be in the image. Our case is that we always create a Dockerfile which starts FROM a base CKAN image and then just install extensions there. I can give an example and also try to think whether we can do something better.
It could be that you could use a series of init-containers to install a specified list of extensions. This example of a jenkins-config that I came across, used to install jenkins plugins, is maybe an approach worth emulating ... where from a list of extensions to install, an init-container for each could clone the repo for that extension and install it .. with added extensions incrementally mounted to the end CKAN installation. Just a thought.
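To sketch what that could look like, the Python below generates one init-container entry per extension repository, each cloning into a shared volume. Everything here is an assumption for illustration: the function name, the git image, the volume name "extensions" and the mount path are made up, and the pod would still need a matching volume plus a step in the CKAN container that actually installs what was cloned.

```python
from typing import List


def extension_init_containers(
    repo_urls: List[str],
    git_image: str = "alpine/git",  # assumed image that ships a git binary
    mount_path: str = "/extensions",
) -> List[dict]:
    """Build an `initContainers` list, one container per extension repo.

    Init containers run one after another, so each clone is added to the
    shared volume before the main CKAN container starts.
    """
    containers = []
    for index, url in enumerate(repo_urls):
        target = f"{mount_path}/ext-{index}"
        containers.append(
            {
                "name": f"fetch-ext-{index}",
                "image": git_image,
                "command": ["git", "clone", "--depth", "1", url, target],
                "volumeMounts": [{"name": "extensions", "mountPath": mount_path}],
            }
        )
    return containers


if __name__ == "__main__":
    # Placeholder repository URLs.
    for container in extension_init_containers(
        ["https://example.org/ckanext-a.git", "https://example.org/ckanext-b.git"]
    ):
        print(container["name"], container["command"])
```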
When you have something preliminary up, could you point us there from here ... and maybe I can pitch-in in some way. Thanks.
@mbocevski ... has there been any progress on this?
Hi folks! As 2018 is just around the corner, was wondering how "we can work together" as per @davidread's original post at the top of the thread.
We'd also like to actively contribute to the effort (cc @jhinds) and collaborate with the group in a general Containerization initiative.
On the issue of configurability and CKAN extensions - when I originally created the CKAN Discourse extension, I was intrigued by how they pulled Discourse plugins from github and created a "Configurator" interface for plugin settings. The group may want to look into that technique when considering Plugins/Customization per @mbocevski's and @Vanuan's posts.
@jqnatividad so the technique is similar to udata's: pull every plugin into a single container and enable only those you need.
I was thinking of some kind of RPC. For example, datapusher requires both plugin and a service. Maybe it's possible to externalize the plugin part too.
Our team will be working on a basic CKAN Helm chart (public on Github) over the next couple weeks - while we'll be focused on a basic use-case, it seems logical to try and work in (especially if the community is going down that route). Will keep an eye on this issue and keen to be dragged into discussions :)
Hi All,
I’d like to chime in since we have an interest in this discussion and where it is headed. I wanted to share our experiences and hope to bounce some ideas off one another.
We are in the process of migrating our ckan instances to kubernetes and have been leveraging helm for deployments. The chart includes ckan, postgres, redis, solr, and datapusher. We will likely move postgres and redis off of k8s for certain environments so we have flags in the template to not deploy those containers when we don’t want to.
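The flag idea can be sketched in a few lines of Python. This is only an illustration: the component list comes from the chart described above, but the enabled key and the function name are assumptions modelled on a common Helm values convention, not the actual names in our chart.

```python
from typing import Dict, List

# Components the chart can deploy, in the order listed above.
COMPONENTS = ["ckan", "postgres", "redis", "solr", "datapusher"]


def components_to_deploy(values: Dict[str, Dict[str, bool]]) -> List[str]:
    """Return the components whose (hypothetical) `enabled` flag is not False.

    A component that is missing from `values` is deployed by default, so an
    environment only has to list what it wants to switch off.
    """
    return [
        name
        for name in COMPONENTS
        if values.get(name, {}).get("enabled", True)
    ]


if __name__ == "__main__":
    # An environment where postgres and redis are managed outside k8s.
    external_db = {"postgres": {"enabled": False}, "redis": {"enabled": False}}
    print(components_to_deploy(external_db))
```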
In regards to extensions we are doing something similar to @mbocevski where we have a base ckan image, an intermediate docker image that has some basic extensions, and a final docker image with specific extensions installed for that portal. With helm we just template out the image we are using and pass in what specific image we want at deploy time. I've found this approach works well but we are still in flux on where we stand in regards to installing some plugins at runtime for flexibility or keeping it like above for immutability purposes. I'm trying to avoid having images with too many extensions that aren’t needed and won’t be active if we don’t have to in order to keep images small(ish). We're still trying to get to that sweet spot.
For configuration we have our .ini files as ConfigMaps, and they are stored as jinja2 templates that ansible will populate with values, secrets, and potentially a list of plugins before creating the ConfigMap in k8s. Deactivating an extension is as easy as rerunning the script. New extensions and upgrades will require a new image build and deployment.
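A minimal sketch of that render-then-wrap step is below. To keep it self-contained it uses Python's standard string.Template in place of jinja2 and ansible, so it shows the shape of the idea and not our real pipeline. The template content, the function name and the production.ini key are assumptions, and a real setup would likely keep the database URL in a Secret.

```python
from string import Template
from typing import List

# Tiny stand-in for the real .ini template (stdlib Template, not jinja2).
INI_TEMPLATE = Template(
    "[app:main]\n"
    "ckan.site_url = $site_url\n"
    "sqlalchemy.url = $db_url\n"
    "ckan.plugins = $plugins\n"
)


def build_configmap(name: str, site_url: str, db_url: str, plugins: List[str]) -> dict:
    """Render the ini text and wrap it in a ConfigMap manifest (as a dict)."""
    ini_text = INI_TEMPLATE.substitute(
        site_url=site_url,
        db_url=db_url,
        plugins=" ".join(plugins),  # CKAN expects a space-separated list
    )
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": {"production.ini": ini_text},
    }


if __name__ == "__main__":
    # All values are placeholders.
    manifest = build_configmap(
        "ckan-config",
        "https://ckan.example.org",
        "postgresql://ckan:secret@db/ckan",
        ["datastore", "datapusher"],
    )
    print(manifest["data"]["production.ini"])
```

Deactivating an extension then amounts to calling the function again with a shorter plugins list and re-applying the resulting ConfigMap.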
We try to keep everything that we can as stateless as possible and leverage cloud storage where we can; otherwise, depending on the component, we use one of the recommended Persistent Volume options. Additionally, I’ve recently had to write up some related docs and an internal tutorial for deploying ckan to k8s via helm, so if there is a need for some documentation wherever this goes I’m happy to assist on that front as well.
@jhinds this all sounds great! Please do share this work so that we can build on it as a community.
@jqnatividad thanks for rekindling this conversation! @jhinds thanks for the thorough update, this looks very exciting, and hopefully with some input from @mbocevski, @philtweir and other approaches we can come up with a generic recommendation.
This is only tangentially related but I'm working on a Docker Compose setup that will install "local" extensions on a host folder so you can develop on them with the development server. Maybe there is something there that can be reused for the runtime-enabling of extensions.
Hi, I see there is a lot of talk about creating a ckan helm chart. I'm also interested; is there someone currently working on that? I'm not a huge helm pro but I'd be glad to help the effort since I need that anyway. @philtweir or @jhinds do you have a public repo somewhere?
For The Dataplace I've been working on a CKAN chart to be able to easily create new CKAN instances on kubernetes, mostly based off the images from https://github.com/okfn/docker-ckan. I've asked and they're alright to publish what we have so far. I will try to find some time next week to put it in a public repo.
We've been held up by a few other issues, but have been working on this over the last week. Have a rough set of scripts getting us from cloning fresh CKAN to a plugins-included Helm deployment, alongside our application, but it's still got issues - will update here as we progress, but keen to join in if there's a more stable one good to go :)
Just a quick update, didn't manage to get it ready last week, but I'm still actively working on this. There's a few things specific to our setup that I need to pull out of the chart, once that is done I'll be able to put it up.
hey everyone, I have a ckan kubernetes environment here: https://github.com/OriHoch/data4dappl/tree/master/k8s. I've been running it in production for a while now, serving https://www.odata.org.il/
@OriHoch thanks for sharing that. It's a start and has lots of useful examples. Hopefully future iterations or shares from other people will be templated into a chart. And no doubt everyone's ambition is not to be limited to running on one node.
With apologies for the delay, we had been focusing on some docker-compose work, but have a basic chart here: https://github.com/lintol/helm-ckan . This does need some work still, but would be helpful to have feedback. The approach to the plugin problem is an initial build for your own CKAN, which gets pushed to a public/private repo, and variable/manual modifications to the production.ini template. There is no model here for filesystem persistence (so no Ceph, etc.), only via stateful services (e.g. PG), but should allow for independent CKAN process scaling. Anyhow, it's a step!
@philtweir nice one. Let's all give it a try and talk about how we can improve the rough edges!
Brief update: I have got a script to run paster commands (e.g. ./paster.sh --plugin=valid...) against the installed chart by creating new jobs based on the retrieved post-install job's YAML, but if anyone has another suggestion, I'm open to ideas!
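For anyone curious, the "new job from the post-install job's YAML" step can be approximated as follows. This is a hedged Python sketch, not the actual paster.sh script: it assumes the retrieved Job has already been parsed into a dict, the function name is made up, and the list of auto-generated labels is my understanding of what the Job controller typically adds.

```python
import copy
from typing import List

# Labels the Job controller typically sets itself (assumption); copying them
# into a new Job would tie it to the old one.
AUTO_LABELS = (
    "controller-uid",
    "job-name",
    "batch.kubernetes.io/controller-uid",
    "batch.kubernetes.io/job-name",
)


def derive_job(base_job: dict, name: str, command: List[str]) -> dict:
    """Copy a retrieved Job and turn it into a fresh one-off Job."""
    job = copy.deepcopy(base_job)  # never mutate the retrieved manifest
    namespace = base_job.get("metadata", {}).get("namespace")
    # Replace server-populated metadata (uid, resourceVersion, ...) wholesale.
    job["metadata"] = {"name": name}
    if namespace:
        job["metadata"]["namespace"] = namespace
    job.pop("status", None)
    job["spec"].pop("selector", None)
    labels = job["spec"]["template"].get("metadata", {}).get("labels", {})
    for key in AUTO_LABELS:
        labels.pop(key, None)
    # Reuse image, env and volumes of the first container; swap the command.
    container = job["spec"]["template"]["spec"]["containers"][0]
    container["command"] = list(command)
    container.pop("args", None)
    return job
```

The returned dict could then be dumped back to YAML and applied to the cluster, with the command list carrying whichever paster invocation is needed.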
hey everyone, the odata.org.il chart has been improved significantly lately, check it out - https://github.com/hasadna/hasadna-k8s/tree/master/charts-external/odata
The odata chart looks great - full featured and really helpful to see how that all fits together. Our current chart is pretty basic, which suits our particular need - after some thought, it still seems logical for us to polish the lintol/helm-ckan repo to have as a concise, generic, barebones CKAN deploy (with some simple plugin support).
We will likely use that repo as a common base for forking and expanding to more complex derived charts, as we tend to link CKAN with other complementary services in different settings (such as Lintol). If that still sounds useful for others, keen to take feedback on improvement priorities via the issues there. Nonetheless, I would recommend anyone looking for a more end-to-end approach - or setting up a scalable public sector open data portal - to start with OriHoch's excellent work above.
thanks a lot @philtweir
the latest version of my work is now here -
Thanks @OriHoch - will take a look through!
We at @datopian (@videruminc) have a full Kubernetes set up for CKAN - including deployment to multiple clouds. You can find the helm charts here https://github.com/datopian/ckan-cloud-helm
Happy to work on more detailed documentation and seeing if we can get something "official" if people are interested.
Current Status (last updated June 2020):
Original
Let's start using Kubernetes to deploy CKAN.
Here's the dream: someone creates a Helm chart (package) for CKAN. Now a single command deploys containers to a cluster, including CKAN, CKAN DataStore, SOLR and Postgres, all configured to talk to each other.
This is how I see it:
I'd encourage people to share what they are doing in this area and hopefully we can work together on the successes.