NYCPlanning / labs-geosearch-docker

Main repository for running the Planning Labs geosearch API powered by pelias

Updating the production Pelias Database with zero downtime #10

Open allthesignals opened 4 years ago

allthesignals commented 4 years ago

@chriswhong commented on Thu Jan 11 2018

I'm making an issue for this so I have something to link to.

Our team has had success getting the pelias stack up and running on a cloud server using docker-compose. I am now focusing on the update/deployment cycle as we will be iterating on our new custom importer.

From what I've gathered, there's no way to update the database in place; you have to start with a new Elasticsearch database and run your importers again.

How can this be done in a production environment with zero downtime? Is there a way to create a secondary container for Elasticsearch, run the importers to populate it, and then swap it out for the production database container when it's ready? Is this documented anywhere?


@spara commented on Thu Jan 11 2018

So the data is stored in the pelias container? Why not mount the data directory as a volume so it can be updated without stopping the pelias container?


@chriswhong commented on Thu Jan 11 2018

Pelias is not a single container, it's many containers. One is the database, one is the node app, and two others are supporting services. An update necessitates creating a new empty database container and then running importers to populate it with data. Importers can take an hour to run, so we need to set all of this up while the existing db is still being used by the app, and then swap it out.

Reading about blue/green deployment, it seems like it may be the answer, but it will require a second server.


@OriHoch commented on Thu Jan 11 2018

Kubernetes is the best solution IMO for this type of problem, but it does have a learning curve.

Here is an Elasticsearch Kubernetes pod which populates itself when it starts: https://github.com/Beit-Hatfutsot/mojp-k8s/blob/master/charts-external/mojp-dbs-elasticsearch/templates/mojp-dbs-elasticsearch.yaml


@spara commented on Thu Jan 11 2018

If you run it in a swarm, you can do something like docker service update --image pelias:verX pelias to do a rolling update instead of a blue/green deployment.

https://docs.docker.com/engine/reference/commandline/service_update/#update-a-service
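For illustration, a rolling update might look like the sketch below. The service name pelias_elasticsearch and the image tag are made-up placeholders; the helper just builds the command string so you can review it before running it on a swarm manager.

```shell
# Sketch only: service and image names are hypothetical placeholders.
# Builds the Swarm rolling-update command; pipe it to sh on a manager node.
rolling_update_cmd() {
  service="$1"   # e.g. pelias_elasticsearch
  image="$2"     # e.g. pelias/elasticsearch:verX
  # --update-parallelism 1 replaces one task at a time;
  # --update-delay waits between replacements so some replicas keep serving.
  printf 'docker service update --image %s --update-parallelism 1 --update-delay 30s %s\n' \
    "$image" "$service"
}

rolling_update_cmd pelias_elasticsearch pelias/elasticsearch:verX
# To actually run it on a swarm manager:
# rolling_update_cmd pelias_elasticsearch pelias/elasticsearch:verX | sh
```

Note the caveat from the docs: rolling updates assume replicas, so a single-node Elasticsearch "service" would still see a gap unless the new container is fully populated first.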


@antoine-de commented on Fri Jan 12 2018

I think the easiest way to handle this is to change Pelias a bit to use Elasticsearch aliases.

Aliases are an easy way to achieve zero downtime.

I'm not a Pelias expert, but I think it can be done with only minor changes, since the Pelias index can be set in the configuration.

You import into a custom index and create a pelias alias pointing at it. For a new import, you import all the data into a newly created index, and when it's done you just move the pelias alias from the old index to the new one.
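Concretely, the swap is a single call to Elasticsearch's _aliases API, which applies its actions atomically. A sketch (the index names pelias_20180111 / pelias_20180112 are hypothetical; the helper builds the request body):

```shell
# Sketch: builds the atomic alias-swap body for Elasticsearch's _aliases API.
# Index names passed in are hypothetical examples.
alias_swap_body() {
  old_index="$1"; new_index="$2"; alias="$3"
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}' \
    "$old_index" "$alias" "$new_index" "$alias"
}

alias_swap_body pelias_20180111 pelias_20180112 pelias
# Both actions execute as one atomic operation, so searches against the
# "pelias" alias never see a moment without a backing index:
# curl -XPOST http://elasticsearch:9200/_aliases \
#   -H 'Content-Type: application/json' \
#   -d "$(alias_swap_body pelias_20180111 pelias_20180112 pelias)"
```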

Note: it's an approach taken from another geocoder, mimir


@chriswhong commented on Fri Jan 12 2018

@antoine-de Thanks, an alias sounds more feasible because the data in the elasticsearch database is stored in a volume, so spinning up a new database container would require it to store its data in a different place from the running container.


@chriswhong commented on Fri Jan 12 2018

@spara I guess what I am not sure about here is that it's not enough to just update the service (the database), it's a multi-step process. First you have to start the database, then you have to run another script (another docker-compose command) to populate it with data. That import script is using the docker internal hostnames (http://elasticsearch:9200) so I'd need to be able to tell the import script to import into a different database.
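If it helps, Pelias reads its Elasticsearch connection settings from pelias.json, so the importer run could be pointed at a second database by overriding esclient.hosts. A hypothetical fragment (the hostname elasticsearch-staging is made up and would need to match the second container's name on the docker network):

```json
{
  "esclient": {
    "hosts": [
      { "host": "elasticsearch-staging", "port": 9200 }
    ]
  }
}
```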


@spara commented on Fri Jan 12 2018

@chriswhong Yes, you would still need to update the database via the method you're using currently, the service update rolls in the new ES container after it's been built. Are the Dockerfiles and Compose scripts available? I'm having a hard time understanding the architecture and it's been a number of years since I've used ES.


@orangejulius commented on Fri Jan 12 2018

@chriswhong for Mapzen Search we used a separate Elasticsearch cluster to build new data, took a snapshot which was stored in S3, and then "rotated" indices in the live production cluster using Elasticsearch Index Aliases, and it worked great. The API did not even have to be restarted or know anything changed, as from its perspective there was always an index in Elasticsearch with the name it was expecting.

However, if I were to do it again (and I probably will 😁), and provided I was doing it on some sort of cloud where you can launch new instances easily, I would go for a blue/green model. Launch an entirely new second cluster, load the snapshot into it, and when it was ready, use some means to switch which one is live, and then shut down the old one.

Okay, that's great in theory, but what about the docker-compose setup? I'm sure there's lots we can do to improve it, but I don't think we can completely solve this problem. It's mainly meant for demos, and without at least upgrading to docker-swarm, you can't even launch replicas of a service, so I don't see how one would do a blue/green deploy with Elasticsearch.

If you wanted to take it to the next level, take a look at pelias/kubernetes which is our "new way" of managing Pelias in production. It uses Kubernetes for all the services, and then separate EC2 instances managed by Terraform for Elasticsearch. There's great work being done to make Elasticsearch work really well in Kubernetes, but Elasticsearch really is a finicky beast. For now it needs to live on dedicated instances and managed "traditionally".

The data update problem is not yet solved in the pelias/kubernetes repo, but the building blocks are there. I think we can find a way forward there that works really well.


@chriswhong commented on Fri Jan 12 2018

Thanks for the advice @orangejulius, given our limited resources we may lean towards a solution that will keep things simple and may require a few more manual steps each time we update. Elasticsearch Index Aliases looks like the most attractive option at this point. (We are still tinkering, but need to be able to update the deployed database a few times a day as we modify our importer.)


@orangejulius commented on Fri Jan 12 2018

That makes sense. Aliases will work great in the docker-compose setup. If you have a separate instance of the docker-compose setup to do the build, you can save the snapshot from the build, copy the files to the right place (or, if you want to get fancier, have both docker-compose setups mount a shared snapshot volume through docker), load it in, and use an alias to do the swap.
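The snapshot half of that workflow could look like the sketch below. The repository name "builds", the snapshot name "pelias_build", and the /snapshots path are all hypothetical, and /snapshots must be a filesystem location shared (or copied) between the build and production stacks; the helper builds the repository-registration body.

```shell
# Sketch: builds the body for registering a shared-filesystem snapshot
# repository in Elasticsearch. The location path is a hypothetical example.
repo_body() {
  printf '{"type":"fs","settings":{"location":"%s"}}' "$1"
}

repo_body /snapshots
# On the build stack: register the repo, then take a snapshot:
# curl -XPUT http://elasticsearch:9200/_snapshot/builds \
#   -d "$(repo_body /snapshots)"
# curl -XPUT 'http://elasticsearch:9200/_snapshot/builds/pelias_build?wait_for_completion=true'
# On the production stack: register the same repo, restore the snapshot
# into a new index, and finish with the alias swap once the restore completes:
# curl -XPOST 'http://elasticsearch:9200/_snapshot/builds/pelias_build/_restore'
```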

A lot of that logic can or should probably live in the pelias/dockerfiles repo, so let's figure out how we can do that and make it easy.