NYCPlanning / labs-geosearch-docker

Main repository for running the Planning Labs geosearch API powered by pelias

Update CLI tool, replace terraform with cloud-init, and deprecate custom importer #64

Closed TylerMatteo closed 1 year ago

TylerMatteo commented 1 year ago

This PR overhauls how Geosearch builds, downloads, and imports the normalized PAD data, and how it gets deployed to Digital Ocean. For clarity, I'll cover the changes in a few sections.

1. Updating docker-compose.yml and deprecating labs-geosearch-pad-importer

Until now, Geosearch relied on a custom Pelias importer to load the CSV output by labs-geosearch-pad-normalize into the Pelias Elasticsearch datastore. This made upgrades very difficult and unnecessarily increased the complexity of the project. To overcome this, we changed labs-geosearch-pad-normalize to output a CSV that conforms to the schema expected by the built-in Pelias csv-importer. With that out of the way, I was able to update the images used in docker-compose.yml to specific recent releases and delete the contents of the /mounts folder, because we no longer need to manually "overwrite" code coming from the images we pull. I also cleaned up docker-compose.yml to pull only the images we need and to more closely resemble the examples you would see in the pelias/docker repo. This means that labs-geosearch-pad-importer is now completely deprecated and can be archived.

The Pelias csv-importer also includes functionality to download data, so we no longer need to curl the Digital Ocean Space for the PAD CSV ourselves. The URL used to download the data can be found in pelias.json under imports.csv.downloads.
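To make the pelias.json lookup concrete, here is a minimal sketch of reading the configured download URLs from that file. The config excerpt, the example URL, and the `csv_download_urls` helper are all hypothetical. The key name is also an assumption: this description says `downloads`, while I believe the upstream csv-importer README uses `download`, so the sketch accepts either.

```python
import json
from typing import Any, Dict, List


def csv_download_urls(config: Dict[str, Any]) -> List[str]:
    """Return the CSV download URLs configured under imports.csv."""
    csv_cfg = config.get("imports", {}).get("csv", {})
    # Assumption: the key may be spelled "download" or "downloads";
    # check the pelias.json in this repo for the spelling actually used.
    urls = csv_cfg.get("download", csv_cfg.get("downloads", []))
    return list(urls)


# Hypothetical excerpt of a pelias.json file; the URL is a placeholder,
# not the real location of the normalized PAD CSV.
EXAMPLE_CONFIG = json.loads("""
{
  "imports": {
    "csv": {
      "datapath": "/data/csv",
      "download": ["https://example.com/latest/normalized-pad.csv"]
    }
  }
}
""")

if __name__ == "__main__":
    for url in csv_download_urls(EXAMPLE_CONFIG):
        print(url)
```

In practice you would load the real file with `json.load(open("pelias.json"))` and pass the result to the helper.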

2. Updating contents of cmd folder and adding lib folder

This repo makes extensive use of the pelias CLI tool taken from the pelias/docker repo. Prior to this PR, the contents of the /cmd folder were based on the cmd folder in that repo but had been changed and extended to include commands relevant only to this repo. To make maintaining this repo easier, I updated it to exactly match the cmd folder and related lib folder from that Pelias repo as of the time of writing. This means that this repo no longer has any "bespoke" changes to the CLI commands and is up to date with the CLI tool in the pelias/docker repo.

3. Replacing Terraform with cloud-init

Up until now, this repo used Terraform to create and provision a Digital Ocean droplet for production deployments of Geosearch. Most of that code was contained in the now-deleted main.tf file, but it also relied on several commands that had been added to the included Pelias CLI tool. I decided to remove Terraform in favor of cloud-init for these reasons:

  1. Terraform is a relatively complex tool that we don't use anywhere else in our portfolio. It is meant to be used for creating cloud assets, but having only one place in our portfolio where we use it creates unnecessary complexity: you have to learn a whole new tool just to make changes to this one application. Furthermore, if we do adopt TF in the future, it is often a best practice to centralize TF code in one repo instead of including the TF code for a given app alongside its source code.
  2. The TF code that was here made heavy use of local-exec and remote-exec provisioners. Using provisioners this heavily for things like starting up the Geosearch services and importing data is not a TF best practice; provisioners are meant to be used as a last resort. You can read more about this in TF's official docs.
  3. cloud-init is a cloud-agnostic, industry-standard tool for provisioning cloud resources. The code for it is entirely contained within cloud-config.yml, and Digital Ocean has out-of-the-box support for it. It's not the only tool for this purpose, but I felt that it's the best tool for the job in this scenario.

4. New GitHub Actions

I replaced the existing GitHub Actions workflows with the new one found in /.github/workflows/build.yml. This workflow uses doctl to create the droplet and prompt Digital Ocean to use cloud-init to execute the contents of cloud-config.yml. It then "polls" the IP of the newly created droplet and returns a successful status if and when it gets a 200 status code, signifying that Geosearch is ready to accept traffic. For details on this process, check out the "Deployment" and "How exactly do deployments work" sections of the updated README.md file.
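The polling step can be sketched roughly as follows. This is not the code in build.yml; it is an illustrative Python version of the same idea, and the function names, attempt count, and delay are assumptions chosen for the example.

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def http_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response was received."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, etc.
        return None


def wait_until_ready(
    url: str,
    attempts: int = 60,
    delay_seconds: float = 30.0,
    get_status: Callable[[str], Optional[int]] = http_status,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url until it returns 200; give up after the given number of attempts."""
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A workflow step could call `wait_until_ready` with the droplet's address and fail the job when it returns `False`. Passing `get_status` and `sleep` as parameters keeps the loop easy to test without a real droplet.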

5. How do I know the version of PAD being used in a given Geosearch droplet?

One conspicuous change here is the removal of the version.env file, along with changes to how droplets are deployed that mean the PAD version is no longer part of the droplet name. version.env is made irrelevant because the URL in pelias.json will always pull from the "latest" folder in DO. If you are trying to figure out which PAD version was used by a given Geosearch droplet, simply query that droplet and look at the response. We updated the normalizer to include a "version" property at addendum.pad.version in the properties of each GeoJSON feature returned by Geosearch, which will tell you the PAD version used. This has the added benefit of making it apparent to external Geosearch API users which version they are looking at.
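As a rough illustration of reading that property, the sketch below collects the PAD version from an already-parsed Geosearch response. The sample response, the address, and the version string "23a" are made up for the example; only the addendum.pad.version path comes from the description above.

```python
from typing import Any, Dict, Set


def pad_versions(feature_collection: Dict[str, Any]) -> Set[str]:
    """Collect the distinct addendum.pad.version values found in a GeoJSON response."""
    versions: Set[str] = set()
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        pad = (props.get("addendum") or {}).get("pad") or {}
        version = pad.get("version")
        if version is not None:
            versions.add(str(version))
    return versions


# Hypothetical response body; real responses carry many more properties.
SAMPLE_RESPONSE = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {
                "name": "120 BROADWAY",
                "addendum": {"pad": {"version": "23a"}},
            },
        },
        # Features without an addendum are skipped.
        {"type": "Feature", "properties": {"name": "NO ADDENDUM"}},
    ],
}

if __name__ == "__main__":
    print(pad_versions(SAMPLE_RESPONSE))
```

A single droplet should normally report exactly one version, so a result with more than one entry would be worth investigating.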