azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Update Tilegarden runtime to node 12 using lambci/node-custom-lambda #819

Closed KlaasH closed 3 years ago

KlaasH commented 3 years ago

Overview

The Tilegarden Lambda function has been running on the Node 8 runtime. That works, but the runtime is deprecated, so we wouldn't be able to recreate the function if something caused it to break. Unfortunately, Mapnik crashes on the AWS-provided Node 12 runtime, as described in #801. As also noted on that issue, there's an issue on the node-mapnik repo where someone got around the problem by using a custom runtime provided by the LambCI project.

This PR does the same. Fortunately:

  1. LambCI publishes the runtime as versioned layers that you can just add to a Lambda function
  2. Claudia.js has a --layers option (as of v5.3) that works with both create and update

So this PR includes @flibbertigibbet's work to upgrade the container and dependencies (including Claudia.js) and gets it working on Lambda by switching the runtime to provided and loading the latest Node 12 layer from LambCI.
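
As a rough illustration of the runtime and layer switch described above, the sketch below assembles the arguments for a `claudia update` call that sets the runtime to `provided` and attaches a layer. It only builds and prints the command; it does not run it. The function name and the layer ARN are hypothetical placeholders, not the values used in this repo's deploy scripts, and the real LambCI layer ARN and version would need to be taken from the LambCI project.

```python
from typing import List, Sequence


def build_claudia_update_args(runtime: str, layer_arns: Sequence[str]) -> List[str]:
    """Assemble argv for `claudia update` with a runtime and a set of layers.

    Claudia takes the layers as a single comma-delimited list of ARNs.
    """
    if not layer_arns:
        # A `provided` runtime needs something to supply the actual runtime,
        # which in this setup is the layer.
        raise ValueError("at least one layer ARN is required")
    return [
        "claudia", "update",
        "--runtime", runtime,
        "--layers", ",".join(layer_arns),
    ]


# Placeholder ARN for illustration only (assumption, not the real LambCI layer).
PLACEHOLDER_LAYER_ARN = "arn:aws:lambda:us-east-1:123456789012:layer:example-node12:1"

args = build_claudia_update_args("provided", [PLACEHOLDER_LAYER_ARN])
print(" ".join(args))
```

Keeping the layer ARN in one named value makes it easy to bump the layer version later without touching the rest of the command.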

Notes

We've had problems in the past with Claudia providing options for the update command that don't actually do anything: it claims it will change something about a deployed function, but only manages to set the value on create. I don't know whether they fixed the issue for other params, but I confirmed that update (which we use in deploy, as opposed to deploy-new) does successfully change the runtime and layers. (From looking at the docs, it appears that some of the options we use for create aren't available for update, so maybe they fixed the situation by removing options for things Lambda won't actually let you change.)

This means we shouldn't need to do anything fancy to make the deploy work. The command in infra should be able to successfully update the existing production Lambda function.

Testing Instructions

This is a little tricky to test, since the crash was only showing up on Lambda, not in local development. But the first thing to confirm is that local development still works.

I tested the deployed environment by pushing the test/update-tilegarden-node#801 branch. Jenkins picks up test branches and deploys them to staging. It takes a while after the Jenkins job is done for ECS to actually cycle the services (and they're allowed to go to zero, so the staging site will be down while it's working on it), but when it comes up it should successfully serve tiles.

In working on this, I first hand-modified then destroyed the staging Lambda function, so I didn't get a clean run of "upgrades from Node 8 to Node 12 in place". And we can't make a new Node 8 function to try it again. But I did confirm that if I manually change the runtime and remove the layer from the function, deploying again puts them back into the desired state.
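
The "deploying again puts them back" check can be sketched as a comparison between a function's reported configuration and the desired state. The helper below is a hypothetical illustration, not part of this PR. It assumes a dict shaped like the response from Lambda's GetFunctionConfiguration API (a `Runtime` string and a `Layers` list whose entries carry an `Arn`), and the ARN shown is a placeholder.

```python
from typing import Any, Dict, List, Sequence


def config_drift(
    config: Dict[str, Any],
    desired_runtime: str,
    desired_layer_arns: Sequence[str],
) -> List[str]:
    """Return human-readable differences between actual and desired state.

    An empty list means the function already matches the desired state.
    """
    problems: List[str] = []

    actual_runtime = config.get("Runtime")
    if actual_runtime != desired_runtime:
        problems.append(f"runtime is {actual_runtime!r}, expected {desired_runtime!r}")

    # Layers may be absent entirely when none are attached.
    actual_layers = [layer["Arn"] for layer in config.get("Layers", [])]
    if actual_layers != list(desired_layer_arns):
        problems.append(f"layers are {actual_layers!r}, expected {list(desired_layer_arns)!r}")

    return problems


# Placeholder ARN for illustration only (assumption).
LAYER = "arn:aws:lambda:us-east-1:123456789012:layer:example-node12:1"

# Simulates the hand-modified state: runtime changed and layer removed.
tampered = {"Runtime": "nodejs12.x"}
# Simulates the state after deploying again.
redeployed = {"Runtime": "provided", "Layers": [{"Arn": LAYER}]}

print(config_drift(tampered, "provided", [LAYER]))
print(config_drift(redeployed, "provided", [LAYER]))
```

In practice the `config` dict would come from querying the deployed function before and after a deploy; here it is hard-coded so the comparison logic can be read on its own.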

Checklist

Resolves #801