Overview
The Tilegarden Lambda function has been running on the Node 8 runtime. That works, but the runtime is deprecated, so we wouldn't be able to recreate the function if something happened to break it. Unfortunately Mapnik is not a happy camper on the AWS-provided Node 12 runtime: it crashes, as described in #801. As also noted on that issue, there's an issue on the node-mapnik repo where someone got around the problem by using a custom runtime provided by the LambCI project.
So this does the same. Fortunately:
- LambCI publishes the runtime as versioned layers that you can just add to a Lambda function
- Claudia.js has a --layers option (as of v5.3) that works with both create and update
So this PR includes @flibbertigibbet's work to upgrade the container and dependencies (including Claudia.js) and gets it working on Lambda by switching the runtime to provided and loading the latest Node 12 layer from LambCI.
Notes
We've had problems in the past with Claudia providing options for the update command that don't actually do anything: it claims it will change something about a deployed function, but only manages to set the value on create. I don't know if they fixed the issue for other params, but I confirmed that update (which we use in deploy, as opposed to deploy-new) does successfully change the runtime and layers. (Actually, from looking at the docs, it looks like some of the options we use for create aren't available for update, so maybe they fixed the situation by removing options for things Lambda won't actually let you change.)
This means we shouldn't need to do anything fancy to make the deploy work. The command in infra should be able to successfully update the existing production Lambda function.
Testing Instructions
This is a little tricky to test, since the crash was only showing up on Lambda, not in local development. But the first thing to confirm is that local development still works.
I tested the deployed environment by pushing the test/update-tilegarden-node#801 branch. Jenkins picks up test branches and deploys them to staging. It takes a while after the Jenkins job is done for ECS to actually cycle the services (and they're allowed to go to zero, so the staging site will be down while it's working on it), but when it comes up it should successfully serve tiles.
In working on this, I first hand-modified and then destroyed the staging Lambda function, so I didn't get a clean run of "upgrades from Node 8 to Node 12 in place", and we can't make a new Node 8 function to try it again. But I did confirm that if I manually change the runtime and remove the layer from the function, deploying again puts them back into the desired state.
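For anyone who wants to repeat that last check without eyeballing the console, here is a rough sketch of the comparison, not something included in this PR. It assumes the function configuration is available as a dict shaped like the Lambda GetFunctionConfiguration response (a `Runtime` string and an optional `Layers` list of objects with an `Arn`), for example from `aws lambda get-function-configuration`. The `find_drift` name and the layer ARN prefix are made up for illustration; the account ID in particular is a placeholder, not LambCI's real one.

```python
from typing import Any, Dict, List

# Assumed desired state. The ARN prefix is a placeholder for illustration only;
# substitute the real LambCI Node 12 layer ARN (without the version suffix).
DESIRED_RUNTIME = "provided"
DESIRED_LAYER_PREFIX = "arn:aws:lambda:us-east-1:123456789012:layer:nodejs12"


def find_drift(
    config: Dict[str, Any],
    runtime: str = DESIRED_RUNTIME,
    layer_prefix: str = DESIRED_LAYER_PREFIX,
) -> List[str]:
    """Return a list of mismatches; an empty list means the config looks right."""
    problems: List[str] = []

    actual_runtime = config.get("Runtime")
    if actual_runtime != runtime:
        problems.append(f"runtime is {actual_runtime!r}, expected {runtime!r}")

    # "Layers" may be missing entirely when the function has no layers attached.
    layer_arns = [layer.get("Arn", "") for layer in config.get("Layers") or []]
    if not any(arn.startswith(layer_prefix + ":") for arn in layer_arns):
        problems.append(f"no layer matching {layer_prefix}:<version>")

    return problems


if __name__ == "__main__":
    # Simulates the hand-modified function: wrong runtime, layer removed.
    hand_modified = {"Runtime": "nodejs12.x"}
    # Simulates the function after deploying again.
    redeployed = {
        "Runtime": "provided",
        "Layers": [{"Arn": DESIRED_LAYER_PREFIX + ":1"}],
    }
    print(find_drift(hand_modified))
    print(find_drift(redeployed))
```

The idea is that the hand-modified config reports both problems and the redeployed one reports none, which is the same "deploy puts it back" behavior described above.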
Checklist
Resolves #801