Refactor worker nodes into ECS

There are a couple of nuisances with our current worker strategy that I think would be helped by moving most of what we've done to an orchestration system like ECS:

1. Log streams are currently generated per instance, so the logs from all N workers get interleaved, which makes it hard to find errors.
2. When a worker crashes, it never gets replaced.
3. Updating the container images is a right pain: docker kill, docker rm, docker pull, find / -name part-001, (cloud-init script) /path/to/part-001, vs. pushing a new launch template and having fresh images in a couple of minutes.
4. The ASGs have ugly names like tf-asg-tf-serratus-dl-20200304125312000001, which means we have to "discover" the names before we can adjust desired sizes. This is currently necessary so that all instances get replaced when we change the user_data in the launch configuration. ECS would deal with sending the correct arguments to our scripts.
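To illustrate how ECS could help with the interleaved logs, here is a rough sketch of a container definition that uses the awslogs log driver. With awslogs-stream-prefix set, ECS names each stream prefix/container-name/task-id, so every task gets its own stream. The image name, log group, and region below are placeholders I made up, and this is only a fragment of a task definition (memory/CPU settings are left out), not what we would necessarily deploy.

```python
def worker_container_definition(name, image, log_group, region):
    """Build one entry for the containerDefinitions list of an ECS task definition.

    The awslogs options used here are the standard ones; all argument values
    passed in below are hypothetical placeholders.
    """
    return {
        "name": name,
        "image": image,
        "essential": True,
        "logConfiguration": {
            "logDriver": "awslogs",
            "options": {
                "awslogs-group": log_group,
                "awslogs-region": region,
                # One stream per task: <prefix>/<container name>/<task id>
                "awslogs-stream-prefix": name,
            },
        },
    }


# Hypothetical values, for illustration only.
definition = worker_container_definition(
    name="serratus-dl",
    image="example/serratus-dl:latest",
    log_group="/serratus/workers",
    region="us-east-1",
)
```

The resulting dict could go into the containerDefinitions list of a task definition, whether that is registered through Terraform, the console, or the API.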

There are a couple of things to work out first, though:

[ ] Will we use the Daemon or Replica scheduling strategy? Daemon doesn't solve 1, but Replica doesn't solve 4. We need a way to force all images to be replaced if we change them.
[ ] ECS + CloudWatch Logs
[ ] ...and more, maybe?
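
On forcing all images to be replaced: one option might be to ask ECS for a new deployment of the service, which stops and restarts its tasks even when the task definition is unchanged. A minimal sketch, assuming the workers run as an ECS service and that ecs_client is a boto3 ECS client; the cluster and service names are up to the caller, and I have not tested how this behaves with each scheduling strategy or image tag scheme.

```python
def force_redeploy(ecs_client, cluster, service):
    """Ask ECS to start a fresh deployment of a service.

    ecs_client is assumed to be boto3.client("ecs") (or anything exposing the
    same update_service method). Returns the service's deployment list from
    the API response so the caller can inspect rollout progress.
    """
    response = ecs_client.update_service(
        cluster=cluster,
        service=service,
        # Replace running tasks even though the task definition is unchanged.
        forceNewDeployment=True,
    )
    return response["service"]["deployments"]
```

Usage would be something like force_redeploy(boto3.client("ecs"), "my-cluster", "my-service"), where both names are hypothetical. Whether the restarted tasks actually pull a fresh image depends on how the image is tagged and on the agent's image pull behavior, so that would still need checking.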
