ababaian / serratus

Ultra-deep search for novel viruses
http://serratus.io
GNU General Public License v3.0

Refactor worker nodes into ECS #29

Closed brietaylor closed 3 years ago

brietaylor commented 4 years ago

There are a couple of nuisances with our current worker strategy that I think would be helped by moving most of what we've done to an orchestration system like ECS.

  1. Log streams are currently generated per instance, so the logs from all N workers get interleaved, which makes it hard to find errors.
  2. When a worker crashes, it never gets replaced.
  3. Updating the container images is a right pain: docker kill, docker rm, docker pull, find / -name part-001, then (the cloud-init script) /path/to/part-001, versus pushing a new launch template and having fresh images in a couple of minutes.
  4. Ugly names for the ASGs, like tf-asg-tf-serratus-dl-20200304125312000001, which means we have to "discover" the names in order to adjust desired sizes. This is currently necessary so that all instances get replaced when we change the user_data in the launch configuration. ECS would deal with sending the correct arguments to our scripts.
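
To illustrate the name "discovery" in point 4, here is a minimal sketch of what that step might look like. It is not taken from the serratus code: the helper name discover_asg and every group name except tf-asg-tf-serratus-dl-20200304125312000001 are made up for the example. It also assumes the generated suffix is a fixed-width timestamp plus counter, as the example name suggests, so the lexicographically largest match is the newest group.

```python
from typing import Iterable, Optional


def discover_asg(names: Iterable[str], prefix: str) -> Optional[str]:
    """Return the newest ASG name starting with `prefix`, or None if there is none.

    Hypothetical helper: assumes the generated suffix is a fixed-width
    timestamp, so plain string comparison orders names chronologically.
    """
    matches = [name for name in names if name.startswith(prefix)]
    return max(matches) if matches else None


# Hard-coded names keep the sketch runnable without AWS access. Only the
# first name comes from the issue text; the other two are invented.
names = [
    "tf-asg-tf-serratus-dl-20200304125312000001",
    "tf-asg-tf-serratus-dl-20200305090000000001",
    "tf-asg-tf-serratus-align-20200304125312000002",
]
current_dl_asg = discover_asg(names, "tf-asg-tf-serratus-dl-")
print(current_dl_asg)
```

In practice the names would presumably come from something like boto3's describe_auto_scaling_groups, and the discovered name would then be passed to set_desired_capacity. With fixed, predictable names this lookup would not be needed at all.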

There are a couple of things to work out, though. First: