edgi-govdata-archiving / web-monitoring-task-sheets

Experimental new tool for generating weekly analyst task sheets for web monitoring
GNU General Public License v3.0

Dockerize this so it can run as a scheduled AWS ECS #6

Open Mr0grog opened 3 years ago

Mr0grog commented 3 years ago

This script has large memory requirements, so we avoid running it as a Kubernetes job: doing so would mean either reserving expensive capacity on the cluster or risking disruption to the cluster whenever the job runs, and both are bad.

It’s also not well integrated with the readability server (a separate Node.js process).

To work around those problems, we currently run this job by hand, which is not ideal for lots of reasons. Instead, we should solve both problems together: run it on AWS ~Batch (designed exactly for running large, intermittent, automated jobs like this)~ ECS, which requires containerizing it (and containerizing also makes it simpler to run the Node.js and Python processes side by side).
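A minimal sketch of a container entrypoint along these lines, assuming the readability server is started as a child process and listens on a local TCP port before the Python job needs it. The commands, the port `7323`, and the environment variable names (`READABILITY_CMD`, `JOB_CMD`, `READABILITY_PORT`) are hypothetical placeholders, not taken from the repo.

```python
"""Hypothetical container entrypoint: start the readability server, run the job, clean up."""
import os
import shlex
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=30.0, proc=None):
    """Block until something accepts TCP connections on host:port.

    Raises RuntimeError if `proc` exits first, or TimeoutError after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if proc is not None and proc.poll() is not None:
            raise RuntimeError(f"server exited early with code {proc.returncode}")
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return
        except OSError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"nothing listening on {host}:{port} after {timeout}s")
            time.sleep(0.2)


def run_with_sidecar(server_cmd, job_cmd, host="127.0.0.1", port=7323, timeout=30.0):
    """Start server_cmd in the background, run job_cmd to completion, then stop the server.

    Returns the job's exit code so the container (and ECS) sees success or failure.
    """
    server = subprocess.Popen(server_cmd)
    try:
        wait_for_port(host, port, timeout, proc=server)
        return subprocess.run(job_cmd).returncode
    finally:
        # Always stop the server, even if startup failed or the job crashed.
        server.terminate()
        try:
            server.wait(timeout=10)
        except subprocess.TimeoutExpired:
            server.kill()
            server.wait()


def main():
    # Hypothetical defaults; the real commands and port depend on the repo's layout.
    server_cmd = shlex.split(os.environ.get("READABILITY_CMD", "node readability-server.js"))
    job_cmd = shlex.split(os.environ.get("JOB_CMD", "python generate_task_sheets.py"))
    port = int(os.environ.get("READABILITY_PORT", "7323"))
    return run_with_sidecar(server_cmd, job_cmd, port=port)
```

The image's entrypoint script would then end with `sys.exit(main())`. Returning the job's exit code lets ECS record whether the run succeeded, and the `finally` block makes sure the Node.js server never outlives the job.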

Breakdown of Steps

Let’s make each of these a separate PR/task/whatever, and not try to bite this whole thing off in one chunk.

Update (2021-12-17): use ECS instead of Batch; add breakdown of steps.

Mr0grog commented 2 years ago

After more lessons learned on this, we should probably run it as a scheduled task on ECS, not Batch. But the basic requirement (dockerizing it) is the same.
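One way to schedule an ECS task is an EventBridge rule whose target is the ECS cluster. The sketch below only builds the request parameters for boto3's EventBridge `put_rule` and `put_targets` calls, so it can be checked without AWS credentials. The weekly cron expression, the Fargate launch type, the public-IP setting, and every name and ARN passed in are assumptions for illustration, not settings from this project.

```python
"""Hypothetical helper: build EventBridge requests that run an ECS task on a weekly schedule."""
from typing import Any, Dict, List, Tuple


def weekly_ecs_schedule(
    rule_name: str,
    cluster_arn: str,
    task_definition_arn: str,
    events_role_arn: str,
    subnets: List[str],
    security_groups: List[str],
    schedule: str = "cron(0 12 ? * MON *)",  # assumed: Mondays at 12:00 UTC
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (put_rule kwargs, put_targets kwargs) for an EventBridge client."""
    rule = {
        "Name": rule_name,
        "ScheduleExpression": schedule,
        "State": "ENABLED",
    }
    targets = {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": f"{rule_name}-task",
                "Arn": cluster_arn,  # for ECS targets, the target ARN is the cluster
                "RoleArn": events_role_arn,  # role EventBridge assumes to call ecs:RunTask
                "EcsParameters": {
                    "TaskDefinitionArn": task_definition_arn,
                    "TaskCount": 1,
                    "LaunchType": "FARGATE",  # assumption; EC2 capacity would also work
                    "NetworkConfiguration": {
                        "awsvpcConfiguration": {
                            "Subnets": subnets,
                            "SecurityGroups": security_groups,
                            "AssignPublicIp": "ENABLED",  # assumed: public subnets, no NAT
                        }
                    },
                },
            }
        ],
    }
    return rule, targets
```

Applying it would look like `events = boto3.client("events")`, followed by `events.put_rule(**rule)` and `events.put_targets(**targets)`. The job's large memory requirement would be declared in the ECS task definition itself, not in the schedule.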