Right now, the WatchStatus.. Lambda reads the DynamoDB table stream and fires (wrapped) Lambdas accordingly.
Instead, WatchStatus should put the info to fire the Lambda into an SQS queue.
An EC2 instance should be configured to read the SQS queue and process events by "firing" the associated Lambda (i.e. running the corresponding code on the EC2 instance itself).
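A minimal sketch of the sending side, assuming boto3 is used and that each queue message is a small JSON document. The message fields (`function`, `payload`) and the helper names are hypothetical, chosen only for illustration; `send_message` with `QueueUrl` and `MessageBody` is the standard boto3 SQS client call.

```python
import json


def build_queue_message(function_name, payload):
    """Serialize the info needed to "fire" one function as an SQS message body.

    The field names here are an assumed format, not an existing convention.
    """
    return json.dumps({"function": function_name, "payload": payload}, sort_keys=True)


def enqueue_event(sqs_client, queue_url, function_name, payload):
    """Send one event to the queue instead of invoking the wrapped Lambda."""
    return sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=build_queue_message(function_name, payload),
    )
```

Inside WatchStatus this would be called once per stream record, with `sqs_client = boto3.client("sqs")` created outside the handler so it is reused across invocations.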
Why?
Cost: running 10 million Lambdas for 10 seconds each costs about $200, and that is a realistic number of Lambda invocations for processing the million papers on the arXiv.
Still scales reasonably well. The SQS queue has nice properties, so we don't miss events, and we can always boot up more EC2 instances to process the queue in parallel. We keep the parallelized structure we had with the Lambdas: each queue event causes one thing to happen, which updates the table and triggers the next event down the line, and that next event could even be processed by a different EC2 instance.
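The cost figure above can be sanity-checked with a few lines of arithmetic. This is a rough sketch: the prices are the commonly published Lambda list prices and the 128 MB memory size is an assumption (neither is stated in this issue), and the free tier is ignored.

```python
# Assumed list prices; they vary by region and may change over time.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_MILLION_REQUESTS = 0.20


def lambda_cost(invocations, seconds_each, memory_gb):
    """Estimate the Lambda bill in dollars for a batch of invocations."""
    compute = invocations * seconds_each * memory_gb * PRICE_PER_GB_SECOND
    requests = invocations / 1000000 * PRICE_PER_MILLION_REQUESTS
    return compute + requests


# 10 million invocations, 10 seconds each, 128 MB (0.125 GB) of memory.
print(round(lambda_cost(10000000, 10, 0.125), 2))
```

Under these assumptions the estimate lands a little above $200, and it grows linearly with the configured memory size.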
What needs to be done?
We need to:
learn a bit about SQS and EC2,
update the CloudFormation template to deploy SQS and EC2,
update WatchStatus.. to send events to the SQS queue instead of firing wrappers,
either poll the SQS queue from EC2 or figure out how to fire code from events, and write a small handler that runs the correct function based on the details from the queue, ideally one that can fire both Python and JavaScript events (Python 3.5's subprocess.run() sounds pretty good for firing these off),
configure appropriate CloudWatch logging for what the EC2 instance does.
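The polling handler from the list above could look roughly like this. It is a sketch under stated assumptions: boto3 on the EC2 instance, a JSON message body with hypothetical `function` and `payload` fields, and a hypothetical `HANDLERS` table whose script paths are placeholders. `receive_message` and `delete_message` are the standard boto3 SQS client calls.

```python
import json
import subprocess

# Hypothetical mapping from function name to the command that runs it.
HANDLERS = {
    "example_python_step": ["python3", "handlers/example_step.py"],
    "example_js_step": ["node", "handlers/example_step.js"],
}


def build_command(body):
    """Turn a queue message body into the command line for its handler."""
    event = json.loads(body)
    return HANDLERS[event["function"]] + [json.dumps(event["payload"])]


def process_message(body, runner=subprocess.run):
    """Run the handler for one message; return True if it succeeded."""
    try:
        command = build_command(body)
    except (KeyError, ValueError):
        return False  # unknown function or malformed body
    return runner(command, check=False).returncode == 0


def poll_once(sqs_client, queue_url, runner=subprocess.run):
    """Long-poll the queue once and delete only the messages that succeeded."""
    response = sqs_client.receive_message(
        QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
    )
    handled = 0
    for message in response.get("Messages", []):
        if process_message(message["Body"], runner):
            sqs_client.delete_message(
                QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"]
            )
            handled += 1
    return handled
```

On the instance this would run in a loop, for example `while True: poll_once(boto3.client("sqs"), queue_url)`. Messages whose handler fails are not deleted, so they become visible again after the queue's visibility timeout and can be retried, possibly by another instance.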