CivicTechTO / ttc_subway_times

A scraper to grab and publish TTC subway arrival times.
GNU General Public License v3.0

Added the ability to run this with Serverless (AWS Lambda) and persist to S3 with CloudWatch Logging #50

Closed: rvilim closed 5 years ago

rvilim commented 5 years ago

This patch adds a way to run this script serverless, to both cut down on costs (~$2 per year) and improve reliability.

It uses the Serverless framework to run the TTC scrape script as a cron job (serverless is snazzy! it even accounts for daylight saving time!), then persists to S3. CloudWatch automatically pulls in the logs; the logging is pretty rudimentary right now but could be improved.

The major changes involve adding the handler method for the Lambda entry point and pulling out all the writing code into classes in writing.py. This lets us have Postgres and S3 as separate writers and the rest of the code not care which one gets used.
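For anyone reading along, here is a rough sketch of what the writer split could look like. This is not the code in writing.py: the class names, the `write` signature, the `scrapes` table, the S3 key layout, and the sample record are all made up for illustration. `FakeS3Client` only stands in for a boto3 S3 client so the sketch runs without AWS credentials.

```python
import json
from abc import ABC, abstractmethod
from datetime import datetime, timezone


class Writer(ABC):
    """Hypothetical shared interface: persist the records from one scrape."""

    @abstractmethod
    def write(self, records, scraped_at):
        """Persist a list of dict records collected at `scraped_at`."""


class S3Writer(Writer):
    """Stores each scrape as a single JSON object in a bucket."""

    def __init__(self, client, bucket):
        # `client` is assumed to behave like boto3.client("s3").
        self.client = client
        self.bucket = bucket

    def write(self, records, scraped_at):
        # Assumed key layout: one object per scrape, grouped by date.
        key = scraped_at.strftime("%Y/%m/%d/%H%M%S.json")
        body = json.dumps(records).encode("utf-8")
        self.client.put_object(Bucket=self.bucket, Key=key, Body=body)
        return key


class PostgresWriter(Writer):
    """Inserts each record as a row; table and columns are hypothetical."""

    def __init__(self, connection):
        # `connection` is assumed to be a DB-API connection (e.g. psycopg2).
        self.connection = connection

    def write(self, records, scraped_at):
        rows = [(scraped_at, json.dumps(record)) for record in records]
        cursor = self.connection.cursor()
        cursor.executemany(
            "INSERT INTO scrapes (scraped_at, payload) VALUES (%s, %s)", rows
        )
        self.connection.commit()
        return len(rows)


class FakeS3Client:
    """In-memory stand-in for an S3 client, for local experiments only."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body


def run_scrape(writer, records, scraped_at):
    # The caller only depends on the Writer interface, not on the backend.
    return writer.write(records, scraped_at)


if __name__ == "__main__":
    client = FakeS3Client()
    when = datetime(2018, 11, 20, 12, 0, 0, tzinfo=timezone.utc)
    sample = [{"station": "Bloor", "minutes": 2.5}]  # invented sample data
    print(run_scrape(S3Writer(client, "example-bucket"), sample, when))
```

The point of the shape is that `run_scrape` never needs to know which backend it was handed, which is what lets the Lambda handler and the command-line path share the scraping code.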

Major changes:

Todo: A query tool that automatically pulls the relevant files from S3 and turns them into a CSV (like the Postgres script)

rvilim commented 5 years ago

Sorry, I'll write some better docs. It's American Thanksgiving from Thursday onwards, so I should have some time.

So re: Circle, unless I'm missing something, I can't see any tests that exercise the Postgres capabilities. I mentioned this in the docs I did write, but I left the Postgres functionality intact (I just split it out in the code). What I did change is that I added a command line flag to specify where you want the data to go (either --postgres or --s3, not both). I definitely didn't modify any tests to accommodate that flag though.
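To make the flag behaviour concrete, here is a minimal sketch of one way to get "either --postgres or --s3, not both" with argparse. Only the two flag names come from this PR; the function names are hypothetical, and making one of the flags required is an assumption on my part rather than something the patch necessarily does.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Scrape TTC subway arrival times.")
    # A mutually exclusive group makes argparse reject "--postgres --s3".
    # required=True is an assumption: it forces the caller to pick a target.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--postgres", action="store_true",
                        help="write results to Postgres")
    target.add_argument("--s3", action="store_true",
                        help="write results to S3")
    return parser


def choose_target(argv):
    """Return "postgres" or "s3" for the given argument list."""
    args = build_parser().parse_args(argv)
    return "postgres" if args.postgres else "s3"


if __name__ == "__main__":
    import sys
    print(choose_target(sys.argv[1:]))
```

Passing both flags makes argparse print a usage error and exit, so a test for the flag could simply assert that `SystemExit` is raised in that case.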

radumas commented 5 years ago

Tagging this to #33 so people see it