Closed rvilim closed 5 years ago
Sorry, I'll write some better docs. It's American Thanksgiving from Thursday onwards, so I should have some time.
So re: Circle, unless I'm missing something, I can't see any tests that exercise the Postgres capabilities. I mentioned this in the docs I did write, but I left the Postgres functionality intact (I just split it out in the code). What I did change was adding a command-line flag to specify where you want the output to go (either --postgres or --s3, not both). I definitely didn't modify any tests to accommodate that flag, though.
Tagging this to #33 so people see it
This patch adds a way to run this script serverless, both to cut down on costs (~$2 per year) and to improve reliability.
It uses the Serverless framework to run the TTC scrape script as a cron job (serverless is snazzy! it even accounts for daylight saving time!), then persists to S3. CloudWatch automatically pulls in the logs; the logging is pretty rudimentary right now but could be improved.
The major changes involve adding the handler method for the Lambda entry point and pulling out all the writing code into classes in writing.py. This lets us have Postgres and S3 as separate writers and the rest of the code not care which one gets used.
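A rough sketch of the writer split described above, not the actual contents of writing.py. The class names, the `write(name, records)` signature, the `scrapes` table, and the `run_scrape` and `make_handler` helpers are all hypothetical. The S3 client and database connection are passed in, so the sketch does not import boto3 or psycopg2 itself.

```python
import json
from abc import ABC, abstractmethod


class Writer(ABC):
    """Hypothetical common interface: the scraper only ever calls write()."""

    @abstractmethod
    def write(self, name, records):
        """Persist a list of JSON-serialisable records under the given name."""


class S3Writer(Writer):
    def __init__(self, client, bucket):
        # client is expected to behave like boto3.client("s3")
        self.client = client
        self.bucket = bucket

    def write(self, name, records):
        body = json.dumps(records).encode("utf-8")
        self.client.put_object(Bucket=self.bucket, Key=name, Body=body)


class PostgresWriter(Writer):
    def __init__(self, connection):
        # connection is expected to be a DB-API connection (e.g. psycopg2)
        self.connection = connection

    def write(self, name, records):
        cur = self.connection.cursor()
        for record in records:
            # "scrapes" is a made-up table used only for illustration
            cur.execute(
                "INSERT INTO scrapes (name, payload) VALUES (%s, %s)",
                (name, json.dumps(record)),
            )
        cur.close()
        self.connection.commit()


def run_scrape(fetch, writer, name):
    """Fetch records and hand them to whichever writer was chosen."""
    records = fetch()
    writer.write(name, records)
    return len(records)


def make_handler(fetch, writer, name):
    """Build a Lambda-style entry point bound to one writer."""
    def handler(event, context):
        return {"written": run_scrape(fetch, writer, name)}
    return handler
```

Because `run_scrape` only depends on the `Writer` interface, swapping S3 for Postgres is a matter of constructing a different writer; nothing else in the scrape path changes.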
Major changes:
ttc_api_scraper.py scrape usage now requires a --s3 or --postgres flag
Todo: A query tool that automatically pulls the relevant files from S3 and turns them into a CSV (like the postgres script)
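The "one of --s3 or --postgres, not both" requirement on `scrape` could look roughly like this. This is a sketch using the standard library's argparse; the script's real CLI library and option types may differ, and treating both flags as plain booleans is an assumption. The `build_parser` and `destination` names are made up for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="ttc_api_scraper.py")
    sub = parser.add_subparsers(dest="command", required=True)
    scrape = sub.add_parser("scrape", help="scrape and persist the results")
    # Exactly one destination must be given: argparse rejects none or both.
    dest = scrape.add_mutually_exclusive_group(required=True)
    dest.add_argument("--s3", action="store_true", help="write to S3")
    dest.add_argument("--postgres", action="store_true", help="write to Postgres")
    return parser


def destination(argv):
    """Return 's3' or 'postgres' for the given argument list."""
    args = build_parser().parse_args(argv)
    return "s3" if args.s3 else "postgres"
```

With this setup, `destination(["scrape", "--s3"])` selects S3, while passing both flags or neither makes argparse exit with a usage error.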