fgregg closed this issue 1 year ago.
Next step here is to identify a project with a nightly job (e.g., a scrape) and develop a proof of concept for running that job with GitHub Actions. Some ideas include the Lugar scrape or ~CPS (if it still does a nightly sync)~.
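For a proof of concept, the GitHub Actions side would likely be a workflow with an `on: schedule` trigger and a cron expression such as `0 5 * * *` (GitHub evaluates schedules in UTC). That workflow would check out the repo and run a script. Below is a minimal sketch of such a script. It assumes a hypothetical JSON source whose URL is passed as a command-line argument and a hypothetical `data/latest.json` output path. Neither reflects the actual Lugar or CPS code.

```python
import json
import sys
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen

# Hypothetical output location; a real project would pick its own layout.
OUTPUT_PATH = Path("data") / "latest.json"


def build_snapshot(payload: bytes, fetched_at: datetime) -> dict:
    """Wrap the scraped JSON with a UTC timestamp so each nightly run is traceable."""
    return {"fetched_at": fetched_at.isoformat(), "records": json.loads(payload)}


def main(url: str) -> int:
    try:
        with urlopen(url, timeout=60) as response:
            payload = response.read()
    except OSError as exc:  # URLError, HTTPError, and timeouts are all OSErrors
        print(f"scrape failed: {exc}", file=sys.stderr)
        return 1  # a nonzero exit code marks the Actions run as failed
    snapshot = build_snapshot(payload, datetime.now(timezone.utc))
    OUTPUT_PATH.parent.mkdir(parents=True, exist_ok=True)
    OUTPUT_PATH.write_text(json.dumps(snapshot, indent=2, sort_keys=True))
    print(f"wrote {OUTPUT_PATH}")
    return 0


if __name__ == "__main__":
    # The workflow step would run: python scrape.py <source URL>
    if len(sys.argv) == 2:
        sys.exit(main(sys.argv[1]))
    print("usage: python scrape.py URL", file=sys.stderr)
```

Keeping `build_snapshot` separate from the network call means the transformation can be tested without hitting the source.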
We probably want to prioritize apps deployed on Heroku?
How do we decide whether to use GitHub Actions or Heroku Scheduler?
We've used GitHub Actions for a lot of projects, both DataMade and personal. At this point, we're considering it as a potential alternative to Airflow, since it offers many of the same upsides without the overhead of running an app.
In general, we like it! Some downsides include pricing for private repos (#270) and imprecise cron runs (scheduled workflows can start late when GitHub Actions is under heavy load). But the upsides are an awesome interface that's well integrated with version control, plus super simple configuration.
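One way to make the imprecise-cron downside concrete is to look at how a nightly run can start noticeably after its cron time. The sketch below assumes a hypothetical daily schedule at a fixed UTC hour. It works out which slot a run was meant for and how late it started. A scrape could then key its output on the intended date instead of the wall clock. This is one possible mitigation, not something settled in this thread.

```python
from datetime import datetime, timedelta, timezone


def scheduled_slot(started_at: datetime, hour_utc: int) -> datetime:
    """Return the most recent daily slot at hour_utc that is not after started_at.

    started_at must be timezone-aware.
    """
    started_at = started_at.astimezone(timezone.utc)
    slot = started_at.replace(hour=hour_utc, minute=0, second=0, microsecond=0)
    if slot > started_at:
        # Started before today's slot, so this run belongs to yesterday's slot.
        slot -= timedelta(days=1)
    return slot


def start_delay(started_at: datetime, hour_utc: int) -> timedelta:
    """How late a run started relative to the slot it was scheduled for."""
    return started_at - scheduled_slot(started_at, hour_utc)


if __name__ == "__main__":
    # Hypothetical run scheduled for 05:00 UTC that actually started at 05:37.
    run = datetime(2024, 3, 2, 5, 37, tzinfo=timezone.utc)
    print(scheduled_slot(run, 5).date())  # 2024-03-02
    print(start_delay(run, 5))  # 0:37:00
```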
@smcalilly recommends AWS Step Functions as a cheaper alternative to explore.
Just to write down what I said in R&D: I don't think they'd be a good fit for integrating with GitHub Actions. They'd be most useful for a long-running (more than 15 minutes, which is the Lambda timeout limit), multi-step task where we want to keep the data and code private. They're really nice for orchestrating tasks: you basically connect a series of Lambdas together as a state machine, you have AWS APIs at your fingertips (including CloudWatch for observability), and you can use YAML with the Serverless Framework to configure and provision any AWS resources you need.
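To make the "series of Lambdas as a state machine" idea concrete, here is a minimal sketch. It builds an Amazon States Language definition that invokes Lambda functions in sequence and checks a task against Lambda's 15-minute (900-second) maximum timeout. The function ARNs and state names are hypothetical. With the Serverless Framework setup described above, the definition would more likely live in YAML, so the Python dict here only shows the shape.

```python
import json

# AWS Lambda's maximum configurable timeout: 900 seconds (15 minutes).
LAMBDA_MAX_TIMEOUT_SECONDS = 15 * 60


def fits_in_single_lambda(expected_runtime_seconds: float) -> bool:
    """True if a task could finish inside one Lambda invocation."""
    return expected_runtime_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS


def chain_lambdas(function_arns: list[str], comment: str = "") -> dict:
    """Build an Amazon States Language definition that invokes Lambdas in order.

    Each Lambda becomes a Task state. Every state points at the next one,
    and the last state ends the execution.
    """
    if not function_arns:
        raise ValueError("need at least one Lambda ARN")
    names = [f"Step{i + 1}" for i in range(len(function_arns))]
    states = {}
    for i, (name, arn) in enumerate(zip(names, function_arns)):
        state = {"Type": "Task", "Resource": arn}
        if i + 1 < len(names):
            state["Next"] = names[i + 1]
        else:
            state["End"] = True
        states[name] = state
    definition = {"StartAt": names[0], "States": states}
    if comment:
        definition["Comment"] = comment
    return definition


if __name__ == "__main__":
    # Hypothetical ARNs (123456789012 is AWS's placeholder account ID).
    arns = [
        "arn:aws:lambda:us-east-1:123456789012:function:scrape",
        "arn:aws:lambda:us-east-1:123456789012:function:transform",
        "arn:aws:lambda:us-east-1:123456789012:function:load",
    ]
    print(fits_in_single_lambda(40 * 60))  # False: too long for one Lambda
    print(json.dumps(chain_lambdas(arns, "Hypothetical nightly pipeline"), indent=2))
```

The JSON string produced by `json.dumps` is the form Step Functions expects for a state machine definition (for example, the `definition` argument of boto3's `create_state_machine`).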
I think I'm ready to push for this approach. @hancush, is the next step to write a stack change proposal document?
@fgregg That's correct!
Here's the type of doc to write, for reference: https://github.com/datamade/how-to/blob/56087d662a3081c8e6189393378eec978eed060c/django/wagtail/research/recommendation-of-adoption.md
I've been experimenting with GitHub Actions as a scraping platform, and it's been really good.
Example repos
Twitter thread about it
I mentioned I was playing with this to @hancush, and she said it would be good to chat about it here.
@hancush, what would be a good next step?