datamade / how-to

📚 Doing all sorts of things, the DataMade way
MIT License

Future hosting for lametro-dashboard #336

Closed: fgregg closed this issue 11 months ago

fgregg commented 11 months ago

Background

We are seeking to move away from EC2 as a hosting provider for all projects.

LA Metro Dashboard poses some challenges for this goal. Its jobs run in Docker, and our preferred deployment pattern for Heroku is to deploy the app itself as a container, but Heroku does not support Docker in Docker.

Options:

  1. Deploy to Heroku without containers; Heroku could then use Docker for the jobs.
  2. Change the la-metro-dashboard scripts to not use Docker; then we could use a Dockerized deployment to Heroku.
  3. Use a different hosting platform. We might be able to do what we want with fly.io.
  4. Use a different tool besides Airflow. GitHub Actions has most of the functionality we would want.

Of these options, I prefer option 4. Of the others:

  1. Option 1 seems possible.
  2. Option 2 seems quite bad.
  3. Option 3 could be attractive if, and only if, we are still thinking about moving off of Heroku.

It would be good to discuss this with the Metro tech team.

fgregg commented 11 months ago

This is something to discuss with @hancush, and it may not be an R&D issue.

hancush commented 11 months ago

I think it's fine under R&D if the question is whether GitHub Actions can be a suitable replacement for Airflow. Some important considerations are:

  1. Scheduling: Can GitHub Actions run based on a cron?
  2. Reliability of runs: Is there a chance GitHub Actions won't run on schedule?
  3. Observability for non-technical users: Can we use GitHub Actions, a third party dashboard, or a roll-your-own dashboard to show the outcomes of recent runs?
  4. Integration with external resources: Can GitHub Actions securely push data to external Postgres and Solr/Elasticsearch resources?

fgregg commented 11 months ago

This doc has a comparison table between GitHub Actions and Airflow that may be helpful.

  1. GitHub Actions can run on a cron schedule. The shortest interval is once every five minutes.
  2. Yes. I've noticed that jobs do not always run exactly on schedule.
  3. We get some of this for free, but the question is whether the GitHub Actions dashboard is good enough. If it isn't, there are APIs we could integrate with.
  4. Yes, we are already doing this with https://github.com/datamade/chicago-council-scrapers/actions

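The schedule drift noted in point 2 could be measured by comparing the start times of recent scheduled runs against the cron interval. This is a minimal sketch, assuming the start timestamps have already been collected as ISO 8601 strings (for example, the `run_started_at` field of runs returned by GitHub's workflow-runs REST API); the function name `find_late_runs` and the default 10-minute tolerance are hypothetical choices, not part of any existing tool.

```python
from datetime import datetime, timedelta


def find_late_runs(start_times, interval_minutes, tolerance_minutes=10):
    """Return (previous, current, gap) for each pair of consecutive runs
    whose gap exceeds the expected interval plus a tolerance."""
    # Accept timestamps like "2024-01-01T00:05:00Z"; older Pythons need the
    # trailing "Z" rewritten as an explicit UTC offset before parsing.
    parsed = sorted(
        datetime.fromisoformat(t.replace("Z", "+00:00")) for t in start_times
    )
    limit = timedelta(minutes=interval_minutes + tolerance_minutes)
    late = []
    # Walk consecutive pairs of runs in chronological order.
    for prev, curr in zip(parsed, parsed[1:]):
        gap = curr - prev
        if gap > limit:
            late.append((prev, curr, gap))
    return late
```

For an hourly job, `find_late_runs(times, interval_minutes=60)` flags any gap longer than 70 minutes; a gap of roughly two intervals or more suggests a run was skipped rather than merely delayed.
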
hancush commented 11 months ago

The Actions dashboard is not good enough for Metro. Their dashboard also includes stats from the app and has an easier-to-understand interface for non-technical users.
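If Actions data were surfaced in a Metro-facing dashboard, one option is to translate raw run records into plain sentences. This is a minimal sketch, assuming each run is a dict with the `name`, `status`, `conclusion`, and `created_at` fields that GitHub's workflow-runs REST API returns; the wording in `FRIENDLY` and the helper names are hypothetical.

```python
from datetime import datetime

# Hypothetical plain-language labels for the API's `conclusion` values.
FRIENDLY = {
    "success": "ran successfully",
    "failure": "failed",
    "cancelled": "was cancelled",
    "timed_out": "timed out",
}


def latest_per_workflow(runs):
    """Keep only the most recent run for each workflow name."""
    latest = {}
    for run in runs:
        name = run["name"]
        # ISO 8601 UTC timestamps in the same format sort correctly as strings.
        if name not in latest or run["created_at"] > latest[name]["created_at"]:
            latest[name] = run
    return latest


def describe(run):
    """Turn one run record into a sentence a non-technical reader can follow."""
    if run.get("status") != "completed":
        outcome = "has not finished yet"
    else:
        outcome = FRIENDLY.get(run.get("conclusion"), "finished with an unexpected result")
    when = datetime.fromisoformat(run["created_at"].replace("Z", "+00:00"))
    return f"{run['name']} {outcome} (triggered {when:%b %d, %Y at %H:%M} UTC)"


def status_lines(runs):
    """Return one sentence per workflow, sorted by workflow name."""
    latest = latest_per_workflow(runs)
    return [describe(latest[name]) for name in sorted(latest)]
```

Showing only the latest run per workflow keeps the view short; a real dashboard for Metro would likely also need the app stats their current dashboard includes.
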

hancush commented 11 months ago

So, I guess another consideration is: can Actions info be retrieved via an API, or are there other dashboard products that integrate with Actions that we can use?

fgregg commented 11 months ago

GitHub Actions does provide a rich API:

https://docs.github.com/en/rest/actions/workflow-runs?apiVersion=2022-11-28
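As a rough illustration of that API, the sketch below lists recent runs for a repository with `GET /repos/{owner}/{repo}/actions/runs` and tallies their outcomes. It uses only the Python standard library; `fetch_recent_runs`, `tally_outcomes`, and the `per_page=20` default are hypothetical names and choices. A token is treated as optional, on the assumption that the repository is public (unauthenticated requests are more tightly rate-limited).

```python
import json
import urllib.request
from collections import Counter

API_ROOT = "https://api.github.com"


def fetch_recent_runs(owner, repo, token=None, per_page=20):
    """Fetch the most recent workflow runs for a repository as parsed JSON."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runs?per_page={per_page}"
    request = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "X-GitHub-Api-Version": "2022-11-28",
    })
    if token:
        # Authenticated requests get a higher rate limit and can see private repos.
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def tally_outcomes(payload):
    """Count completed runs by conclusion; unfinished runs count under their status."""
    counts = Counter()
    for run in payload.get("workflow_runs", []):
        if run.get("status") == "completed":
            counts[run.get("conclusion") or "unknown"] += 1
        else:
            counts[run.get("status") or "unknown"] += 1
    return counts
```

For example, `tally_outcomes(fetch_recent_runs("datamade", "chicago-council-scrapers"))` would return a `Counter` keyed by conclusion (`success`, `failure`, and so on). Keeping the network call separate from the summarizing function makes the latter easy to test and to reuse in a custom dashboard.
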

As for third-party products, a quick Google search indicates that there are some. Whether any of them fill our needs, I don't know.

smcalilly commented 11 months ago

If a different platform becomes available, we can move lametro to it.