datamade / openness-project-nmid

Money Trail NM - New Mexico In Depth's Campaign Finance Explorer
https://moneytrailnm.com

Investigate Heroku connection speed #210

Open · hancush opened this issue 2 months ago

hancush commented 2 months ago

Imports periodically time out due to very slow connections to Heroku. Heroku Postgres databases are colocated with many other project databases on a shared Postgres server. Email Heroku support, and if the issue cannot be addressed on their end, consider migrating the database to RDS.

hancush commented 1 month ago

Heroku support claims no issues on the Postgres side. Opened a ticket with GitHub: https://support.github.com/ticket/personal/0/2990677

Another option to consider is making a larger runner available for imports, though the cost is higher: https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#per-minute-rates

hancush commented 1 month ago

Quick response from GitHub support. tl;dr - Resources (and many other things) can indeed vary between runs:


Hi hannah, 

Thank you for reaching out to GitHub! Yes, the available computing resources can indeed vary between runners on GitHub Actions, which could explain the variability in job performance you're seeing. Here are a few key factors that might affect the speed of your GitHub Actions jobs: 

I hope this helps! Please let us know if you have any questions, or if we can help with anything else at the moment.

Cheers,
James

fgregg commented 1 month ago

okay, so a few thoughts to explore:

  1. we try the more powerful runners. if we did the cheapest step-up, that could be around $100 / month (assuming 3 hours per year-import)
  2. we could do a self-hosted runner option, which we've done before and which would likely be cheaper than the beefier, native github runners
  3. within github actions, we could detect that we are in a slow environment and restart the action (see the first sketch after this list): https://github.com/orgs/community/discussions/67654#discussioncomment-8038649
  4. we could split the import job into smaller chunks. right now we are splitting them into year chunks, but we could split them into 6-month or 3-month chunks (second sketch below)
  5. we could rewrite the import so there is less over-the-network communication (more batching). the import code used to be batchier, but it led to memory problems when we were running the import on a heroku instance. in a github action we could make a different memory/time tradeoff (third sketch below).
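
to make options 3-5 concrete, here are some rough, untested sketches. everything named below (DATABASE_URL, the thresholds, the model import) is an assumption to adjust, not a reference to our actual code.

for option 3, the workflow could run a pre-flight step that times a handful of trivial round trips to the database and fails fast if they're slow, so the job can be restarted per the linked discussion. the trip count and the 1-second threshold are guesses to calibrate against runs that finished in time:

```python
import os
import sys
import time

import psycopg2

# assumes DATABASE_URL points at the Heroku Postgres instance
conn = psycopg2.connect(os.environ["DATABASE_URL"])

start = time.perf_counter()
with conn.cursor() as cur:
    for _ in range(20):  # 20 trivial round trips
        cur.execute("SELECT 1")
        cur.fetchone()
elapsed = time.perf_counter() - start
conn.close()

# threshold is a guess; calibrate against runs that finished comfortably
if elapsed > 1.0:
    print(f"20 round trips took {elapsed:.2f}s; runner looks slow, bailing early")
    sys.exit(1)
```

for option 4, chunking is mostly a matter of generating smaller date ranges for the existing per-year import to iterate over, e.g. quarters:

```python
from datetime import date

def quarter_chunks(year):
    """split one year-import into four quarter-long (start, end) ranges"""
    starts = [date(year, month, 1) for month in (1, 4, 7, 10)]
    ends = starts[1:] + [date(year + 1, 1, 1)]
    return list(zip(starts, ends))

# quarter_chunks(2023)[0] == (date(2023, 1, 1), date(2023, 4, 1))
```

for option 5, the usual django-flavored shape is to build model instances lazily and flush them with bulk_create, which collapses each batch into a handful of INSERTs; batch_size is the knob for the memory/time tradeoff mentioned above. Transaction here is a stand-in for whatever model the import actually populates:

```python
from itertools import islice

from camp_fin.models import Transaction  # assumed import path; adjust to the real model

def batched(iterable, size):
    """yield lists of at most `size` items, bounding memory per batch"""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

def import_rows(rows, batch_size=5000):
    # round trips now scale with row count / batch_size instead of row count
    row_objects = (Transaction(**row) for row in rows)
    for batch in batched(row_objects, batch_size):
        Transaction.objects.bulk_create(batch)
```
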
hancush commented 1 month ago

@fgregg I definitely think a batchier job would be the most cost-effective and least complex option in the long term.