backstrokeapp / server

:swimmer: A Github bot to keep repository forks up to date with their upstream.
https://backstroke.co
MIT License

RFC: Backstroke Migration #66

Closed 1egoman closed 6 years ago

1egoman commented 7 years ago

I'm working on a rewrite of Backstroke. This has been a long time coming (over 6 months!) but I feel that it makes the system much more stable and predictable. In its current state, deploying updates to the live system is a challenge (and as a consequence, I haven't done it for months.) This isn't something I'm all that good at, so I'd love for anyone more experienced than me to let me know what I'm doing right and what I'm doing wrong.

Current System Architecture

(Diagram: old architecture)

What currently exists is deployed on Heroku on a free Dyno, using a mlab sandbox database.

Serious problems with the current approach

Rewrite plan

In general, I want to try to split the system into a number of smaller services. One of the biggest changes involves link updates - the current plan is to stick all link update operations into a queue with workers at the end that perform the actual updates. As a consequence, the response to curl -X POST https://backstroke.us/_linkid will return something like this:

{
  "status": "ok",
  "enqueuedAs": "id-of-thing-in-queue-here"
}

And then, to get the status of the webhook operation, make a call to https://api.backstroke.us/v1/operations/id-of-thing-in-queue-here, which returns something like this:

{
  "status": "ok",
  "startedAt": "2017-09-01T11:26:06.722Z",
  "finishedAt": "2017-09-01T11:28:06.722Z",
  "output": {
    "many": true
    // anything else returned by the worker
  }
}
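As a rough illustration of how a client might consume these two endpoints, here is a minimal Python sketch. The URL shape and the `enqueuedAs` / `finishedAt` fields come from the examples above; everything else is an assumption for illustration: the helper names, the polling interval, and the idea that an unfinished operation simply has no `finishedAt` yet (the proposal doesn't say how an in-progress operation is reported). The HTTP call is injected as `fetch_json` so the sketch doesn't depend on any particular HTTP library.

```python
import time
from typing import Any, Callable, Dict

# Base URL taken from the example above.
API_BASE = "https://api.backstroke.us/v1"


def operation_url(enqueued_as: str) -> str:
    """Build the status URL for an operation id returned as "enqueuedAs"."""
    return f"{API_BASE}/operations/{enqueued_as}"


def wait_for_operation(
    enqueued_as: str,
    fetch_json: Callable[[str], Dict[str, Any]],
    interval_seconds: float = 5.0,
    max_attempts: int = 24,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll the operation endpoint until the response has a "finishedAt" value.

    Assumption: an operation that is still queued or running has no
    "finishedAt" key. The real API may signal this differently.
    """
    url = operation_url(enqueued_as)
    for _ in range(max_attempts):
        body = fetch_json(url)
        if body.get("finishedAt"):
            return body
        sleep(interval_seconds)
    raise TimeoutError(
        f"operation {enqueued_as} did not finish after {max_attempts} polls"
    )
```

A caller would pass in whatever function performs the GET and decodes the JSON body, then read `output` from the returned dictionary.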

The other large change is less of a reliance on webhooks. They are a side effect that is a pain to manage. Currently, links store two values: the last updated timestamp and the last known SHA that is the head of the upstream's branch. Every couple minutes, a timer is run in the background that finds all links that haven't been updated in 10 minutes (in this way, link updates are staggered so only a subset of all links are updated every couple minutes). If a link hasn't been updated in 10 minutes, then the SHA of the upstream branch is checked, and if it differs from the stored SHA, an automatic link update is added to the queue. Currently, this functionality lives in the api.backstroke.us service below, but once that service has to be scaled past one instance that functionality would probably be extracted to another service.
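To make the polling logic concrete, here is a minimal Python sketch of one timer tick. It is not the actual Backstroke code: the `Link` fields, the function names, and the two injected callbacks (`fetch_upstream_sha`, `enqueue_update`) are hypothetical. It also assumes that checking a link resets its last-updated timestamp and that the stored SHA is refreshed when an update is enqueued, neither of which is spelled out above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Iterable, List

# Links untouched for this long are eligible for a check (from the text above).
STALE_AFTER = timedelta(minutes=10)


@dataclass
class Link:
    id: str
    last_updated_at: datetime  # the "last updated" timestamp
    last_known_sha: str        # last known head SHA of the upstream branch


def poll_links(
    links: Iterable[Link],
    now: datetime,
    fetch_upstream_sha: Callable[[Link], str],
    enqueue_update: Callable[[Link], None],
) -> List[str]:
    """Run one timer tick and return the ids of links that were enqueued."""
    enqueued = []
    for link in links:
        # Skip links checked within the last 10 minutes; this is what
        # staggers the work across ticks.
        if now - link.last_updated_at < STALE_AFTER:
            continue
        current_sha = fetch_upstream_sha(link)
        if current_sha != link.last_known_sha:
            enqueue_update(link)
            link.last_known_sha = current_sha  # assumption: refresh on enqueue
            enqueued.append(link.id)
        link.last_updated_at = now  # assumption: a check counts as an update
    return enqueued
```

Because only stale links trigger a call to the upstream, each tick touches a subset of links rather than all of them.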

Services in green are ones that I have already set up and services in red are ones that haven't been written yet:

(Diagram: new service layout)

NOTE: All green services are actually deployed. Check them out! :) Things may change though, so don't be surprised if I clear the database or something.

How I'm planning on fixing the serious problems:

Deployment

Before, this service was deployed on Heroku. I'm currently pursuing a sponsorship from DigitalOcean (they've said they'll give Backstroke $350 in free credits, but this was a few months ago, and I need to follow up with them).

If I'm unable to secure the DigitalOcean sponsorship (which is what it is looking like) then deployment is up in the air. I'm currently still deploying all the new services on Heroku as free dynos, utilizing Heroku Postgres and Heroku Redis for the stateful components of the system. Through Gratipay, we have about $4 a month available to put towards infrastructure. I think this could all be hosted on one DigitalOcean droplet of the smallest size, which is $5/mo. AWS, Google Cloud Platform, and other services should be explored too. I don't have as much experience with them, but they could work out as well.

Questions for others

❤️ A thanks to all users - Backstroke has been a fun project to grow over the past year and a half. I hope we can make it better together!

Ryan Gaus, @1egoman

A number of users who have reported issues or commented on issues that may have opinions on these changes: @evandrocoan @thtliife @gaearon @eins78 @radrad @jeremypoulter @johanneskoester @m1guelpf

m1guelpf commented 7 years ago

I have a paid instance of deployhq.com, so if you want I can "donate" the deploy tool and then you can stop worrying about that :smile:

1egoman commented 7 years ago

@m1guelpf Are you talking about for the current deployment, or with the new architecture I'm proposing? I'll do some research this weekend and see if it would be helpful for the current state of affairs (I'm unfamiliar with the service) but I think for the new architecture I'd like to try out some sort of immutable deployment such as Docker (and at a cursory glance, DeployHQ doesn't seem immutable). Thanks!

m1guelpf commented 7 years ago

@1egoman I was talking about the old one. I don't think it supports Docker or any other type of immutable deployment... :sad:

1egoman commented 7 years ago

Cool. I'll do some research this weekend. 😄

1egoman commented 7 years ago

@m1guelpf It doesn't look like DeployHQ works with Heroku, so thanks for the offer, but I don't think it'll be helpful for maintaining the current Backstroke version 🙁

eins78 commented 7 years ago

@1egoman It sounds like https://zeit.co/now does what you want? Maybe @rauchg wants to "pitch" in? ;)

1egoman commented 7 years ago

@eins78 Interesting, I'll do some research and see if now could fit Backstroke's needs.

1egoman commented 7 years ago

I love the idea of now, but I'm having trouble deploying this repository.

https://zeit.co/rgausnet/server/xvoeisygwy shows that the container takes a very long time to start, though the logs seem to show that the container actually started previously. Once the container did start, I repeatedly get 502s: https://server-xvoeisygwy.now.sh/

@rauchg Any help you could provide with this?

Also sent a support email to support@zeit.co:

Hello,

I'm thinking about deploying my open source project Backstroke (https://github.com/1egoman/backstroke) on Zeit. I seem to be having some issues with the deployment.

zeit.co/rgausnet/server/xvoeisygwy shows that the container takes a very long time to start, though the logs seem to show that the container actually started previously. Once the container did start, I repeatedly get 502s: server-xvoeisygwy.now.sh

For reference, here's the Dockerfile for the container that I'm trying to deploy: https://github.com/backstrokeapp/server/blob/master/Dockerfile, and here's the repository: https://github.com/backstrokeapp/server.

Thanks for the help!
Ryan

1egoman commented 7 years ago

Update: I've secured the domain backstroke.co. My hope is to use this domain name instead of backstroke.us for the new deployments, as it makes maintaining backward compatibility easier. Any GET requests to backstroke.us would 301 redirect to backstroke.co, and any POST requests would be handled in the old fashion. Here's an updated diagram of the architecture:

(Diagram: updated service layout)
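The backward-compatibility rule for backstroke.us can be sketched as a small routing decision in Python. This is only an illustration of the rule described above, not the legacy service's code: the function name, the `"legacy"` marker, and the 405 fallback for other HTTP methods are assumptions, since only GET and POST behavior is specified.

```python
from typing import Optional, Tuple

NEW_HOST = "backstroke.co"  # the newly secured domain


def route_legacy_request(method: str, path: str) -> Tuple[str, int, Optional[str]]:
    """Decide how a request to backstroke.us is handled.

    Returns (action, status_code, redirect_location).
    """
    verb = method.upper()
    if verb == "GET":
        # GET requests are permanently redirected to the new domain.
        return ("redirect", 301, f"https://{NEW_HOST}{path}")
    if verb == "POST":
        # POST requests (existing webhooks) keep working "in the old fashion".
        # The 200 here is a placeholder; the real status comes from the
        # legacy handler.
        return ("legacy", 200, None)
    # Assumption: anything else is rejected; the thread doesn't cover this.
    return ("reject", 405, None)
```

Keeping POST handling on the old domain means existing webhook URLs don't have to change when the new services go live.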

Also, I've mostly finished work on the legacy backstroke service (now to be hosted at backstroke.us). The code can be found here.

1egoman commented 7 years ago

I've spent the last week or so writing a deployment script for Backstroke. My current plan is to host all services on a DigitalOcean droplet, with each service running within docker. In the near term, I plan to use docker-compose to spin up all services on server start since I'm not too concerned with scaling right off the bat. (If I want to scale the service further, I might try nomad.)

My goal was to run all these services on the smallest size droplet (1 core, 512mb ram) but it looks like that is going to be near impossible. Between haproxy, docker and two node processes, the instance runs out of memory within a couple minutes. I'm now running on the 2nd-smallest droplet (2 core, 1gb ram) and I can run docker, redis, and three node processes (worker, server, and legacy) with about 150mb of ram left over.

I'd prefer to rely on a third party service for hosting the database rather than do it myself, though depending on cost it may make sense for me to just figure it out on my own. Currently, I'm relying on a Heroku free-tier database with a 10,000 row limit (and linking to it externally from the server container) but this is far from optimal.

Unfortunately, this means that I'm going to be spending a bit out of pocket for now - hopefully this new version will gather some more Gratipay donations and can be self sufficient!

While the deployment scripts aren't ready to open source (of course, with secrets redacted), I'll post a link once they are ready. All the existing services in the diagrams above are also now hosted at backstroke.co - api.backstroke.co, app.backstroke.co, and backstroke.co. The legacy service is also hosted at legacy.backstroke.us, though this will eventually be aliased to backstroke.us. Feel free to try them out - I'd love to get feedback on all the work I've been doing over the past 6 months.

I'm hoping these updates provide transparency into how Backstroke's upcoming release is shaping up. Are these helpful? Thanks for using Backstroke! ❤️

1egoman commented 7 years ago

I finished up the deployment repository. https://github.com/backstrokeapp/deployment

1egoman commented 7 years ago

A number of helpful things have happened since the last update:

  1. I was able to secure a sponsorship from DigitalOcean! For the next year, at least, hosting shouldn't be an issue. There's now a nice sponsorship note on the new website. Also, this means that I've upgraded to a 2gb droplet, on which the systems perform much better.
  2. A complete local development environment has been set up in the https://github.com/backstrokeapp/deployment repository. Now, with one command, a near-replica of the production system can be set up on a local computer. There's a step-by-step list of what's required to do this.
  3. I built a small tool to help visualize the entire system when link updates are flowing through. While it's not ready to release, it's tremendously helpful in local development and in gaining an understanding of what's actually going on.
  4. Fixed a couple bugs in the webhook job. Running on a current replica of production, ~80% of links are able to be processed and executed on by the worker. I suspect the last ~20% are due to a couple factors that haven't been considered:
    • Repositories being deleted that are in a link, and the worker not taking this into account.
    • Branches being deleted that the link was supposed to pull changes from or propose changes into.
  5. Fixed an issue in which a worker handling a number of links could potentially exhaust the GitHub token's rate limit. Unfortunately, this means it can sometimes take up to 2 minutes or so for a link operation to process, but I think that's an adequate target for now.

I'm nearly ready to release this thing. I'm a bit worried that once it's released, I will have forgotten to verify an edge case and I'll get an angry issue, but I think I just need to bite the bullet. My goal is to release this new stuff by next weekend.

1egoman commented 7 years ago

Pre-deployment

Deployment checklist

Verify

Take down old stuff (do this once sure that the new stuff is stable)

1egoman commented 7 years ago

The deployment happened at 3pm EDT on October 7th, 2017. The service was down from 3pm to 3:10pm.

I'm glad that this new stuff is finally deployed. Over the next week or so I expect a few issues to come in with scenarios that I didn't take into account when working on the new stuff, but all in all, I'm pretty satisfied with this release.

Dashboard: https://github.com/backstrokeapp/dashboard/releases/tag/v2.0.0
Server: https://github.com/backstrokeapp/server/releases/tag/v2.0.0
Legacy: https://github.com/backstrokeapp/legacy/releases/tag/v2.0.0
Worker: https://github.com/backstrokeapp/worker/releases/tag/v2.0.0
Deployment: https://github.com/backstrokeapp/dashboard/releases/tag/v2.0.0

In a few weeks, I'll complete the migration by taking down all the old stuff on Heroku, and close this issue.