1egoman commented 7 years ago

I'm working on a rewrite of Backstroke. This has been a long time coming (over 6 months!) but I feel that it makes the system much more stable and predictable. In its current state, deploying updates to the live system is a challenge (and as a consequence, I haven't done it for months.) This isn't something I'm all that good at, so I'd love for anyone more experienced than me to let me know what I'm doing right and what I'm doing wrong.

Current System Architecture

[Diagram: old architecture]

What currently exists is deployed on Heroku on a free dyno, using an mLab sandbox database.

Serious problems with the current approach

The app has grown big enough that I no longer understand the whole thing.
Tests are minimal, and it's questionable whether they are actually helpful in rooting out bugs.
Deployment to Heroku has recently been failing in strange ways due to state-persisting issues, so I no longer trust the deployment to work. Unfortunately, that means the last deployed commit is d782482e6e16a4df2d85bd48b2461c6f7a235a52, and probably will be until serious changes can be made.
The link updating code (i.e., the webhook code) is very hard to maintain in the context of the rest of the app, and it has been the source of many issues.

Rewrite plan

In general, I want to try to split the system into a number of smaller services. One of the biggest changes involves link updates - the current plan is to stick all link update operations into a queue with workers at the end that perform the actual updates. As a consequence, the response to curl -X POST https://backstroke.us/_linkid will return something like this:

{
  "status": "ok",
  "enqueuedAs": "id-of-thing-in-queue-here"
}

And then, to get the status of the webhook operation, make a call to https://api.backstroke.us/v1/operations/id-of-thing-in-queue-here, which returns something like this:

{
  "status": "ok",
  "startedAt": "2017-09-01T11:26:06.722Z",
  "finishedAt": "2017-09-01T11:28:06.722Z",
  "output": {
    "many": true
    // anything else returned by the worker
  }
}
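As a rough sketch of the two response shapes above, the enqueue and status-lookup steps could look like the following. This is not Backstroke's actual code: the in-memory `queue` and `operations` stand in for Redis, the function names are hypothetical, and the `"pending"` status is an assumption (the examples above only show `"ok"`).

```python
import uuid

# Hypothetical in-memory stand-ins for the Redis queue and result store.
queue = []
operations = {}


def enqueue_link_update(link_id):
    """Queue a link update and return the body of the POST response."""
    operation_id = str(uuid.uuid4())
    queue.append({"id": operation_id, "linkId": link_id})
    # Assumption: an operation is reported as "pending" until a worker finishes it.
    operations[operation_id] = {"status": "pending"}
    return {"status": "ok", "enqueuedAs": operation_id}


def get_operation(operation_id):
    """Return the body for GET /v1/operations/<id>, or None if the id is unknown."""
    return operations.get(operation_id)
```

The point of the split is that the POST returns immediately with an id, and the caller polls the operations endpoint with that id until a worker has written a result.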

The other large change is less reliance on webhooks. They are a side effect that is a pain to manage. Currently, links store two values: the last updated timestamp and the last known SHA at the head of the upstream's branch. Every couple of minutes, a timer runs in the background and finds all links that haven't been updated in 10 minutes (in this way, link updates are staggered so only a subset of all links are updated every couple of minutes). If a link hasn't been updated in 10 minutes, the SHA of the upstream branch is checked, and if it differs from the stored SHA, an automatic link update is added to the queue. Currently, this functionality lives in the api.backstroke.us service below, but once that service has to be scaled past one instance, it would probably be extracted to another service.
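The polling step described above can be sketched roughly as follows. The field names (`id`, `lastUpdatedAt`, `lastSha`) and the `fetch_upstream_sha` callback are hypothetical, and refreshing the timestamp whenever a link is checked is an assumption about how the staggering would work, not something confirmed by the actual implementation.

```python
from datetime import timedelta

STALE_AFTER = timedelta(minutes=10)


def find_links_to_update(links, fetch_upstream_sha, now):
    """Return the ids of stale links whose upstream head SHA has changed.

    `links` is a list of dicts with hypothetical keys `id`, `lastUpdatedAt`
    and `lastSha`; `fetch_upstream_sha` is a caller-supplied lookup.
    """
    to_enqueue = []
    for link in links:
        if now - link["lastUpdatedAt"] < STALE_AFTER:
            continue  # checked recently; leave it for a later timer tick
        current_sha = fetch_upstream_sha(link)
        # Assumption: the timestamp is refreshed on every check, changed or not.
        link["lastUpdatedAt"] = now
        if current_sha != link["lastSha"]:
            link["lastSha"] = current_sha
            to_enqueue.append(link["id"])
    return to_enqueue
```

Each timer tick would call this once and push the returned ids onto the queue; links checked less than 10 minutes ago are skipped without any call to the upstream.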

Services in green are ones that I have already set up and services in red are ones that haven't been written yet:

[Diagram: new architecture layout, with services colored green or red]

NOTE: All green services are actually deployed. Check them out! :) Things may change though, so don't be surprised if I clear the database or something.

backstroke.surge.sh - The new website. Code can be found here. I think it more accurately portrays Backstroke with its upcoming changes.
legacy.backstroke.us - Many people are still using Backstroke Classic. To maintain backward compatibility, I need to run a service to emulate the old behavior. This still needs to be written.
backstroke.us (nginx) - A reverse proxy to run at backstroke.us, directing all POST requests to legacy.backstroke.us and all GET requests to backstroke.surge.sh. Required to keep Backstroke Classic working.
app.backstroke.us - The new dashboard. It simplifies the process of link management significantly. Screenshots and code are here.
api.backstroke.us - Manages user authentication and link CRUD. This is the only service that is connected to the database, which means that it's the only stateful service. This is a massive win. This service also adds webhook operations to Redis for the worker, either on a timer or when a user pings a webhook url.
Backstroke Worker - The worker reads operations from the Redis queue, performs them, and sticks the results back in Redis to be displayed by the api.backstroke.us service. The worker is stateless, small, and well tested.
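To make the worker's role concrete, here is a minimal sketch of one iteration of such a worker loop. It is not the real worker: a `deque` and a dict stand in for the Redis queue and result store, `perform_update` is a hypothetical callback, and the `"error"` status is an assumption.

```python
from datetime import datetime, timezone


def run_worker_once(queue, results, perform_update):
    """Process one queued operation; return False if the queue is empty."""
    if not queue:
        return False
    operation = queue.popleft()
    started_at = datetime.now(timezone.utc).isoformat()
    try:
        output = perform_update(operation["linkId"])
        status = "ok"
    except Exception as error:
        # Record the failure so the API can display it, rather than crashing.
        output = {"error": str(error)}
        status = "error"
    results[operation["id"]] = {
        "status": status,
        "startedAt": started_at,
        "finishedAt": datetime.now(timezone.utc).isoformat(),
        "output": output,
    }
    return True
```

Because the function keeps no state of its own between calls, several workers could in principle pull from the same queue, which is what makes the worker easy to scale separately from the API.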

How I'm planning on fixing the serious problems:

Splitting up the system into multiple smaller services means that each will be smaller and easier to understand. Backstroke was initially written in a weekend and wasn't built to be maintained. When rewriting the system, documentation and easy-to-understand code have both been a big priority.
Though I'm not following strict TDD, I'm writing the tests concurrently with the code. As a consequence, the tests being written are more helpful. However, I could still improve.
See below for deployment thoughts.
The link updating code has been broken out into its own service, where it can be maintained as a separate entity.

Deployment

Before, this service was deployed on Heroku. I'm currently pursuing a sponsorship from DigitalOcean (they've said they'll give Backstroke $350 in free credits, but this was a few months ago, so I need to follow up with them).

If I'm unable to secure the DigitalOcean sponsorship (which is what it is looking like) then deployment is up in the air. I'm currently still deploying all the new services on Heroku as free dynos, utilizing Heroku Postgres and Heroku Redis for the stateful components of the system. Through Gratipay, we have about $4 a month available to put towards infrastructure. I think this could all be hosted on one DigitalOcean droplet of the smallest size, which is $5/mo. AWS, Google Cloud Platform, and other services should be explored too. Though I don't have as much experience with them, they could work out as well.

Questions for others

Does the plan I've put together sound sane? Have I forgotten anything?
Do you have any suggestions to make Backstroke more maintainable, or easier to debug?
How would you deploy Backstroke for as cheap as possible?

❤️ A thanks to all users - Backstroke has been a fun project to grow over the past year and a half. I hope we can make it better together!

Ryan Gaus, @1egoman

A number of users who have reported issues or commented on issues that may have opinions on these changes: @evandrocoan @thtliife @gaearon @eins78 @radrad @jeremypoulter @johanneskoester @m1guelpf

m1guelpf commented 7 years ago

I have a paid instance of deployhq.com, so if you want I can "donate" the deploy tool and then you can stop worrying about that :smile:

1egoman commented 7 years ago

@m1guelpf Are you talking about the current deployment, or the new architecture I'm proposing? I'll do some research this weekend and see if it would be helpful for the current state of affairs (I'm unfamiliar with the service), but I think for the new architecture I'd like to try out some sort of immutable deployment such as Docker (and at a cursory glance, DeployHQ doesn't seem immutable). Thanks!

m1guelpf commented 7 years ago

@1egoman I was talking about the old one. I don't think it supports Docker or any other type of immutable deployment... :sad:

1egoman commented 7 years ago

Cool. I'll do some research this weekend. 😄

1egoman commented 7 years ago

@m1guelpf It doesn't look like DeployHQ works with Heroku, so thanks for the offer, but I don't think it'll be helpful for maintaining the current Backstroke version 🙁

eins78 commented 7 years ago

@1egoman It sounds like https://zeit.co/now does what you want? Maybe @rauchg wants to "pitch" in? ;)

1egoman commented 7 years ago

@eins78 Interesting, I'll do some research and see if now could fit Backstroke's needs.

1egoman commented 7 years ago

I love the idea of now, but I'm having trouble deploying this repository.

https://zeit.co/rgausnet/server/xvoeisygwy shows that the container takes a very long time to start, though the logs seem to show that the container actually started previously. Once the container did start, I repeatedly get 502s: https://server-xvoeisygwy.now.sh/

@rauchg Any help you could provide with this?

Also sent a support email to support@zeit.co:

Hello,

I'm thinking about deploying my open source project Backstroke (https://github.com/1egoman/backstroke) on Zeit. I seem to be having some issues with the deployment.

zeit.co/rgausnet/server/xvoeisygwy shows that the container takes a very long time to start, though the logs seem to show that the container actually started previously. Once the container did start, I repeatedly get 502s: server-xvoeisygwy.now.sh

For reference, here's the Dockerfile for the container that I'm trying to deploy: https://github.com/backstrokeapp/server/blob/master/Dockerfile, and here's the repository: https://github.com/backstrokeapp/server.

Thanks for the help!
Ryan

1egoman commented 7 years ago

Update: I've secured the domain backstroke.co. My hope is to use this domain name instead of backstroke.us for the new deployments, as it makes maintaining backward compatibility easier. Any GET requests to backstroke.us would 301 redirect to backstroke.co, and any POST requests would be handled in the old fashion. Here's an updated diagram of the architecture:

[Diagram: updated architecture layout]

Also, I've mostly finished work on the legacy backstroke service (now to be hosted at backstroke.us). The code can be found here.
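The backstroke.us routing rule from this update (GET requests redirect, POST requests go to the legacy service) could be expressed as a small decision function like the one below. This is only an illustration of the rule, not the proxy configuration: the function name and return shape are hypothetical, and preserving the path on redirect, treating HEAD like GET, and answering other methods with 405 are all assumptions.

```python
def route_backstroke_us(method, path):
    """Decide what the proxy in front of backstroke.us does with a request."""
    method = method.upper()
    if method == "POST":
        # Backstroke Classic webhooks keep working through the legacy service.
        return {"action": "proxy", "upstream": "legacy"}
    if method in ("GET", "HEAD"):
        # Assumption: the request path is carried over to the new domain.
        return {
            "action": "redirect",
            "status": 301,
            "location": "https://backstroke.co" + path,
        }
    # Assumption: anything else is rejected.
    return {"action": "reject", "status": 405}
```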

1egoman commented 7 years ago

I've spent the last week or so writing a deployment script for Backstroke. My current plan is to host all services on a DigitalOcean droplet, with each service running within docker. In the near term, I plan to use docker-compose to spin up all services on server start since I'm not too concerned with scaling right off the bat. (If I want to scale the service further, I might try nomad.)

My goal was to run all these services on the smallest size droplet (1 core, 512mb ram) but it looks like that is going to be near impossible. Between haproxy, docker and two node processes, the instance runs out of memory within a couple of minutes. I'm now running on the 2nd-smallest droplet (2 core, 1gb ram) and I can run docker, redis, and three node processes (worker, server, and legacy) with about 150mb of ram left over.

I'd prefer to rely on a third party service for hosting the database rather than do it myself, though depending on cost it may make sense for me to just figure it out on my own. Currently, I'm relying on a Heroku free-tier database with a 10,000 row limit (and linking to it externally from the server container) but this is far from optimal.

Unfortunately, this means that I'm going to be spending a bit out of pocket for now - hopefully this new version will gather some more Gratipay donations and can be self-sufficient!

While the deployment scripts aren't ready to open source (of course, with secrets redacted), I'll post a link once they are ready. All the existing services in the diagrams above are also now hosted at backstroke.co - api.backstroke.co, app.backstroke.co, and backstroke.co. The legacy service is also hosted at legacy.backstroke.us, though this will eventually be aliased to backstroke.us. Feel free to try them out - I'd love to get feedback on all the work I've been doing over the past 6 months.

I'm hoping these updates provide transparency into how Backstroke's upcoming release is shaping up. Are these helpful? Thanks for using Backstroke! ❤️

1egoman commented 7 years ago

I finished up the deployment repository. https://github.com/backstrokeapp/deployment

1egoman commented 7 years ago

A number of helpful things have happened since the last update:

I was able to secure a sponsorship from DigitalOcean! For the next year, at least, hosting shouldn't be an issue. There's now a nice sponsorship note on the new website. Also, this means that I've upgraded to a 2gb droplet, on which the systems perform much better.
A complete local development environment has been set up in the https://github.com/backstrokeapp/deployment repository. Now, with one command, a near-replica of the production system can be set up on a local computer. There's a step-by-step list of what's required to do this.
I built a small tool to help visualize the entire system when link updates are flowing through. While it's not ready to release, it's tremendously helpful in local development and for understanding what's actually going on.
Fixed a couple of bugs in the webhook job. Running on a current replica of production, ~80% of links can be processed and executed by the worker. I suspect the last ~20% are due to a couple of factors that haven't been considered:
- Repositories being deleted that are in a link, and the worker not taking this into account.
- Branches being deleted that the link was supposed to pull changes from or propose changes into.
Fixed an issue in which a worker handling a number of links could potentially exhaust the GitHub token's rate limit. Unfortunately, this means it can sometimes take up to 2 minutes or so for a link operation to process, but I think that's an adequate target for now.
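The update doesn't say how the rate-limit fix works, so the following is only one possible approach, not the worker's actual mechanism: spacing API calls a minimum interval apart. The `Throttle` class and its parameters are hypothetical; the clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import time


class Throttle:
    """Space calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

A worker would call `throttle.wait()` before each GitHub request. With a limit of 5,000 authenticated requests per hour, an interval of 0.72 seconds (3600 / 5000) would keep a single token under the cap, at the cost of slower link operations when many are queued.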

I'm nearly ready to release this thing. I'm a bit worried that once it's released, I will have forgotten to verify an edge case and I'll get an angry issue, but I think I just need to bite the bullet. My goal is to release this new stuff by next weekend.

1egoman commented 7 years ago

Pre-deployment

Run tests on all projects
- [x] worker
- [x] legacy
- [x] server
Try out some backstroke classic links (both on upstream, and on fork)
- [x] On Upstream
- [x] On Fork
- [x] Ensure that a fork that has an issue tagged optout doesn't get a PR.
[x] Set up 2FA on DigitalOcean

Deployment checklist

[x] Push up all backstrokeapp repos.
[x] Build containers for all repos.
[x] Tag all with v2.0.0. Ensure package.json also says 2.0.0 in all projects.
[x] Push up all backstrokeapp/server code to 1egoman/backstroke.
[x] Remove backstrokeapp/server.
[x] Transfer 1egoman/backstroke to backstrokeapp/server.
[x] Delete droplet. Redeploy just to make sure we're in a clean state.
[x] Migrate all data from mongo into new database
[x] Make sure legacy works
[x] Update backstroke.us dns:
- backstroke.us should point to the droplet.
- www.backstroke.us should also point to the droplet. (redirect happens in haproxy).

Verify

Verify all these redirect to https://backstroke.co:
- [x] https://backstroke.us
- [x] https://www.backstroke.us
- [x] http://backstroke.us
- [x] http://www.backstroke.us
- [x] http://www.backstroke.co
- [x] https://www.backstroke.co
- [x] http://backstroke.co
[x] Ensure that the new website displays at the apex of backstroke.co.
[x] Attempt to login. Verify that you get dropped at the dashboard in a logged-in state.
[x] Attempt to create a link. Use test upstream and fork. Verify that link can be created.
[x] Sync link manually, ensure that works.
[ ] Push up tagged docker containers.
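The redirect checks in the list above could be scripted along these lines. This is a sketch rather than the procedure actually used: the function names are hypothetical, it only inspects the first hop (so a chain such as http to https to backstroke.co would be reported as a failure), and which status codes count as a redirect is an assumption.

```python
import http.client
from urllib.parse import urlsplit

TARGET = "https://backstroke.co"


def fetch_redirect(url):
    """Return (status, Location header) for a HEAD request, without following it."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def failing_redirects(urls, fetch=fetch_redirect):
    """Return the URLs whose first response is not a redirect to TARGET."""
    failures = []
    for url in urls:
        status, location = fetch(url)
        redirected = status in (301, 302, 307, 308)
        if not redirected or (location or "").rstrip("/") != TARGET:
            failures.append(url)
    return failures
```

Calling `failing_redirects([...])` with the URLs from the checklist would return an empty list when every one of them redirects directly to https://backstroke.co.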

Take down old stuff (do this once sure that the new stuff is stable)

[x] Securely erase the (now) old mlab database.
[x] Take down the backstroke heroku dyno.
[ ] ~Export data from mixpanel and delete it since it's no longer being used.~ It's still being used.

1egoman commented 7 years ago

The deployment happened at 3pm EDT on October 7th, 2017. The service was down from 3pm to 3:10pm.

I'm glad that this new stuff is finally deployed. Over the next week or so I expect a few issues to come in with scenarios that I didn't take into account when working on the new stuff, but all in all, I'm pretty satisfied with this release.

Dashboard: https://github.com/backstrokeapp/dashboard/releases/tag/v2.0.0
Server: https://github.com/backstrokeapp/server/releases/tag/v2.0.0
Legacy: https://github.com/backstrokeapp/legacy/releases/tag/v2.0.0
Worker: https://github.com/backstrokeapp/worker/releases/tag/v2.0.0
Deployment: https://github.com/backstrokeapp/dashboard/releases/tag/v2.0.0

In a few weeks, I'll complete the migration by taking down all the old stuff on Heroku, and close this issue.

backstrokeapp / server

RFC: Backstroke Migration #66
