Move the frontend to heroku #5014

Closed chris48s closed 3 years ago

chris48s commented 4 years ago

:clipboard: Description

Currently we host the badge server on heroku and the front end on gh-pages.

This was originally done for performance reasons as it kept the traffic on shields.io's homepage(s) off the badge servers. This decision was made before we were behind a CDN.

The downside of this arrangement is that it means we don't benefit from a lot of the nice things that being on heroku buys us (e.g: automated deploys, easy rollbacks, etc).

Now that we are deployed on heroku behind a CDN, it should be possible to serve the front-end from the same dynos that serve the badges with minimal impact on performance. We can lean on the CDN to handle most of the traffic for us.

What would be the steps necessary to move the front-end off GH pages so we can move to fully automated (or one-click) deployment/rollbacks?

paulmelnikow commented 4 years ago

I agree this would be pretty impactful. In addition to the benefit of easy rollbacks, etc., it makes deployment a one-click operation, rather than a one-click operation plus a `git pull && npm ci && make deploy-gh-pages`. It also ensures the deploy is pristine, not polluted by anything else that might be unusual in the local dev environment. It's more transparent and secure, too.

One blocking issue is to de-overlap the routes between the badge server and the frontend. I believe the only one which overlaps is https://shields.io/endpoint and https://img.shields.io/endpoint, which can be distinguished only by the presence of a query parameter. I would prefer not to change either of these, to be honest, as there are lots of links pointing to both. Though I would also rather not have to introduce complicated routing to accommodate this. Maybe we could put the exception logic in the endpoint badge? i.e. fall back to serving the generated endpoint html when there is no query parameter. This probably deserves discussion in its own issue.
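
For illustration, the exception logic might look roughly like this, sketched with Express (Shields actually uses its own scoutcamp-based server; the renderer here is a stub and all names are hypothetical):

```js
const express = require('express')
const path = require('path')
const app = express()

app.get('/endpoint', (req, res) => {
  if (req.query.url) {
    // A ?url= query parameter means this is an endpoint-badge request.
    res.type('image/svg+xml').send(renderEndpointBadge(req.query.url))
  } else {
    // No query parameter: fall back to the generated docs page.
    res.sendFile(path.join(__dirname, 'build', 'endpoint', 'index.html'))
  }
})

// Stub standing in for the real endpoint-badge renderer.
function renderEndpointBadge(url) {
  return `<svg xmlns="http://www.w3.org/2000/svg"><!-- badge for ${url} --></svg>`
}

app.listen(process.env.PORT || 8080)
```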

Other than that… this already basically works on staging! So if we point shields.io to the same app as img.shields.io we're most of the way there.

We should think about whether we want to force badge users to use https://img.shields.io/ rather than https://shields.io/, and vice versa for the frontend. If we decide we want to force users to address the badge and the frontend separately, that could be another way to resolve the /endpoint conflict (i.e. examine the host on the request and route appropriately). The conflict would remain unresolved in staging and other Heroku deploys but that's not a big deal.
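
A rough sketch of the host-based alternative (Express-style middleware; the badge-path test is greatly simplified and every name is hypothetical):

```js
const express = require('express')
const app = express()

app.use((req, res, next) => {
  const host = (req.headers.host || '').toLowerCase()
  const onBadgeHost = host.startsWith('img.')
  // Greatly simplified; the real check would consult the registered
  // badge routes rather than a single path prefix.
  const isBadgePath = req.path.startsWith('/badge/')
  if (onBadgeHost !== isBadgePath) {
    // Badge requests only on img.shields.io; frontend only on shields.io.
    return res.status(404).send('Not served on this host')
  }
  next()
})

app.listen(process.env.PORT || 8080)
```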

There may be something special that has to be done because it's the zone apex.

Finally, we could think about caching, and what cache headers we want to send (from the badge server) for the frontend, and may need a solution for cache-busting at deploy time.
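
One possible policy, sketched with express.static (the TTL values and filename convention are assumptions, not decisions): content-hashed assets can be cached indefinitely, since each deploy changes their filenames and so busts the cache, while HTML entry points get a short TTL so deploys propagate quickly.

```js
const express = require('express')
const app = express()

app.use(express.static('build', {
  setHeaders(res, filePath) {
    if (/\.[0-9a-f]{8,}\./.test(filePath)) {
      // Content-hashed asset, e.g. app.3f2a1c9b.js: safe to cache forever.
      res.setHeader('Cache-Control', 'public, max-age=31536000, immutable')
    } else {
      // HTML and other unhashed files: keep the TTL short.
      res.setHeader('Cache-Control', 'public, max-age=300')
    }
  },
}))

app.listen(process.env.PORT || 8080)
```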

chris48s commented 4 years ago

I had a bit more of a think about this/play with stuff after our conversation got cut short yesterday.

One of the things I realised (which you're probably way ahead of me on) is that a heroku dyno will only serve traffic on one port, so we can't have the badge server and front end on separate ports (with img.shields.io pointing at one and shields.io pointing at another). As such, I think it's inevitable that if we go with this approach, shields.io and img.shields.io will essentially become interchangeable, and that's difficult to avoid. In fact, I only realised today that https://img.shields.io/category/build totally works now :open_mouth:
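
A minimal illustration of the constraint: Heroku routes external traffic only to the single $PORT it assigns the dyno, so badges and frontend have to come out of one listener.

```js
const http = require('http')

// Heroku injects PORT; nothing bound to any other port is reachable.
const port = process.env.PORT || 8080

http
  .createServer((req, res) => res.end('badges and frontend, one process\n'))
  .listen(port)
```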

There's also this odd situation where https://img.shields.io/endpoint/ (with a trailing slash) serves the endpoint docs (and gatsby makes the URL look like https://img.shields.io/endpoint in the browser), but going to https://img.shields.io/endpoint directly in a browser doesn't serve the docs :exploding_head:

As such, I think trying to maintain a distinction between shields.io and img.shields.io is a losing strategy. Unless we go out of our way to make it hard at the routing layer, they will get used interchangeably whether we like it or not, and then it's equally confusing that they're completely interchangeable except that img.shields.io/endpoint does one thing and shields.io/endpoint does something different, when every other route works the same on both.

Given that, I think we probably have to pursue having no conflicts between the badge server and frontend routes. Tbh, given we've only got one conflict (endpoint), I think my first choice would actually just be to move the docs to /endpoint-badge or /endpoint-docs (or whatever) and accept we've broken one link. It's annoying, but it's the easiest and cleanest solution. If we can't stomach breaking one link, then I reckon hacking the endpoint service so it redirects to the docs if there is no param is the way to go, and we just accept that the badge has an annoying implementation.

Another thing I've realised is that I have no idea how to set my local environment up like the heroku staging env with the badge server and frontend being served on the same port. Any advice? If we're going to move to that setup in production, this should probably become the default for local dev too so we can't accidentally introduce more conflicts.

paulmelnikow commented 4 years ago

> Another thing I've realised is that I have no idea how to set my local environment up like the heroku staging env with the badge server and frontend being served on the same port. Any advice? If we're going to move to that setup in production, this should probably become the default for local dev too so we can't accidentally introduce more conflicts.

This one I can answer quickly: run `npm run build`, and then `npm run start:server`.

chris48s commented 4 years ago

> Finally, we could think about caching, and what cache headers we want to send (from the badge server) for the frontend

Yeah this is a good point. I started looking at https://github.com/badges/shields/issues/5138 and realised that I have absolutely no idea how/where we would set headers when serving the front-end. We don't serve that content via the legacy request handler and I hit a bit of a dead end trying to chase down how we do serve that static content.

chris48s commented 4 years ago

The place we register the route for serving the frontend is https://github.com/badges/shields/blob/614daef08f0c6831894fef4e4cac23816d3953f3/core/server/server.js#L416-L423. The headers are determined by scoutcamp, so we may actually need to patch our scoutcamp fork to have control over this. I will investigate.

paulmelnikow commented 4 years ago

From the top comment in #5574:

> Assuming this is the missing piece of the jigsaw for #5014 what are the next steps to actually move to serving everything from heroku?

  1. URLs like https://shields.io/badge/foo-bar-blue currently 404 – in other words, badges have to be served from img.shields.io. Do we want to try to preserve that behavior, to force users to continue to use img.shields.io and not shields.io? If we start to support badges at shields.io, whether it's documented or not, I worry about locking ourselves into supporting it forever. I think the way to fix this would be to examine the Host header. (This may take some investigation. I can't tell whether this header is actually provided to us.)
  2. It looks like it should be possible to do a CNAME on the apex (shields.io). (Also noted at the bottom of this page.) So maybe next step is to replace the A records on the apex with the CNAME to Heroku and see if it works? Alternatively we could try to get it working on staging first, using a separate apex domain.
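
For what it's worth, the apex records might end up looking like this in zone-file terms (target name hypothetical; Cloudflare flattens an apex CNAME into A/AAAA answers, since a literal CNAME at the apex isn't valid DNS):

```
; hypothetical zone entries, flattened by Cloudflare at the apex
shields.io.      300  IN  CNAME  example-app.herokudns.com.
img.shields.io.  300  IN  CNAME  example-app.herokudns.com.
```
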
chris48s commented 4 years ago

> Do we want to try to preserve that behavior, to force users to continue to use img.shields.io and not shields.io?

I thought we'd already decided not to do that - hence the need for https://github.com/badges/shields/pull/5137. If we're going to continue serving badges on img.shields.io and the frontend on shields.io, that is unnecessary - there is no conflict to resolve.

If we want to revisit that, I don't think much has changed since I posted https://github.com/badges/shields/issues/5014#issuecomment-629649769. I do appreciate that once we start serving badges on shields.io, we basically can't undo it. My instinct is that once we get to a stage where we've got automated deploys, it is unlikely that we'll want to go back to manual/split deployments for the frontend and backend, but you're right, it is a risk.

paulmelnikow commented 4 years ago

I think #5137 was a good idea regardless, because by getting rid of the conflict, the problem is solved in staging and self-hosting environments. I guess I haven't quite come to terms with the fact that we're basically going to serve all badges from https://shields.io/.

> As such, I think trying to maintain a distinction between shields.io and img.shields.io is a losing strategy. Unless we go out of our way to make it hard at the routing layer, they will get used interchangeably whether we like it or not, and then it's equally confusing that they're completely interchangeable except that img.shields.io/endpoint does one thing and shields.io/endpoint does something different, when every other route works the same on both.

I'm less concerned about https://img.shields.io/category/build working than https://shields.io/badge/foo-bar-blue working. I don't think folks are majorly in the habit of deep-linking the frontend (vs badge URLs). But yea. I agree with your analysis.

> Unless we go out of our way to make it hard at the routing layer, they will get used interchangeably whether we like it or not

This is the option we could consider then: to do something at the routing layer that prevents badges from being served from one domain (and similarly, preventing static content from being served from the other).

> My instinct is that once we get to a stage where we've got automated deploys, it is unlikely that we'll want to go back to manual/split deployments for the frontend and backend

I agree with that. From an ops perspective it is much, much, much nicer to host the frontend and backend from a single Heroku app.

Though I guess another question is, imagining we could keep the domains separate at the routing layer, which of these options would we prefer?

  1. img.shields.io remains the canonical domain for badges and shields.io is a second domain which happens to work
  2. shields.io becomes the new canonical domain for badges and img.shields.io is the legacy domain
  3. The server only honors badge requests if they are sent to img.shields.io and frontend requests if they are sent to shields.io

chris48s commented 4 years ago

> Though I guess another question is, imagining we could keep the domains separate at the routing layer, which of these options would we prefer?
>
>   1. img.shields.io remains the canonical domain for badges and shields.io is a second domain which happens to work
>   2. shields.io becomes the new canonical domain for badges and img.shields.io is the legacy domain
>   3. The server only honors badge requests if they are sent to img.shields.io and frontend requests if they are sent to shields.io

I mean, I guess if there was a zero-effort/complexity solution, then we would pick number 3, because it means we could do it and roll back to our existing strategy with no impact, but in reality 3 involves some messiness that we probably don't need to take on. I've been assuming we would do 1, at least to start with. If we don't announce "hey everyone - you can now request badges without img. - it's a totally supported feature" that probably gives us enough scope to back out if we hit a showstopper early on.

Can you think of another disadvantage if it becomes possible to request a badge on shields.io, other than it makes it hard to go back to our existing deployment strategy? If the only downside is that it makes it hard to do a thing that we both agree we probably don't want to do if we can avoid it, I'm not sure it's worth taking on a messy solution, although admittedly we haven't really scoped out how messy the mess would be :)

paulmelnikow commented 4 years ago

> Can you think of another disadvantage if it becomes possible to request a badge on shields.io, other than it makes it hard to go back to our existing deployment strategy?

No, I can't.

It does mean our badge and frontend routes need to not conflict with each other. I think the best way to do that is to choose future frontend routes that are unlikely ever to be badge routes. That seems easy enough to do, if not completely foolproof.

> If we don't announce "hey everyone - you can now request badges without img. - it's a totally supported feature" that probably gives us enough scope to back out if we hit a showstopper early on.

Agreed. And it sounds like we're in agreement that, while options 1 and 2 are a little scary, they are not boxing us into a corner operationally.

I think option 1 is a fine short-term solution, though in the long term I'd want to move to one of the other options, for two reasons:

  1. Avoiding ambiguity. We're suggesting people keep a few extra characters in the badge URL… for no reason… which is confusing.
  2. Improving design. Wouldn't it be nice to keep removing unnecessary characters from the badge URLs? We dropped .svg not long ago and dropping img. seems like another meaningful gain.

To be honest, I guess I'm most in favor of going boldly with option 2 …

> shields.io becomes the new canonical domain for badges and img.shields.io is the legacy domain

after pausing for a few weeks at option 1 …

> img.shields.io remains the canonical domain for badges and shields.io is a second domain which happens to work

to make sure moving the frontend to Heroku doesn't cause any showstoppers.

chris48s commented 4 years ago

Yeah exactly - do option 1 to start with, then move to option 2 once we're totally confident in the approach. Given how many million badge URLs there are in the wild at this point, though, I think we need to still allow img. basically forever :)

paulmelnikow commented 4 years ago

> I'm most in favor of going boldly with option 2 …
>
> > shields.io becomes the new canonical domain for badges and img.shields.io is the legacy domain
>
> after pausing for a few weeks at option 1 …
>
> > img.shields.io remains the canonical domain for badges and shields.io is a second domain which happens to work
>
> to make sure moving the frontend to Heroku doesn't cause any showstoppers.

This is a big decision @badges/shields-core, so wanted to make sure everyone has a chance to weigh in as we move forward.

PyvesB commented 4 years ago

I've given this some more thought overnight.

Option 2 will basically lead to a mixture of shields.io and img.shields.io. Given how many badge URLs are already out there and given that there's generally a fair bit of copy pasting across projects, I'm not even convinced shields.io will ever become the majority, even several years from now. I'm slightly uncomfortable with this approach, given that for any foreseeable future it will lead to inconsistency and probably confuse the occasional user who will wonder what the difference between the two domains is. I don't see any big benefit as a counterpart, apart from saving four characters in new badge URLs generated from the website.

Sticking with Option 1 will keep the code simple on our side, and keep appearances the same for users out there.

chris48s commented 4 years ago

Another task we'll need to do here is update our CloudFlare page rule which currently applies to img.shields.io/*

chris48s commented 3 years ago

We discussed this in the ops meeting. Conclusion: we won't actively block requesting badges on shields.io (without img.), but we will document that it might go away.

One remaining job before we do the migration is to review how we populate the URL base. We want the setup to be:

We can't use an env var for this because we've already built the frontend, so it will need to be done client-side.

I will pick this one up.

chris48s commented 3 years ago

OK, so the migration... isn't done.

I moved www. over, so https://www.shields.io/category/build is now getting served from heroku rather than GH pages (same as https://img.shields.io/category/build is).

What I've realised (usefully, I realised this before I tried switching the root record for shields.io over, so there was no downtime) is that we have redirectUrl set in production: https://github.com/badges/shields/blob/d87dfc13a8d1268110c0720ed0d3f267d3d2fa0c/config/shields-io-production.yml#L18. This is the thing that causes the root page on https://www.shields.io/ and https://img.shields.io/ to redirect to https://shields.io/. The problem with this is that if we switch the root record over, https://shields.io will redirect to itself and become a redirect loop, like this:

[Screenshot at 2020-11-28 13-07-58: browser showing the redirect loop error]

..so we don't want to do that.

What we ideally need is the ability to conditionally apply the redirect based on the Host header (i.e.: if the request came in on the redirectUrl host, serve the frontend; otherwise issue the 302). Unfortunately, the way we "fall through" to the static server is by just not registering a route on /, and we register routes on server startup, not on each request (i.e.: the way to not do the redirect is to not register this route: https://github.com/badges/shields/blob/d87dfc13a8d1268110c0720ed0d3f267d3d2fa0c/core/server/server.js#L393-L399).
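
For illustration, the per-request check we'd ideally want, in an Express-style sketch (names hypothetical; the point above is precisely that the scoutcamp-based server can't decide this per request today):

```js
const express = require('express')
const app = express()

const redirectUrl = process.env.REDIRECT_URL // e.g. 'https://shields.io/'

app.get('/', (req, res, next) => {
  const canonicalHost = redirectUrl ? new URL(redirectUrl).host : null
  if (canonicalHost && req.headers.host !== canonicalHost) {
    // Request arrived on www. or img.: redirect to the canonical host.
    return res.redirect(302, redirectUrl)
  }
  // Already on the canonical host: fall through to the static frontend.
  next()
})

app.listen(process.env.PORT || 8080)
```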

Given that, I'm not quite sure what the way forward is here.

Anyone got any other ideas? Bear in mind changes here potentially impact self-hosting users too.

chris48s commented 3 years ago

Thinking about it, another option would be to unset redirectUrl and do the redirect with page rules in CloudFlare... we do have 2 free.

PyvesB commented 3 years ago

> Thinking about it, another option would be to unset redirectUrl and do the redirect with page rules in CloudFlare... we do have 2 free.

That would be my preferred option, though I don't feel like having https://img.shields.io/ serve the frontend directly would be the end of the world.

paulmelnikow commented 3 years ago

I think that’s fine. The redirect was more relevant when the badge servers were on OVH and the frontend was hosted on GH Pages. Now that they’ll be in the same place I don’t think it matters much and IMO not worth the complexity.

Good catch, by the way!!!

calebcartwright commented 3 years ago

> > Thinking about it, another option would be to unset redirectUrl and do the redirect with page rules in CloudFlare... we do have 2 free.
>
> That would be my preferred option

Ditto

chris48s commented 3 years ago

The switchover is done :tada: This means we now have one-click deploys: "promote to production" now deploys both the badge server and website frontend. There are now some follow-up tasks we need to do to cap off this process:

Anything I've missed there?

paulmelnikow commented 3 years ago

Remove the "commit is/isn't in gh-pages" action (I'm not sure we can replace this with anything. That said, now it is easier to deploy this is probably less useful/important) (in progress: #5886 )

This would be a nice thing to provide for the new deployment process, though I agree it's not very clear how we would do that.

One possibility would be adding an introspection endpoint on the server that allows getting the version number.
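
Such an endpoint could be tiny, along these lines (route name hypothetical; HEROKU_SLUG_COMMIT comes from Heroku's opt-in dyno-metadata feature, so treat that variable as an assumption):

```js
const express = require('express')
const app = express()

app.get('/version', (req, res) => {
  // Report the git commit this release was built from.
  res.json({ commit: process.env.HEROKU_SLUG_COMMIT || 'unknown' })
})

app.listen(process.env.PORT || 8080)
```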

Definitely out of scope as a follow-up on the migration, and more of a new project.

chris48s commented 3 years ago

I think we could do it by using the Heroku platform API to get the commit hash associated with the deployed release (and then work out where in the repo tree that commit hash is, to compare). I'm not convinced it is super high value at this point, but if somebody is keen to work on it, that is probably the way to go.
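
A sketch of what that could look like against the Platform API v3 (app name and token are placeholders; assumes Node 18+ for global fetch):

```js
async function deployedCommit(app, token) {
  const headers = {
    Accept: 'application/vnd.heroku+json; version=3',
    Authorization: `Bearer ${token}`,
  }

  // Fetch just the newest release.
  const releases = await fetch(`https://api.heroku.com/apps/${app}/releases`, {
    headers: { ...headers, Range: 'version ..; order=desc,max=1' },
  }).then(r => r.json())

  // Each release points at a slug, which records the commit it was built from.
  const slug = await fetch(
    `https://api.heroku.com/apps/${app}/slugs/${releases[0].slug.id}`,
    { headers }
  ).then(r => r.json())

  return slug.commit // compare this hash against the repo tree
}
```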

calebcartwright commented 3 years ago

I may have missed discussion of this already as I imagine others saw this too, but I did have an email from Heroku yesterday morning about automatic certificate management failing for the prod app and the domain being considered "unsafe".

Did others see this/can I ignore it?

paulmelnikow commented 3 years ago

> I think we could do it by using the Heroku platform API to get the commit hash associated with the deployed release

Yea, right, that seems like a smart way to go.

> (and then work out where in the repo tree that commit hash is, to compare).

Once we have the two commits, the badge we built for gh-pages almost works for that:

https://img.shields.io/github/commit-status/badges/shields/f5aa2c8db2c4973d2e319d867f1b487eee2dccdf/e1bae8c18f21c7cf4d55793af383ba1d7fee2fcf.svg?label=deploy%20status

It's not the nicest message, so maybe we'd want to make a variant that lets you override the "yes" and "no" messages.

Though since we also need to fetch the info from the Heroku Platform API and then query against that commit, it might be better to deploy a separate service to handle this.

We might also consider a different approach, one that posts a notification after the commit has been deployed.

I don't have the bandwidth now, though I'd be game to take this on at some point.

chris48s commented 3 years ago

> I may have missed discussion of this already as I imagine others saw this too, but I did have an email from Heroku yesterday morning about automatic certificate management failing for the prod app and the domain being considered "unsafe".
>
> Did others see this/can I ignore it?

Yeah, it's fine now:

[Screenshot at 2020-12-01 19-31-08: certificate status back to normal]

I did post some writeup here: https://github.com/orgs/badges/teams/shields-ops/discussions/5?from_comment=7#discussion-5-comment-7

I think the point where it was "unsafe" would have been when the validation got stuck on "DNS Verified" and I switched it back to GH pages for a bit. I think it was trying to validate a heroku cert: it validated that shields.io was pointed at heroku to start with, but then the next time it checked, shields.io wasn't pointed at heroku, so it decided it was unsafe to continue.

calebcartwright commented 3 years ago

Gotcha, thanks for the info (I read the post but didn't make that mental connection with the email). Thanks for all the work you've done on this, it's a big win!

chris48s commented 3 years ago

We've now been deployed on heroku for a couple of weeks and all is well, so I deleted the gh-pages branch today :wastebasket: ...and that was the last task left on this issue :tada: