DoSomething / infrastructure

🐄 DoSomething.org's infrastructure, managed by Terraform.
MIT License

Simplify front-end (Phoenix & Ashes) Fastly configs. #38

Closed DFurnes closed 5 years ago

DFurnes commented 6 years ago

~BUG~ REQUEST

Current Behavior

We simplified a lot of our Fastly config in #39, and moved the majority of properties into Terraform so that we can track changes in code. The two outstanding services that still need to be cleaned up are DoSomething.org (the O.G), and thor.dosomething.org.

These contain routing rules for Phoenix & Ashes, including relatively sizable dictionaries for redirects and backend assignments. As we've moved more stuff into Phoenix, this has created an increasing workload (for devops & now the product team) since every new URL needs to be manually assigned.

Desired Behavior

First of all, let's move these properties into Terraform!

Since we're creating the majority of (…or all??) new content on Phoenix, I'd like to flip the "default" backend to that application. This should remove all the work of pool assignments in one fell swoop.

I'd also like to see if we can simplify redirect logic, since currently PMs need to create redirects for every distinct URL that a user may visit (including query strings, like UTMs!). In nearly every case, we only really care about the path when creating a URL redirect.
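
As a rough illustration of path-only matching (not the actual Fastly VCL), the sketch below drops the query string before looking a request up in a redirect table. The `redirect_key` and `lookup_redirect` helpers and the sample entries in `REDIRECTS` are hypothetical, made up for this example.

```shell
# Hypothetical redirect table: "<source path> <target>" per line.
REDIRECTS="/example-old-page /us/example-new-page
/example-promo /us/campaigns/example-campaign"

# Reduce a request URL (path plus optional query string) to its path.
redirect_key() {
  # ${1%%[?]*} removes the first "?" and everything after it.
  printf '%s\n' "${1%%[?]*}"
}

# Print the redirect target for a request URL, or nothing if there is none.
lookup_redirect() {
  key=$(redirect_key "$1")
  printf '%s\n' "$REDIRECTS" | awk -v key="$key" '$1 == key { print $2; exit }'
}

# A UTM-tagged request resolves through the same single table entry.
lookup_redirect "/example-promo?utm_source=sms&utm_medium=text"
```

With a key like this, one entry per path would cover every UTM variant of that URL, which is the workload reduction described above.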

Relevant Screenshots + Links

N/A

DFurnes commented 6 years ago

Before diving into the actual Fastly work here, I've created #445 to move Nightwing (which hosts redirect.dosomething.org) into Heroku (so that we can feel more confident in that app, and so that we can set up a QA instance for testing changes to how redirects work before pushing them to production).

DFurnes commented 6 years ago

I've added Papertrail logging to our production Fastly config so that we can audit what paths are receiving traffic (to make sure we aren't missing anything when simplifying these routing rules and, soon, killing off Ashes altogether).

DFurnes commented 6 years ago

Alright, I'm back on this post-staff retreat! I've used the logs we've gathered over the past ten days to find the most popular base paths served on www.dosomething.org and which backend they went to:

path                                     backend         hits
/sites/default                           F_Ashes      1786551
/profiles/dosomething                    F_Ashes       536659
/next/assets                             F_PN_Heroku   353675
/api/v1                                  F_Ashes       140095
/profiles/dosomething                    F_Ashes       122515
/misc/drupal                             F_Ashes       122099
/us/facts                                F_Ashes       111303
/sites/default                           F_Ashes        50004
/us                                      F_Ashes        38174
/us/campaigns/online-registration-drive  F_PN_Heroku    37481
/next/signups                            F_PN_Heroku    30668
/us/campaigns/ready-vote                 F_PN_Heroku    28430
/us/campaigns/escape-vape                F_PN_Heroku    27292
/next/embed                              F_PN_Heroku    26421
/api/v2                                  F_PN_Heroku    25893
/us/campaigns/grab-mic                   F_PN_Heroku    19120
/us/escape-the-vape-guide                F_PN_Heroku    18428
/us/about                                F_Ashes        15763
/us/articles                             F_PN_Heroku     8931
/misc/drupal                             F_Ashes         8324
/us/why-im-voting-in-2018-sms            F_PN_Heroku     7731
/us/user                                 F_Ashes         5413
/us/search                               F_Ashes         5039
/us/voter-registration-deadlines-2018    F_PN_Heroku     4730
/robots.txt                              F_Ashes         4704

Here's the full list of base paths and most common URLs, and the script used to generate these.

All-in-all, we had 3,033,424 requests served by Ashes (83%) and 621,410 by Phoenix (17%). Of course, this isn't an entirely fair comparison since Phoenix uses client-side rendering for subsequent page navigation within a session, and Ashes is also serving a ton of static content out of the /sites/default/files directory that Next instead reads from Contentful, S3, or Rogue. Still, it'll be fun to see this decrease as we swap things over!
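
For readers without access to the linked script, here is a minimal sketch of how a tally like this could be produced. It is not the original script. It assumes one request per log line in the shape `<path> backend=<name>` (as in the `200s-fastly.txt` output further down) and treats the first two path segments as the "base path", which may not match the grouping actually used. The `base_path_counts` name is made up.

```shell
# Count hits per (base path, backend), most popular first.
# Assumes one request per input line, shaped like "/us/facts/x backend=F_Ashes".
base_path_counts() {
  awk '{
    path = $1
    sub(/[?].*$/, "", path)              # ignore query strings
    n = split(path, seg, "/")            # seg[1] is empty: paths start with "/"
    base = "/" seg[2]
    if (n >= 3 && seg[3] != "") base = base "/" seg[3]
    backend = $2
    sub(/^backend=/, "", backend)
    hits[base " " backend]++
  }
  END { for (key in hits) print key, hits[key] }' | LC_ALL=C sort -k3,3nr -k1,1
}

# Made-up sample lines, piped through the tally.
printf '%s\n' \
  '/us/facts/example-one backend=F_Ashes' \
  '/us/facts/example-two?utm_source=sms backend=F_Ashes' \
  '/next/assets/app.js backend=F_PN_Heroku' \
  '/robots.txt backend=F_Ashes' | base_path_counts
```

Summing the third column per backend would give the kind of Ashes vs. Phoenix split quoted above.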

DFurnes commented 6 years ago

> I'd also like to see if we can simplify redirect logic, since currently PMs need to create redirects for every distinct URL that a user may visit (including query strings, like UTMs!). In nearly every case, we only really care about the path when creating a URL redirect.

These were pretty straightforward and will save Jen & other product folks tons of time. I updated this behavior on the current production Fastly property on Thursday, September 20th:

DFurnes commented 6 years ago

Using the list of paths found above, we can build and test some minimal conditions to choose the right backend. Here's what I've got so far, separated into a few logical groups for readability:

We'll need a rule to route specific slugs to each backend if req.url ~ \/((us|mx|br)\/)?campaigns as well (checking against a table, either editable via the API or just stored in VCL). Any paths that don't match these rules can safely default to Phoenix.
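
The rules themselves aren't reproduced in this thread, so the following is only a sketch of how such routing could work, written in shell rather than VCL. The path groups are borrowed from the grep audit command in the next comment. The `(/|$)` segment boundary, the `choose_backend` name, and the contents of `ASHES_CAMPAIGN_SLUGS` are assumptions for illustration, not the deployed configuration.

```shell
# Hypothetical stand-in for the table of campaign slugs still served by Ashes.
ASHES_CAMPAIGN_SLUGS="teens-jeans
example-legacy-campaign"

# Optional country prefix shared by the path rules.
PREFIX='^/((us|mx|br)/)?'

# Print the backend name for a request URL (path plus optional query string).
choose_backend() {
  # Drop the query string and lowercase the path before matching.
  path=$(printf '%s\n' "${1%%[?]*}" | tr '[:upper:]' '[:lower:]')

  # Country homepages, e.g. "/us" or "/mx/".
  if printf '%s\n' "$path" | grep -Eq '^/(us|mx|br)/?$'; then
    echo F_Ashes; return
  fi

  # Legacy content, the v1 API, Drupal internals, and robots.txt.
  if printf '%s\n' "$path" | grep -Eq \
    -e "${PREFIX}(facts|about|sobre|volunteer|voluntario|reportback|ds-share-complete|api/v1)(/|\$)" \
    -e "${PREFIX}(admin|file|sites|profiles|misc|user|taxonomy|modules|search|system|themes|node|js)(/|\$)" \
    -e '^/robots\.txt$'; then
    echo F_Ashes; return
  fi

  # Campaigns: only slugs listed in the table go to Ashes.
  slug=$(printf '%s\n' "$path" | sed -nE 's#^/((us|mx|br)/)?campaigns/([^/]+).*#\3#p')
  if [ -n "$slug" ] && printf '%s\n' "$ASHES_CAMPAIGN_SLUGS" | grep -Fxq -- "$slug"; then
    echo F_Ashes; return
  fi

  # Everything else defaults to Phoenix.
  echo F_PN_Heroku
}

for url in /us/facts/example /us/campaigns/ready-vote /api/v2/campaigns /us/articles/example; do
  printf '%s -> %s\n' "$url" "$(choose_backend "$url")"
done
```

Keeping the default at the bottom means a migrated campaign only needs its slug removed from the table, with no new rule added.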

DFurnes commented 6 years ago

Tested those rules against the past ten days of 2xx requests (ignoring campaigns for now), and they catch everything that should go to Ashes! The only things we're missing that used to route to Ashes are the following, none of which need their own rules:

grep -Eiv "\/((us|mx|br))\/? " 200s-fastly.txt \
  | grep -Eiv "\/((us|mx|br)\/)?(facts|about|sobre|volunteer|voluntario|reportback|ds\-share\-complete|api\/v1)" \
  | grep -Eiv "\/((us|mx|br)\/)?(admin|file|sites|profiles|misc|user|taxonomy|modules|search|system|themes|node|js)" \
  | grep -Eiv "\/robots\.txt" \
  | grep -Eiv "\/((us|mx|br)\/)?campaigns" \
  | grep -v "F_PN_Heroku" \
  | sort | uniq -c
  15 /CHANGELOG.txt backend=F_Ashes
   5 /charleston backend=F_Ashes
  95 /index.php backend=F_Ashes
 566 /xmlrpc.php backend=F_Ashes

DFurnes commented 6 years ago

I've created a new front-end property in Fastly in https://github.com/DoSomething/infrastructure/commit/3a45636d0a73c29f5d56327e88f88c400e72b87a, which can be tested at qa.dosomething.org. It should handle routing all paths correctly aside from campaigns, which default to Phoenix for now.

DFurnes commented 6 years ago

We can now route individual campaigns slugs to Ashes using a VCL table, introduced in https://github.com/DoSomething/infrastructure/commit/a016dac2acd68119e9b8b76dc6144be5e304a8ee. The final step is filling in that table with more Ashes slugs.

DFurnes commented 6 years ago

I've added the Explore Campaigns page & remaining slugs in https://github.com/DoSomething/infrastructure/commit/680d00fe7805006aec1f0273cfaac79ff2862eb9.

To build the list of Ashes campaign slugs, I took the spreadsheet that Ashley had compiled here, and piped it through a quick shell script to find which ones currently route to Ashes (the spreadsheet column for that was never filled out):

# Print each slug whose response headers name one of the HAProxy (Ashes) backends.
while read -r slug
do curl -sI "https://www.dosomething.org/us/campaigns/$slug" | grep -q "F_HAProxy_1[B-D]" && echo "$slug"
done < campaign-slugs.txt

DFurnes commented 6 years ago

Next up, adding support for managing redirects on this property via Aurora. The Fastly Terraform provider doesn't yet support managing dictionaries, so we need to do this step manually:

# Create a new version that we can assign our dictionaries to:
curl -X POST -H "Fastly-Key: $FASTLY_API_TOKEN" https://api.fastly.com/service/$FASTLY_SERVICE_ID/version
VERSION="<SET FROM CURL REQUEST ABOVE>"

# Create the 'redirects' and 'redirect_types' edge dictionaries:
curl -X POST -H "Fastly-Key: $FASTLY_API_TOKEN" -d 'name=redirects' https://api.fastly.com/service/$FASTLY_SERVICE_ID/version/$VERSION/dictionary
curl -X POST -H "Fastly-Key: $FASTLY_API_TOKEN" -d 'name=redirect_types' https://api.fastly.com/service/$FASTLY_SERVICE_ID/version/$VERSION/dictionary

# Activate the placeholder version we just created:
curl -X PUT -H "Fastly-Key: $FASTLY_API_TOKEN" https://api.fastly.com/service/$FASTLY_SERVICE_ID/version/$VERSION/activate

Then I was able to apply https://github.com/DoSomething/infrastructure/commit/3744f72ba0a226b5b799a5fbd4656fd7dca98555, and swap in the correct service & dictionary IDs on admin-qa.dosomething.org's Heroku config.

Everything looks swell: https://qa.dosomething.org/test-redirect

DFurnes commented 6 years ago

I also copied over some logic for international homepages in https://github.com/DoSomething/infrastructure/commit/b2401fb0f6f8feb93cc939adbe9eae23432ad029.

DFurnes commented 6 years ago

Noticed that the homepage & campaigns regexes were too permissive (for example, the campaigns one also matched /api/v2/campaigns to Ashes). Fixed in https://github.com/DoSomething/infrastructure/commit/1dc1809c4889df7e0c2852a334745a0ae5784114.
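
To illustrate the kind of over-matching described here, the sketch below compares an unanchored campaigns pattern with an anchored one using `grep -E`. The `STRICT` pattern is an assumption about what a tighter rule could look like, not a copy of the fix in the linked commit. `LOOSE`, `STRICT`, and `matches` are names made up for this example.

```shell
# Unanchored: "campaigns" may follow an optional country prefix anywhere in the path.
LOOSE='/((us|mx|br)/)?campaigns'
# Anchored: the prefix must start the path and "campaigns" must be a whole segment.
STRICT='^/((us|mx|br)/)?campaigns(/|$)'

# Succeeds if the path matches the pattern. Usage: matches PATTERN REQUEST_PATH
matches() {
  printf '%s\n' "$2" | grep -Eq -- "$1"
}

for path in /us/campaigns/ready-vote /api/v2/campaigns; do
  if matches "$LOOSE" "$path"; then loose=yes; else loose=no; fi
  if matches "$STRICT" "$path"; then strict=yes; else strict=no; fi
  printf '%s loose=%s strict=%s\n' "$path" "$loose" "$strict"
done
```

Only the anchored form leaves `/api/v2/campaigns` unmatched, so that request would fall through to the Phoenix default.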

DFurnes commented 6 years ago

I've clicked through our core flows (login, signup, post, logout) on both Ashes & Phoenix pages, and verified that API endpoints for both apps seem to work as expected. I'm going to deactivate the thor.dosomething.org property and redirect any traffic on that domain to qa.dosomething.org.

DFurnes commented 6 years ago

Applied that change in https://github.com/DoSomething/infrastructure/commit/98651632b0a91d1a308ad731a3a8574367b7bee6 & verified everything still works.

DFurnes commented 6 years ago

Adding support for www-new.dosomething.org to HAProxy_1B for testing this new config:

+ acl www-new        hdr(host) -i www-new.dosomething.org
+ use_backend        www-us-drupal if www-new

DFurnes commented 6 years ago

~Bleh, looks like there's also an internal Varnish layer that needs to be updated. Layers on layers! I'm just going to hack this by swapping the Host header in Fastly while we test on www-new..~ Well, there is a second Varnish layer (with two instances!) but that's not what was wrong here. I forgot to set the HAProxy backend for HTTPS traffic as well.

DFurnes commented 6 years ago

Ok, so this is ready to test with production now:

I'd love if folks could click around and see if everything works as expected. If so, I'd like to flip the switch today or tomorrow, since that'll make a pretty immediate dent in the busywork the product team has been doing to support our old property.

cc: @weerd @mendelB @ngjo

DFurnes commented 6 years ago

Just to be clear, this removes the need (and ability) to route specific paths to Phoenix or Next via the backend pool dictionary we'd used on the old property. Instead, we can remove campaign slugs from this list of slugs as we migrate them over, and remove the rules to route facts/, about/, and so on as we move each group of content to Phoenix & Contentful.

DFurnes commented 6 years ago

Once we feel decently confident, the next step is to de-activate the DoSomething.org config, add www.dosomething.org as a domain on the new Terraform: Frontend config, and run through a final round of tests!

DFurnes commented 6 years ago

Added a path-specific override to route the Our Press page to Phoenix in https://github.com/DoSomething/infrastructure/commit/e4d7fd3d8333836acb91f88435f6a93d85dad457, referencing this discussion.

DFurnes commented 6 years ago

Fixed two international campaigns that were sneakily hiding out on /us (due to being made with the wrong original translation) in https://github.com/DoSomething/infrastructure/commit/784309b218897277696b358fa442f118b6969a74. Thanks @ngjo for flagging!

DFurnes commented 6 years ago

I've added request logging to the frontend properties (and the handy X-Origin-Name header) in https://github.com/DoSomething/infrastructure/commit/29a71772e956bfab3382aadd47627e7cbe0f0063 and https://github.com/DoSomething/infrastructure/commit/599cb0d02817ef56a2cef690d2b907b26722d4f4.

DFurnes commented 6 years ago

Routed robots.txt to Ashes on production in https://github.com/DoSomething/infrastructure/commit/15768d37688197c4c501a156f8e101163f4b4a79.

DFurnes commented 6 years ago

I've added rules for fact/ and image/ paths in https://github.com/DoSomething/infrastructure/commit/7773bad892d53593057dc34ccb60d91d8ebe2eee and https://github.com/DoSomething/infrastructure/commit/0f2824edb2dc3ce6158f0ca0c2a1d03ee4e9e4dd. Thanks @mendelB for the find!

DFurnes commented 6 years ago

Okay, we've finished a last round of testing and everything looks good! To push this to production, we'll de-activate the DoSomething.org config (freeing up the www.dosomething.org domain) & add www.dosomething.org as a domain on the new Terraform: Frontend config.

I'll run through a last round of testing once we've made the switch, and keep an eye on this search in Papertrail for 404s to make sure nothing is amiss. Worst case, we can just revert to the old config. :v:

DFurnes commented 6 years ago

Done, and running through core flows on production! Looking good to me. 🙆‍♂️

DFurnes commented 6 years ago

A few smaller issues that I've noticed from looking at 404s on Papertrail after deploying this:

weerd commented 6 years ago

> We're getting some requests to GET ///us/us/campaigns (how...?), which default to Phoenix.

@DFurnes Noticed this too. It's a bug in how we try to account for redirecting in the routes file:

going to https://www.dosomething.org/us redirects properly to https://www.dosomething.org/us/campaigns

but going to https://www.dosomething.org/us/ redirects incorrectly to https://www.dosomething.org/us/us/campaigns, thinking it's like a category page or something.

The ending forward-slash ends up triggering the route rule for {category}/{slug}; I think it thinks us is the category... so then it redirects to us/category/slug...

DFurnes commented 6 years ago

Ahh, interesting! 💡 I'm not seeing that on production myself, since we route that /us/ request to Ashes, but I see exactly what you're talking about when I use one of the Phoenix app-specific domains.

DFurnes commented 6 years ago

Ooooh, but it does go to Phoenix if I hit that homepage URL with a query string, like https://www.dosomething.org/us/?lets-do-this

weerd commented 6 years ago

Aha! Ok! Nice find!

DFurnes commented 6 years ago

Teamwork!! 🙌

DFurnes commented 6 years ago

Fixed the surprise redirect in https://github.com/DoSomething/infrastructure/commit/eaff307240054eb2f63705b2a16e95466a045bb6 and pushed to QA. So https://qa.dosomething.org/us/?lets-do-this should now route to the Ashes homepage once again!

Gonna apply that fix to production as well next.

DFurnes commented 6 years ago

Alright, applied that fix to production in https://github.com/DoSomething/infrastructure/commit/0499b60596fcb27dae1bbcaf6dca929b11543982!

DFurnes commented 6 years ago

Drupal tries to load the favicon from GET /us/favicon.png, which doesn't exist on Phoenix. Same deal with /us/apple-touch-icon-120x120.png, GET /us/apple-touch-icon.png, and GET /us/apple-touch-icon-precomposed.png.

Update! Tested this on QA in https://github.com/DoSomething/infrastructure/commit/c77bc967015f441bf13082adc5ba44cdd4ab90ab, and it turns out they 404'd on Ashes all along! 💫 Reverting the change since it should be nothin' to worry about.

DFurnes commented 6 years ago

Ah, one more thing – noticed that our routing regexes are case-sensitive, so for example https://qa.dosomething.org/us/CAMPAIGNS/Teens-Jeans doesn't route to the right place. Fixed on dev & QA in https://github.com/DoSomething/infrastructure/commit/4abfd5b7ca95bfa679813d5118a7c78d23b0522e and https://github.com/DoSomething/infrastructure/commit/4021a2d78622a4f9fe8e5ad3c5051f91e92ccec4.

DFurnes commented 6 years ago

And promoted those two fixes to production in https://github.com/DoSomething/infrastructure/commit/38cc0a5ac248b13b14d552cae3442706eb8f1c59 🆙

DFurnes commented 6 years ago

Just noting for posterity, we don't have a sitemap.xml (and hence no rule for it). There's nothing registered for our domain in Google's Search Console & requests to /sitemap.xml have just been 404ing for a long time now (…or forever).

weerd commented 6 years ago

🤔 @DFurnes sitemap.xml came up in the Ashes Migration google doc. @mirie was asking about it and what our plan is for Phoenix. I found a couple libraries for Laravel and a gist with a sample Contentful script to sort-of generate a sitemap XML file, but wasn't entirely sure how we wanted to proceed with generating one on a regular basis automagically...

DFurnes commented 6 years ago

From the discussion in that doc, it looks like this functionality has been broken for a while (originally flagged by Jen and confirmed by Matt). Since nothing reads from this at the moment, seems safe to leave for a future build in Phoenix right?

weerd commented 6 years ago

Yeah, I wasn't sure if it's something we wanted to fix while we're at it, moving forward. I'm fine with postponing for now and addressing it in Q1 2019 or whatevs.

DFurnes commented 6 years ago

Alright, I'm going to mark this as closed! 🔨 We can open new issues if we find any other bugs.

Some final stats: