Issue with cache invalidation on production deployment

cscairns commented 9 years ago

Production deployment changes aren't propagating within the expected 1-hour window.

konklone commented 9 years ago

If you're seeing this consistently now, and you've been correctly setting Cache-Control headers as specified in the deploy command (and it looks like you have), then I think this is a bug on Amazon's side. Here's how Amazon describes its caching behavior:

[Screenshot: Amazon's description of the CloudFront minimum TTL setting]

Here's the output I see from our origin and CloudFront distribution:

$ curl --head http://myra-cloudfront.s3-website-us-east-1.amazonaws.com/
HTTP/1.1 200 OK
x-amz-id-2: KM8CKnoGMyg5VXTFayjycA/8J3pcNrw2Gf/ERREn+UJg1r471EkC62a1hxbei8mi
x-amz-request-id: BC625E015EAA37C7
Date: Thu, 20 Nov 2014 15:41:35 GMT
Cache-Control: max-age=86400
Last-Modified: Wed, 19 Nov 2014 19:22:38 GMT
ETag: "3da6ed097227470fc406119fc63bde01"
Content-Type: text/html
Content-Length: 11132
Server: AmazonS3

$ curl --head https://myra.treasury.gov
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 11132
Connection: keep-alive
Date: Tue, 18 Nov 2014 04:36:43 GMT
Cache-Control: max-age=86400
Last-Modified: Tue, 18 Nov 2014 01:14:45 GMT
ETag: "3da6ed097227470fc406119fc63bde01"
Server: AmazonS3
Age: 76041
X-Cache: Hit from cloudfront
Via: 1.1 c1835ed5f58f5752820118219163da2f.cloudfront.net (CloudFront)
X-Amz-Cf-Id: 3d1FUwB6amjTnSUjuFq2ddvwwAQBfrWOQ8pCckvLzJYxmb9WNeOYQg==
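One way to read these headers, as a rough sketch: a cached copy stays fresh while its `Age` is below `max-age`. The Python below applies that simplified rule only (it ignores `s-maxage`, `Expires`, and other directives, and assumes lowercase header names). `parse_max_age` and `remaining_freshness` are hypothetical helpers, not part of any AWS tool.

```python
from typing import Mapping, Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value (seconds) from a Cache-Control header, if present."""
    # Cache-Control is a comma-separated list of directives, e.g. "public, max-age=86400".
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


def remaining_freshness(headers: Mapping[str, str]) -> Optional[int]:
    """Seconds until a cached copy goes stale (simplified: max-age minus Age)."""
    max_age = parse_max_age(headers.get("cache-control", ""))
    if max_age is None:
        return None
    # A missing Age header is treated as a fresh fetch (age 0).
    age = int(headers.get("age", "0"))
    return max_age - age


# Values copied from the CloudFront response above.
cloudfront = {"cache-control": "max-age=86400", "age": "76041"}
print(remaining_freshness(cloudfront))  # 10359
```

With these numbers the CloudFront copy of the home page still had 10359 seconds (just under 3 hours) of freshness left, so serving it from cache was consistent with the `max-age=86400` the origin was sending.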

Other pages, like /about/ and /employers/, correctly updated themselves when I hit them with curl just now, but that doesn't necessarily guarantee that caching was working as expected, because I'm jumping in here nearly a day after you last deployed the app.

The main thing is: is there a difference in content between http://myra-cloudfront.s3-website-us-east-1.amazonaws.com/ and https://myra.treasury.gov, on any page, after more than 1 hour has passed? I think you're saying there is, which is not the behavior Amazon has described.
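A rough way to answer that question without eyeballing pages is to compare the `ETag` each side returns for the same path. This sketch assumes the S3 `ETag` reflects the object's content (which holds for ordinary single-part uploads); `head` and `same_version` are hypothetical helper names.

```python
from typing import Dict, Mapping
from urllib.request import Request, urlopen


def head(url: str) -> Dict[str, str]:
    """Send a HEAD request and return the response headers with lowercase names."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return {name.lower(): value for name, value in resp.headers.items()}


def same_version(origin: Mapping[str, str], cdn: Mapping[str, str]) -> bool:
    """True when both responses carry the same non-empty ETag."""
    origin_etag = origin.get("etag")
    return bool(origin_etag) and origin_etag == cdn.get("etag")
```

For example, `same_version(head("http://myra-cloudfront.s3-website-us-east-1.amazonaws.com/"), head("https://myra.treasury.gov/"))`, run more than an hour after a deploy, should return `True` if the cache lifetime really is one hour. In the output above, both responses already share the ETag `"3da6ed097227470fc406119fc63bde01"`, even though their `Last-Modified` dates differ.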

To try something else out, I've switched the "minimum TTL" of the distribution to 86400 (seconds) in the CloudFront settings:

[Screenshot: CloudFront distribution caching settings]

I'd like to see if this changes the behavior you observe going forward. However, if it works, I'm also going to want to switch it back off and verify the behavior, because we shouldn't have to do this.
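Here is a simplified model of how the minimum TTL interacts with the origin's header. It assumes the origin sends `Cache-Control: max-age` and ignores CloudFront's other TTL settings and directives such as `no-cache`: CloudFront keeps the object for `max-age` seconds unless the minimum TTL is larger. `effective_ttl` is a hypothetical helper.

```python
def effective_ttl(origin_max_age: int, min_ttl: int) -> int:
    """Seconds CloudFront caches an object (simplified: the larger of max-age and min TTL)."""
    return max(origin_max_age, min_ttl)


print(effective_ttl(origin_max_age=86400, min_ttl=0))     # 86400 (1 day)
print(effective_ttl(origin_max_age=3600, min_ttl=0))      # 3600 (1 hour)
print(effective_ttl(origin_max_age=3600, min_ttl=86400))  # 86400 (min TTL wins)
```

Under this model a minimum TTL can only lengthen caching, never shorten it. A one-hour refresh therefore depends on the origin sending `max-age=3600` with a minimum TTL at or below that value.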

cscairns commented 9 years ago

Sounds good. Yes, the key issue is that the changes aren't propagating within the 1-hour window. I have seen it update within a 24-hour period, but that is too long. I have also seen instances where some pages update right away, but others, such as employers/resources/, aren't reflected until much longer than 1 hour.

konklone commented 9 years ago

@cscairns Have you seen this happen since I made the change above this morning?

konklone commented 9 years ago

cc @seanherron for any insight

seanherron commented 9 years ago

Have you tried issuing a manual invalidation and seeing if everything correctly refreshes?

konklone commented 9 years ago

Yep, we've done that when needed, and the manual invalidation works correctly.

noahmanger commented 9 years ago

@konklone can / should you show me how to manually invalidate?

konklone commented 9 years ago

I can and should! You will need access to CloudFront in AWS. If you don't have access to it, please ask DevOps to grant you access.
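Besides the console, invalidations can also be created programmatically. The following is a minimal sketch with boto3, assuming credentials that are allowed to create CloudFront invalidations. `invalidation_batch` and `invalidate` are hypothetical helpers; `create_invalidation` is boto3's CloudFront client method.

```python
import time
from typing import Any, Dict, List


def invalidation_batch(paths: List[str]) -> Dict[str, Any]:
    """Build the InvalidationBatch payload; "/*" would cover every object."""
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a nanosecond timestamp is one option.
        "CallerReference": f"manual-{time.time_ns()}",
    }


def invalidate(distribution_id: str, paths: List[str]) -> str:
    """Create an invalidation and return its ID (requires AWS credentials)."""
    import boto3  # imported lazily so invalidation_batch() works without boto3 installed

    client = boto3.client("cloudfront")
    resp = client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=invalidation_batch(paths),
    )
    return resp["Invalidation"]["Id"]
```

A call would look like `invalidate("E123EXAMPLE", ["/employers/resources/"])`, where `E123EXAMPLE` stands in for the real distribution ID shown in the CloudFront console.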

I'm unhappy that we haven't found a working cache invalidation solution. I'm close to saying we should write Amazon and tell them there's a bug, because I'm so sure we're doing it right. It's just...exhausting.

konklone commented 9 years ago

@noahmanger and @cscairns - can you verify that you have access to CloudFront now to perform invalidations?

konklone commented 9 years ago

I got confirmation from @noahmanger. @cscairns, can you confirm that you can run invalidations from the CloudFront console?

NoahKunin commented 9 years ago

To echo previous email threads and comms, I do not believe this bug is reproducible over public internet connections (non-government, non-TIC). This has been a known problem at Treasury for several years (going back to 2011), and an intermittent, lesser problem at GSA.

If Treasury wants to view the site as the public sees it, they will have to use a non-government internet gateway, such as a MiFi hotspot or a tethered mobile device.

konklone commented 9 years ago

OK, I'm going to close this and consider it solved until someone can reproduce it on a non-government network.

konklone commented 9 years ago

OK. This is not one of the higher points of my career.

The cache time I've been setting (and the one I used in the docs that give you the command to paste) is wrong. 86400 seconds is 1 day. We have been trying to set 1 hour. That's 3600 seconds.
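To avoid that kind of unit slip, one option is to derive the header value from hours instead of typing seconds by hand. The deploy command itself isn't shown in this thread, so this only sketches the `Cache-Control` value it should carry; `cache_control_for` is a hypothetical helper.

```python
SECONDS_PER_HOUR = 60 * 60                # 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR   # 86400


def cache_control_for(hours: float) -> str:
    """Build a Cache-Control value from a duration in hours."""
    return f"max-age={int(hours * SECONDS_PER_HOUR)}"


print(cache_control_for(1))   # max-age=3600  (the intended 1 hour)
print(cache_control_for(24))  # max-age=86400 (the 1 day that was actually set)
```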

cscairns commented 9 years ago

Well, at least it was a simple fix.

18F / myra, issue #139