cscairns closed 9 years ago
If you're seeing this consistently now, and you've been correctly setting Cache-Control headers as specified in the deploy command (and it looks like you have), then I think this is a bug on Amazon's end. Here's how Amazon specifies their caching behavior:
Here's the output I see from our origin and CloudFront distribution:
$ curl --head http://myra-cloudfront.s3-website-us-east-1.amazonaws.com/
HTTP/1.1 200 OK
x-amz-id-2: KM8CKnoGMyg5VXTFayjycA/8J3pcNrw2Gf/ERREn+UJg1r471EkC62a1hxbei8mi
x-amz-request-id: BC625E015EAA37C7
Date: Thu, 20 Nov 2014 15:41:35 GMT
Cache-Control: max-age=86400
Last-Modified: Wed, 19 Nov 2014 19:22:38 GMT
ETag: "3da6ed097227470fc406119fc63bde01"
Content-Type: text/html
Content-Length: 11132
Server: AmazonS3
$ curl --head https://myra.treasury.gov
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 11132
Connection: keep-alive
Date: Tue, 18 Nov 2014 04:36:43 GMT
Cache-Control: max-age=86400
Last-Modified: Tue, 18 Nov 2014 01:14:45 GMT
ETag: "3da6ed097227470fc406119fc63bde01"
Server: AmazonS3
Age: 76041
X-Cache: Hit from cloudfront
Via: 1.1 c1835ed5f58f5752820118219163da2f.cloudfront.net (CloudFront)
X-Amz-Cf-Id: 3d1FUwB6amjTnSUjuFq2ddvwwAQBfrWOQ8pCckvLzJYxmb9WNeOYQg==
Other pages, like /about/ and /employers/, correctly updated themselves when I hit them with curl just now, but that doesn't necessarily guarantee that caching was working as expected, because I'm jumping in here nearly a day after you last deployed the app.
The main thing is: is there a difference in content between http://myra-cloudfront.s3-website-us-east-1.amazonaws.com/ and https://myra.treasury.gov, or any page on the two, after more than 1 hour has passed? I think you're saying there is, which is not the behavior Amazon has described.
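One rough way to answer that question is to compare the ETag each side reports for the same page. The sketch below is only an illustration: `parse_headers` and `same_version` are hypothetical helper names, and the header text is trimmed from the two curl responses above. It assumes a matching ETag means the origin and the CloudFront edge hold the same object, which is typical for S3 but not guaranteed in every setup.

```python
def parse_headers(raw):
    """Parse `curl --head` output into a dict keyed by lowercase header name."""
    headers = {}
    for line in raw.strip().splitlines()[1:]:  # skip the HTTP status line
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def same_version(origin, edge):
    """True when both responses carry an ETag and the two ETags are equal."""
    etag = origin.get("etag")
    return etag is not None and etag == edge.get("etag")


# Header text trimmed from the two curl responses in this thread.
origin = parse_headers('''HTTP/1.1 200 OK
Last-Modified: Wed, 19 Nov 2014 19:22:38 GMT
ETag: "3da6ed097227470fc406119fc63bde01"
''')
edge = parse_headers('''HTTP/1.1 200 OK
Last-Modified: Tue, 18 Nov 2014 01:14:45 GMT
ETag: "3da6ed097227470fc406119fc63bde01"
Age: 76041
X-Cache: Hit from cloudfront
''')

print(same_version(origin, edge))  # True
```

For the homepage the ETags match even though Last-Modified differs, so this particular page would not count as stale. Running the same check against a page such as /employers/resources/ an hour or more after a deploy would be the more telling test.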
To try something else out, I've switched the "minimum TTL" of the distribution to 86400, in CloudFront settings:
I'd like to see if this changes the behavior you observe going forward. However, if it works, I'm also going to want to switch it back off and verify the behavior, because we shouldn't have to do this.
Sounds good. Yes, the key issue is that the changes aren't propagating within the 1-hour window. I have seen it update within a 24-hour period, but that is too long. I have also seen instances where some pages update right away, but others, such as employers/resources/, aren't updated until much longer than 1 hour has passed.
@cscairns Have you seen this happen since I made the above change this morning?
cc @seanherron for any insight
Have you tried issuing a manual invalidation and seeing if everything correctly refreshes?
Yep, we've done that when needed, and the manual invalidation works correctly.
@konklone can / should you show me how to manually invalidate?
I can and should! You will need access to CloudFront in AWS. If you don't have access to it, please ask DevOps to grant you access.
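The console is the route described here. For anyone who would rather script it, the following is a minimal sketch of the same invalidation through boto3's CloudFront client. It assumes boto3 is installed and AWS credentials with CloudFront access are configured. The helper names and the `manual-` caller reference prefix are made up for illustration, and the distribution ID is something you would need to look up yourself.

```python
import time


def build_invalidation_batch(paths, caller_reference=None):
    """Build the InvalidationBatch structure that create_invalidation expects."""
    if not paths:
        raise ValueError("at least one path is required")
    for path in paths:
        if not path.startswith("/"):
            raise ValueError(f"path must start with '/': {path!r}")
    return {
        "Paths": {"Quantity": len(paths), "Items": list(paths)},
        # CallerReference must be unique per request; a timestamp is one
        # simple (assumed) way to get that.
        "CallerReference": caller_reference or f"manual-{int(time.time())}",
    }


def invalidate(distribution_id, paths):
    """Ask CloudFront to drop the given paths from its edge caches."""
    import boto3  # third-party; imported here so the helper above has no dependency

    client = boto3.client("cloudfront")
    return client.create_invalidation(
        DistributionId=distribution_id,
        InvalidationBatch=build_invalidation_batch(paths),
    )
```

A call might look like `invalidate(distribution_id, ["/employers/resources/", "/about/"])`, with `distribution_id` taken from the CloudFront console.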
I'm unhappy that we haven't found a working cache invalidation solution. I'm close to saying we should write Amazon and tell them there's a bug, because I'm so sure we're doing it right. It's just...exhausting.
@noahmanger and @cscairns - can you verify that you have access to CloudFront now to perform invalidations?
I got confirmation from @noahmanger. @cscairns, can you confirm that you can run invalidations from the CloudFront console?
To echo previous email threads and comms, I do not believe this bug is reproducible over public internet (non-Gov, non-TIC) connections. This has been a known problem at Treasury for several years (going back to 2011) and intermittently a problem at GSA, though to a lesser degree.
If Treasury wants to view the site as the public views it, they will have to access a non-Government internet gateway, like through a mifi or tethered mobile device.
OK, I'm going to close this and view it as solved, until someone can reproduce it on a non-government network.
OK. This is not one of the higher points of my career.
The cache time I've been setting (and which I used in docs that give you the command to paste) is wrong. 86400 seconds is 1 day. We have been trying to set 1 hour. That's 3600 seconds.
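To make the arithmetic hard to get wrong next time, here is a small sketch that derives the max-age value from a duration instead of a hand-typed number. `cache_control` and `seconds_until_stale` are hypothetical helpers, and the freshness calculation is the simplified "max-age minus Age" model, which ignores any minimum TTL set on the distribution.

```python
from datetime import timedelta

ONE_HOUR = int(timedelta(hours=1).total_seconds())  # 3600
ONE_DAY = int(timedelta(days=1).total_seconds())    # 86400


def cache_control(max_age):
    """Render a Cache-Control header value from a timedelta."""
    return f"max-age={int(max_age.total_seconds())}"


def seconds_until_stale(max_age_seconds, age_seconds):
    """How much longer a cache may serve the object, given its Age header."""
    return max(0, max_age_seconds - age_seconds)


print(cache_control(timedelta(hours=1)))   # max-age=3600
print(seconds_until_stale(86400, 76041))   # 10359
print(seconds_until_stale(3600, 76041))    # 0
```

With the Age of 76041 seen in the CloudFront response earlier in the thread, max-age=86400 leaves the cached copy fresh for roughly another three hours, while max-age=3600 would have expired it long before.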
Well, at least it was a simple fix.
Production deployment changes aren't propagating within 1-hour clock.