mlissner closed this issue 2 years ago.
I put this question over on AWS's forum too and got a good response: https://repost.aws/questions/QUv105xDmfQMGUiBbfeYW-iQ/elasticache-shows-network-in-and-out-as-exceeded-but-how
It sort of feels like, yes, maybe we're using a lot of bandwidth, but I'm still wondering why 500 Mbps counts as a lot when the node type is advertised as "up to 5 Gbps." I added that as a comment on the forum too; we'll see if there's a response.
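One possible reason the numbers don't line up, which this thread doesn't confirm: a 500 Mbps figure read off CloudWatch is an average over the metric's period, so it cannot show how fast traffic was during the busiest seconds of that period. AWS also documents "up to" bandwidth on instance types as a burst ceiling above a lower baseline. The sketch below only does the arithmetic. It assumes the value comes from the Sum statistic of `NetworkBytesOut` over a 60-second period, and the byte count is illustrative, not taken from our dashboards. `average_mbps` and `burst_mbps` are hypothetical helper names.

```python
def average_mbps(bytes_in_period: float, period_seconds: float) -> float:
    """Convert a byte total over a period into average megabits per second."""
    bits = bytes_in_period * 8
    return bits / period_seconds / 1_000_000


def burst_mbps(bytes_in_period: float, burst_seconds: float) -> float:
    """Rate that would result if the same bytes were all sent within burst_seconds."""
    return average_mbps(bytes_in_period, burst_seconds)


# Illustrative value (assumption): 3.75 GB summed over one 60-second period.
period_bytes = 3_750_000_000

print(average_mbps(period_bytes, 60))  # 500.0  -> what a 1-minute graph shows
print(burst_mbps(period_bytes, 6))     # 5000.0 -> same bytes squeezed into 6 seconds
```

The same per-minute total reads as a modest 500 Mbps on a graph, but if it arrived in a 6-second burst the node would have been pushing 5 Gbps for those seconds.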
A couple other observations:
Anyway, let's monitor to see if scaling up helps.
Yeah, scaling up didn't help at all. Probably the next best step is to figure out what's causing these spikes. I looked around yesterday and couldn't find anything that looked particularly suspicious. It's gotta be some sort of cron job, though, because the spikes always land on the hour and the half hour. A cache with a 30-minute TTL wouldn't produce that pattern: each key expires 30 minutes after it was written, so expirations follow when the keys were set, not the wall clock.
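To make the cron-vs-TTL reasoning concrete, here is a minimal sketch that measures how many events fall near a :00 or :30 mark. All timestamps are hypothetical, as are the helper names `seconds_from_half_hour` and `clock_aligned_fraction`; the same check could be run on spike times exported from our metrics. It shows that TTL expirations inherit the offset of their write times, so they only cluster on the half hour if the writes did.

```python
from datetime import datetime, timedelta

HALF_HOUR = 30 * 60  # seconds


def seconds_from_half_hour(ts: datetime) -> int:
    """Distance in seconds from ts to the nearest :00 or :30 mark."""
    offset = (ts.minute * 60 + ts.second) % HALF_HOUR
    return min(offset, HALF_HOUR - offset)


def clock_aligned_fraction(events, tolerance_s: int = 120) -> float:
    """Share of events within tolerance_s seconds of a :00 or :30 mark."""
    if not events:
        return 0.0
    hits = sum(seconds_from_half_hour(t) <= tolerance_s for t in events)
    return hits / len(events)


# Hypothetical cron-driven spikes: a few seconds after each :00/:30 mark.
cron_spikes = [datetime(2022, 7, 1, h, m, 5) for h in (12, 13) for m in (0, 30)]

# Hypothetical keys written at arbitrary times with a 30-minute TTL;
# each one expires exactly 30 minutes after it was set.
set_times = [datetime(2022, 7, 1, 11, m) for m in (7, 19, 41)]
ttl_expiries = [t + timedelta(minutes=30) for t in set_times]

print(clock_aligned_fraction(cron_spikes))   # 1.0
print(clock_aligned_fraction(ttl_expiries))  # 0.0
```

A fraction near 1.0 for real spike timestamps would support the cron-job theory; a TTL-driven pattern would look spread out unless the writes themselves were clock-driven.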
The folks at Datamatics are going to try to get an AWS pro to help diagnose this.
I'm going to close this for now. We still get spikes in traffic, but I don't think they're causing any issues that we aren't recovering from gracefully.
One last note: if we want to capture our traffic to figure out what's causing the spikes, AWS offers VPC Traffic Mirroring: https://aws.amazon.com/blogs/aws/new-vpc-traffic-mirroring/
That option would also be useful for security, as in https://github.com/freelawproject/courtlistener/issues/1586
I saw some suspicious metrics on our ElastiCache instance today and wrote them up here. I'm not sure what's going on, but it doesn't look good and might be part of why we keep getting various connection failures:
https://serverfault.com/questions/1105308/elasticache-bandwidth-usage-is-low-but-bandwidth-allowance-exceeded