Open den-is opened 7 months ago
How many EC2NodeClasses do you have? The API is aws ssm get-parameter.
I understand that it is aws ssm get-parameter. I meant that I was looking for the name of this ResourceLimit in the Resources Limits dashboard, to understand what to increase.
Just 2 simple EC2NodeClasses: one for AL2 and one for AL2023.
how often do the logs fire? can you post them here?
@njtran I can't post them here.
I have DEBUG logs enabled, but there is nothing in the logs except the text I posted above (I'm excluding the usual info messages).
The error-level message "Reconciler error" happens hundreds of times per hour.
"discovering amis from ssm" happened much less often, but still a dozen times per hour.
That was happening intensively for more than 12 hours, during peak hours, on a non-prod server (but in ACC, which may have had some other apps/tests running).
Including a screenshot with funny numbers.
IIUC, the reconciler error could come from any of the controllers.
If you don't want to post the logs here, you could open an AWS Support ticket with the info or message me on the Kubernetes Slack; I'm happy to look at the logs there.
I'm really more curious about how often the "discovering amis from ssm" error occurs, so I can gauge whether we're doing more SSM lookups than expected.
So it looks like 4 logs every 10 minutes?
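For anyone who wants to measure this rate without sharing raw logs, here is a rough sketch that counts matching messages per 10-minute window. It assumes the logs are JSON lines with "time" (ISO 8601) and "message" fields; those field names and the helper name count_per_bucket are assumptions for illustration, so adjust them to whatever your log output actually contains.

```python
import json
from collections import Counter
from datetime import datetime


def count_per_bucket(lines, needle, bucket_minutes=10):
    """Count log lines whose message contains `needle`, grouped by time bucket.

    Assumes each line is a JSON object with "time" and "message" keys
    (an assumption about the log format). `bucket_minutes` should divide 60.
    """
    counts = Counter()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        if needle not in str(entry.get("message", "")):
            continue
        raw_time = entry.get("time")
        if not raw_time:
            continue  # no timestamp, cannot bucket this line
        # Older Python versions do not parse a trailing "Z", so normalise it.
        ts = datetime.fromisoformat(str(raw_time).replace("Z", "+00:00"))
        # Round the timestamp down to the start of its bucket.
        bucket = ts.replace(
            minute=ts.minute - ts.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        counts[bucket.isoformat()] += 1
    return dict(counts)
```

Feeding it the lines of a saved log file with needle="discovering amis from ssm" returns a dict mapping each window's start time to the number of matching lines, which is enough to share the rate without the log contents.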
Description
What problem are you trying to solve? One of my clusters has a few hundred nodes and several thousand pods. This is autoscaled to a few thousand nodes and many more pods.
I use amiFamily: AL2 instead of specific amiSelectorTerms: []
With just ~400 nodes, in Karpenter logs I started to see hundreds of such messages:
Also:
How important is this feature to you? I will try to fix the issue by requesting a higher rate limit. But for which SSM service, and what is the name of this limit? At the time of writing, I was not able to identify which SSM service limit corresponds to the above issue.
Feature request Is it possible to add simple counter metrics for how many requests Karpenter makes to SSM Parameter Store?
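To illustrate the kind of counter being asked for, here is a minimal sketch of a call counter wrapped around a request function. It is not Karpenter's code (Karpenter is written in Go); CallCounter, fake_get_parameter and the "ssm_get_parameter" label are hypothetical names used only to show the idea.

```python
import threading
from collections import Counter


class CallCounter:
    """Thread-safe counter of how many times wrapped functions are called."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = Counter()

    def wrap(self, name, func):
        """Return a function that increments `name` and then calls `func`."""
        def wrapper(*args, **kwargs):
            # Count the attempt before the call, so failed calls are counted too.
            with self._lock:
                self._counts[name] += 1
            return func(*args, **kwargs)
        return wrapper

    def snapshot(self):
        """Return a copy of the current counts."""
        with self._lock:
            return dict(self._counts)


def fake_get_parameter(name):
    # Hypothetical stand-in for an SSM GetParameter request; no network access.
    return {"Name": name, "Value": "placeholder"}
```

Wrapping the request function once, for example get_parameter = counter.wrap("ssm_get_parameter", fake_get_parameter), and exposing counter.snapshot() would show how many lookups were attempted, which could then be compared against the account's rate limit.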