GeorgianaElena closed this 6 days ago
This is failing to create the two largest GPU instances, so could it be a quota issue?
I tried to request a quota increase, but I think it was denied.
@consideRatio do you mind taking a look when you have 5 mins?
Looking at the AWS console, under CloudFormation -> Stacks, I find one stack representing the node group that failed to be created. The error I spot in the events from when it was to be created says:
Resource handler returned message: "The maximum number of rules per security group has been reached. (Service: Ec2, Status Code: 400, Request ID: a30358b7-aba4-4cd4-a0c7-76eb1d89618c)" (RequestToken: 92fc5773-752c-49b4-bedd-aed51481749e, HandlerErrorCode: ServiceLimitExceeded)
I'm not sure what these security group rules relate to, but I imagine it's related to having very many separate node groups in a k8s cluster: each one needs more rules, and eventually this broke things. I figure the next step is to google that error message and try to figure out what it's really about. Axel demands my attention currently though, so I'm dropping the ball here for now.
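One way to check the security group theory above might be to count the rules on each security group and see which ones sit at the limit. The sketch below is only an illustration: the helper names are hypothetical, the quota of 60 rules per direction is an assumed default that may not match this account, and `fetch_rules` needs boto3 plus AWS credentials.

```python
from collections import Counter

# Assumed default quota for inbound or outbound rules per security group.
# The real limit for a given account may differ; check Service Quotas.
ASSUMED_RULES_PER_GROUP_QUOTA = 60


def count_rules_by_direction(rules):
    """Count ingress and egress rules per security group.

    `rules` is a list of dicts shaped like the `SecurityGroupRules` entries
    returned by EC2's DescribeSecurityGroupRules (keys `GroupId`, `IsEgress`).
    """
    counts = Counter()
    for rule in rules:
        direction = "egress" if rule["IsEgress"] else "ingress"
        counts[(rule["GroupId"], direction)] += 1
    return counts


def groups_at_quota(counts, quota=ASSUMED_RULES_PER_GROUP_QUOTA):
    """Return the (group id, direction) pairs whose rule count reached the quota."""
    return sorted(key for key, n in counts.items() if n >= quota)


def fetch_rules(region_name):
    """Fetch all security group rules in a region (needs boto3 and credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    ec2 = boto3.client("ec2", region_name=region_name)
    paginator = ec2.get_paginator("describe_security_group_rules")
    rules = []
    for page in paginator.paginate():
        rules.extend(page["SecurityGroupRules"])
    return rules
```

Running `groups_at_quota(count_rules_by_direction(fetch_rules(region)))` against the cluster's region would list any security group that cannot take more rules, which would show whether the shared node security group is the one hitting the limit.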
Thanks @consideRatio - tbh, I was expecting an answer tomorrow 😅 appreciated!
So I went back to this today and added node-purpose tags, and suddenly no more errors 🤷🏻♀️
Having difficulty creating a couple of nodegroups on jmte