Closed Anathema-Odious closed 3 months ago
Hi @Anathema-Odious! Thanks for filing this issue. Could you share your more (redact info as needed) of the CDK details surrounding the node class spec? For example, where does role.role_name
come from and resolve to, and I would also be interested in understanding what the created manifests k8s looks like to better understand the issue so if you can share that it would be helpful.
Hello @andskli The role.role_name is a custom worker role which we need to access different other services with a custom name as per our organization naming standards.
We have observed that even after adding it in the NodeClass, it creates a new role even though I have passed the role in EC2NodeClass. I did not add the role in nodeRole as before in karpenter definition since it created InstanceProfile which conflicts with EC2NodeClass Role which also creates Instance Profile and therefore doesn't work.
https://github.com/aws-samples/cdk-eks-karpenter/blob/main/src/index.ts#L628-L630
We will need to remove either one of the two implementations of InstanceProfile. In case EC2NodeClass already has Role, we will need to see how that can be managed.
created manifests k8s looks
Hi @andskli, I am not sure how to get the generated k8s manifest. But I think the issue that @badaldavda8 pointed out above more or less summarizes this. I will share the manifests as soon as I get my hands on them.
The addEC2NodeClass
method just takes Record<string, any>
and creates the k8s manifest based on your input, and does nothing else. To my understanding, when you then pass spec.role
Karpenter will try to create an InstanceProfile
for this role to attach to the owned nodes, see Karpenter docs.
I think the workaround here would be to create your custom Role and corresponding InstanceProfile outside of the construct, and specify spec.instanceProfile
instead of spec.role
in the NodeClass.
This should have the construct sort out the aws-auth
ConfigMap with your custom Role, and add the IAM policy statement (i.e. PassRole) from L630 to that Role.
Sure, we will try this out.
But I am still thinking if it will modify the IAM Policy statement since we will not be passing any nodeRole at all. It will create the nodeRole and then add it in the PassRole since we are not adding nodeRole in the Props.
You are right and I should've mentioned in my previous reply in the workaround section that you should also pass that role as the nodeRole
@andskli, following your above recommendation to pass in the same nodeRole
, while instantiating the Karpenter object, as the one used in the spec.role
for the node_class_spec, worked. I believe it forced the library to use the same node role for using the iam:PassRole
action.
Below is a snapshot of the cdk diff output.
I didn't need to create an instance profile for this to work.
@Anathema-Odious glad to hear it worked out.
I believe this is a documentation issue for now, as the construct allows free-form NodeClass spec's. Given beta graduation of Karpenter API's, it may make sense to expose more of Karpenter's functionality through this constructs APIs in the future, but is yet to be explored.
We are using this library with version 1.0.3, and we are facing an issue, where karpenter fails to scale up. On looking at the error messages, this is the error that we got:
{"level":"ERROR","time":"2024-01-12T11:35:33.053Z","logger":"controller","message":"Reconciler error","commit":"a70b39e","controller":"nodeclass","controllerGroup":"karpenter.k8s.aws","controllerKind":"EC2NodeClass","EC2NodeClass":{"name":"test-node-class-spec"},"namespace":"","name":"test-node-class-spec","reconcileID":"3d054015-5f0a-4ec2-8f1c-51359aaba044","error":"creating instance profile, adding role \"xxxx-xxxx-test\" to instance profile \"xxxxxx-test_10180655167987829993\", AccessDenied: User: arn:aws:sts::xxxxxxxxxxx:assumed-role/xxxxxxxxxxxxxxxxxxxxxxxxxxx-VURLFdcMJOnq/1705059316694610615 is not authorized to perform: iam:PassRole on resource: arn:aws:iam::xxxxxxxxxxx:role/xxxx-xxxx-test because no identity-based policy allows the iam:PassRole action\n\tstatus code: 403, request id: 6b879c29-3f3c-4084-9fb2-1177b0b1c751"}
xxxx-xxxx-test is the custom that we provide in role field for test-node-class-spec .
On deeper inspection of the policy, a new node role is getting created which has the permission for iam:PassRole, instead of using the custom role.
Below is the construct that we use in our cdk(python) block, for our node class definition
Expected behavior: the library should re-use the provided custom role, instead of defining a new node role.
This was previously resolved, but has crept up again in the version change, probably when the support for Karpenter Beta release was added.