Closed carlosrodlop closed 5 months ago
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days
Has anyone looked into the provided example? This issue is still neither answered nor triaged for removal.
And what do the logs from the EBS CSI driver pod show you?
I will check shortly, in the next couple of days. Thanks for looking into this @bryantbiggs
@bryantbiggs thanks for your patience :)
Regarding the EBS CSI driver logs, there are none for ebs-csi-node-windows because its pods are in a pending state (ContainerCreating). The following Kubernetes event is connected to this issue:
(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a0fa2ff62abf7f2fb4b5b2ab7d9db59e11ba913c7b89ed0458cc650abb74701c": plugin type="vpc-bridge" name="vpc" failed (add): failed to parse Kubernetes args: failed to get pod IP address ebs-csi-node-windows-fzp2l: error executing k8s connector: error executing connector binary: exit status 1 with execution error: pod ebs-csi-node-windows-fzp2l does not have label vpc.amazonaws.com/PrivateIPv4Address
From the above description we can say that the issue appears to be related to the Amazon VPC CNI plugin failing to obtain the private IPv4 address for the Windows pod running the EBS CSI driver.
1.- I spoke to @wellsiau-aws about this issue and he pointed me to this list of prerequisites: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/examples/kubernetes/windows. Looking at the 4 points, I have doubts about points 2 and 3. Do I need to add them in the provider main.tf somehow?
2.- Looking at https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html, I am wondering whether https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html is satisfied by the current configuration, or whether there is something else we need to add for Windows managed nodes.
module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "1.15.1"
  ...
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    ...
  }
  ...
}
3.- I tried to look at the Terraform code here https://github.com/aws-ia/terraform-aws-eks-blueprints-addons/blob/main/main.tf to understand what is happening behind the scenes, but there is no reference to the EBS CSI driver. Where in the code should we look for troubleshooting?
4.- Has anyone tried to run the *.tf files I provided? The issue is easy to reproduce locally I believe.
Finally, I'm attaching a snapshot of all the resources created and their status:
kubectl get all -A
NAMESPACE     NAME                                     READY   STATUS              RESTARTS   AGE
kube-system   pod/aws-node-mvxg7                       2/2     Running             0          64m
kube-system   pod/aws-node-t74hb                       2/2     Running             0          64m
kube-system   pod/coredns-6777b4b9b9-jh6cf             1/1     Running             0          65m
kube-system   pod/coredns-6777b4b9b9-kvbmn             1/1     Running             0          65m
kube-system   pod/ebs-csi-controller-66cb49498-n92w2   6/6     Running             0          65m
kube-system   pod/ebs-csi-controller-66cb49498-ph2rd   6/6     Running             0          65m
kube-system   pod/ebs-csi-node-rhmfq                   3/3     Running             0          64m
kube-system   pod/ebs-csi-node-windows-fzp2l           0/3     ContainerCreating   0          57m
kube-system   pod/ebs-csi-node-xhjzv                   3/3     Running             0          64m
kube-system   pod/kube-proxy-69njv                     1/1     Running             0          64m
kube-system   pod/kube-proxy-ghpmf                     1/1     Running             0          64m

NAMESPACE     NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   172.20.0.1    <none>        443/TCP                  70m
kube-system   service/kube-dns     ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP,9153/TCP   68m

NAMESPACE     NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
kube-system   daemonset.apps/aws-node               2         2         2       2            2           <none>                     68m
kube-system   daemonset.apps/ebs-csi-node           2         2         2       2            2           kubernetes.io/os=linux     65m
kube-system   daemonset.apps/ebs-csi-node-windows   1         1         0       1            0           kubernetes.io/os=windows   65m
kube-system   daemonset.apps/kube-proxy             2         2         2       2            2           <none>                     68m

NAMESPACE     NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns              2/2     2            2           68m
kube-system   deployment.apps/ebs-csi-controller   2/2     2            2           65m

NAMESPACE     NAME                                           DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-6777b4b9b9             2         2         2       65m
kube-system   replicaset.apps/coredns-86969bccb4             0         0         0       68m
kube-system   replicaset.apps/ebs-csi-controller-66cb49498   2         2         2       m
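I would revisit your configuration; there are a number of misconfigurations. For example:
You are setting a taint on the windows nodes - is there a toleration that matches on the EBS CSI driver?
I don't see where you have set node.enableWindows = true per the docs
Thanks @bryantbiggs for your reply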
You are setting a taint on the windows nodes - is there a toleration that matches on the EBS CSI driver?
Nope! Where can I find the accepted inputs for eks_addons > ebs_driver? Ideally, I'd like to pass values as YAML containing the tolerations.
There is no reference to them at https://registry.terraform.io/modules/aws-ia/eks-blueprints-addon/aws/latest?tab=inputs or at https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/ either.
OK, I guess I can do something like this: https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/3e9e5a13e7afee42d4b64874ba5adf73f329ff30/patterns/karpenter/main.tf#L117
Then add tolerations like these: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L276-L281
Can you confirm my suggestion please?
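For reference, a minimal sketch of that suggestion, assuming the installed add-on version's configuration schema accepts node.tolerations (the accepted keys for a given version can be listed with aws eks describe-addon-configuration). The taint key, value, and effect below are hypothetical placeholders; they have to match whatever taint is actually set on the Windows node group.

```hcl
module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "1.15.1"

  # Other required module inputs are omitted from this sketch.

  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn

      # Passed to the EKS add-on as a JSON string.
      configuration_values = jsonencode({
        node = {
          tolerations = [
            {
              key      = "os"         # hypothetical taint key
              operator = "Equal"
              value    = "windows"    # hypothetical taint value
              effect   = "NoSchedule" # must match the taint's effect
            }
          ]
        }
      })
    }
  }
}
```

If the schema rejects a key, the add-on update should fail at apply time, which makes this reasonably safe to try on a test cluster.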
I don't see where you have set node.enableWindows = true per the docs
Which docs please?
Gotcha, I need to enable this section https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L384 using the same approach I explained above.
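A rough sketch of what enabling that section could look like, assuming the add-on's configuration schema exposes node.enableWindows the same way the Helm chart's values.yaml does (that is an assumption, not something confirmed in this thread). The local name is hypothetical.

```hcl
locals {
  # Hypothetical local holding the add-on configuration as a JSON string.
  ebs_csi_configuration_values = jsonencode({
    node = {
      # Mirrors node.enableWindows from the Helm chart values.
      enableWindows = true
    }
  })
}
```

The value would then be passed as configuration_values = local.ebs_csi_configuration_values inside the aws-ebs-csi-driver entry of eks_addons.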
I'm closing this issue. It was solved by using node selectors so that the EBS CSI driver runs only on the node pools where I want to use it:
module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"
  #vEKSBpAddonsTFMod#
  version = "1.15.1"
  ...
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
      configuration_values = jsonencode(
        {
          node = {
            nodeSelector = {
              ebs_driver = "enabled"
            }
          }
        }
      )
    }
    ...
  }
}
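For the node selector above to match anything, the target nodes need to carry the ebs_driver = "enabled" label. A minimal sketch of that side, assuming the cluster is created with the terraform-aws-modules/eks module and that labels are set per managed node group; the group name mg_linux is hypothetical.

```hcl
module "eks" {
  source = "terraform-aws-modules/eks/aws"

  # Version and other required inputs are omitted from this sketch.

  eks_managed_node_groups = {
    mg_linux = { # hypothetical node group name
      # Kubernetes node label matched by the add-on's node.nodeSelector.
      labels = {
        ebs_driver = "enabled"
      }
    }
  }
}
```

Node groups without this label, such as the Windows one, would then not be selected by the driver's node DaemonSet.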
Description
Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration (see the examples/* directory for references that you can copy+paste and tailor to match your configs if you are unable to copy your exact configuration). The reproduction MUST be executable by running terraform init && terraform apply without any further changes.
If your request is for a new feature, please use the Feature request template.
⚠️ Note
Before you submit an issue, please perform the following first:
Remove the local .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
Re-initialize the project root to pull down modules: terraform init
Versions
Module version [Required]: 1.15.1
Terraform version:
Terraform v1.6.6 on linux_amd64
Provider version(s):
Terraform v1.6.6 on linux_amd64
Reproduction Code [Required]
Considerations:
Steps to reproduce the behavior:
main.tf
provider.tf
Expected behaviour
EBS CSI Driver is deployed correctly
Actual behaviour
EBS CSI Driver is NOT deployed
Terminal Output Screenshot(s)
Additional context
eks_managed_node_groups > mg_windows section the ebs csi driver is deployed correctly.
WINDOWS_CORE_2022_x86_64 ami type but it didn't work.