Closed zorrofox closed 2 years ago
Hi @zorrofox, thanks for using Kubeflow on AWS. The aws-secrets-sync deployment is supposed to create the mysql-secret and ml-pipeline-minio-artifact secrets, and I see that the corresponding pod is not in Running status. Can you describe the pod to see if there is an error?
Get the pod ID using the following command:

```shell
kubectl get pods -n kubeflow | grep "aws-secrets-sync"
```

Use the pod ID to check its state:

```shell
export POD_ID=<pod-id-here>
kubectl describe pod -n kubeflow $POD_ID
kubectl logs -n kubeflow $POD_ID -c <container-name>
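Since the pod is stuck in PodInitializing, the init container logs are often the most informative ones. A small sketch that avoids copying the pod ID by hand, using the `app=aws-secrets-sync` label shown in the pod's metadata:

```shell
# Look up the pod name by its Deployment label instead of copying it manually
POD_ID=$(kubectl get pods -n kubeflow -l app=aws-secrets-sync \
  -o jsonpath='{.items[0].metadata.name}')

# --all-containers covers init containers too; --prefix labels each line
# with the container it came from
kubectl logs -n kubeflow "$POD_ID" --all-containers --prefix
```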
@surajkota thanks a lot for your help!
The aws-secrets-sync pod info:

```shell
kubectl describe -n kubeflow pod aws-secrets-sync-78bf8674fd-j8nxr
```
Name: aws-secrets-sync-78bf8674fd-j8nxr
Namespace: kubeflow
Priority: 0
Node: ip-192-168-55-104.us-west-2.compute.internal/192.168.55.104
Start Time: Wed, 22 Jun 2022 21:37:07 +0800
Labels: app=aws-secrets-sync
istio.io/rev=default
pod-template-hash=78bf8674fd
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=aws-secrets-sync
service.istio.io/canonical-revision=latest
Annotations: kubectl.kubernetes.io/default-logs-container: secrets
kubernetes.io/psp: eks.privileged
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
sidecar.istio.io/status:
{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["istio-envoy","istio-data","istio-podinfo","istio-token","istiod-...
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/aws-secrets-sync-78bf8674fd
Init Containers:
istio-init:
Container ID:
Image: docker.io/istio/proxyv2:1.9.6
Image ID:
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Environment:
AWS_DEFAULT_REGION: us-west-2
AWS_REGION: us-west-2
AWS_ROLE_ARN: arn:aws:iam::975230531453:role/eksctl-kubeflow-workshop-addon-iamserviceacc-Role1-148SO2I187UW4
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-secrets-manager-sa-token-grl4m (ro)
Containers:
secrets:
Container ID:
Image: public.ecr.aws/xray/aws-xray-daemon:latest
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Environment:
AWS_DEFAULT_REGION: us-west-2
AWS_REGION: us-west-2
AWS_ROLE_ARN: arn:aws:iam::975230531453:role/eksctl-kubeflow-workshop-addon-iamserviceacc-Role1-148SO2I187UW4
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/mnt/rds-store from rds-secret (ro)
/mnt/s3-store from s3-secret (ro)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-secrets-manager-sa-token-grl4m (ro)
istio-proxy:
Container ID:
Image: docker.io/istio/proxyv2:1.9.6
Image ID:
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--serviceCluster
aws-secrets-sync.$(POD_NAMESPACE)
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
--concurrency
2
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 10m
memory: 40Mi
Readiness: http-get http://:15021/healthz/ready delay=1s timeout=3s period=2s #success=1 #failure=30
Environment:
JWT_POLICY: third-party-jwt
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.istio-system.svc:15012
POD_NAME: aws-secrets-sync-78bf8674fd-j8nxr (v1:metadata.name)
POD_NAMESPACE: kubeflow (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
CANONICAL_SERVICE: (v1:metadata.labels['service.istio.io/canonical-name'])
CANONICAL_REVISION: (v1:metadata.labels['service.istio.io/canonical-revision'])
PROXY_CONFIG: {"tracing":{}}
ISTIO_META_POD_PORTS: [
]
ISTIO_META_APP_CONTAINERS: secrets
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_METAJSON_ANNOTATIONS: {"kubernetes.io/psp":"eks.privileged"}
ISTIO_META_WORKLOAD_NAME: aws-secrets-sync
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/kubeflow/deployments/aws-secrets-sync
ISTIO_META_MESH_ID: cluster.local
TRUST_DOMAIN: cluster.local
AWS_DEFAULT_REGION: us-west-2
AWS_REGION: us-west-2
AWS_ROLE_ARN: arn:aws:iam::975230531453:role/eksctl-kubeflow-workshop-addon-iamserviceacc-Role1-148SO2I187UW4
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kubeflow-secrets-manager-sa-token-grl4m (ro)
/var/run/secrets/tokens from istio-token (rw)
Conditions:
Type Status
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
limits.cpu -> cpu-limit
requests.cpu -> cpu-request
istio-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 43200
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
s3-secret:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: secrets-store.csi.k8s.io
FSType:
ReadOnly: true
VolumeAttributes: secretProviderClass=s3-secret
rds-secret:
Type: CSI (a Container Storage Interface (CSI) volume source)
Driver: secrets-store.csi.k8s.io
FSType:
ReadOnly: true
VolumeAttributes: secretProviderClass=rds-secret
kubeflow-secrets-manager-sa-token-grl4m:
Type: Secret (a volume populated by a Secret)
SecretName: kubeflow-secrets-manager-sa-token-grl4m
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
```
Events:
  Type     Reason       Age                    From     Message
  ----     ------       ----                   ----     -------
  Warning  FailedMount  10m (x451 over 11h)    kubelet  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[s3-secret rds-secret], unattached volumes=[istio-token kubeflow-secrets-manager-sa-token-grl4m istiod-ca-cert aws-iam-token s3-secret istio-envoy istio-podinfo rds-secret istio-data]: timed out waiting for the condition
  Warning  FailedMount  3m52s (x250 over 11h)  kubelet  MountVolume.SetUp failed for volume "s3-secret" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
```
The containers don't have any log output yet, but I can find the `driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers` error above.
But I can get the driver:

```shell
kubectl get csidriver
```

```
NAME                       ATTACHREQUIRED   PODINFOONMOUNT   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
efs.csi.aws.com            false            false            <unset>         false               Persistent   41h
secrets-store.csi.k8s.io   false            true             <unset>         false               Ephemeral    23h
```
Can you please run through the troubleshooting steps mentioned in this post: https://aws.amazon.com/premiumsupport/knowledge-center/eks-troubleshoot-secrets-manager-issues/ and verify that the secrets-store-csi-driver pods are in Running state and that the DaemonSet's DESIRED and CURRENT counts match the number of nodes in your cluster?
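A sketch of those checks (the `kube-system` namespace and `app=secrets-store-csi-driver` label are assumptions based on the upstream secrets-store-csi-driver chart defaults and may differ in your install):

```shell
# Verify the driver DaemonSet has one Ready pod per node
kubectl get daemonset -n kube-system -l app=secrets-store-csi-driver
kubectl get pods -n kube-system -l app=secrets-store-csi-driver -o wide

# A CSIDriver object alone is not enough: the driver pod on each node must
# also register itself with that node's kubelet. List per-node registrations:
kubectl get csinode \
  -o custom-columns='NODE:.metadata.name,DRIVERS:.spec.drivers[*].name'
```

If `secrets-store.csi.k8s.io` is missing from a node's DRIVERS column, the "not found in the list of registered CSI drivers" mount error is expected for pods scheduled on that node.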
Hi @surajkota,
Thanks a lot for your help! I just found that a YAML deployment file for the secret provider had failed to deploy due to network issues.
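Once the secret provider manifests are applied successfully and the driver pods are Running, the stuck pod can be recreated so the volume mounts are retried; a minimal sketch, reusing the `app=aws-secrets-sync` label from the Deployment:

```shell
# Delete the stuck pod; its ReplicaSet recreates it and the CSI mounts
# are attempted again against the now-registered driver
kubectl delete pod -n kubeflow -l app=aws-secrets-sync
kubectl get pods -n kubeflow | grep aws-secrets-sync
```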
Describe the bug
Some pods report a missing mysql-secret error.
Steps To Reproduce
Follow the documentation to install Kubeflow.