Thanks @zeman412 for reporting it.
We will try to recreate. At first blush, it sounds like an implementation bug rather than a documentation issue. Either way, we will get it and update here.
@zeman412 can you post your pvc and pod or deployment yaml?
@zeman412 I was not able to recreate it. I followed your steps:
But it succeeded in reattaching. Do you have nodes up and running where we can see the failure?
@deitch yes, those are the steps that trigger the problem, and I just reproduced the error with the nginx deployment example included in csi-packet/deploy/demo/demo-deployment.yaml. The problem does not always occur on the first attempt, but if I repeat the procedure a second or third time the error shows up. Here are the details:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: csi-packet-standard
  resources:
    requests:
      storage: 1Gi
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      run: nginx
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      name: web-server
      labels:
        run: nginx
    spec:
      # nodeSelector:
      #   kubernetes.io/hostname: "10.88.52.141"
      containers:
        - image: nginx
          name: nginx
          volumeMounts:
            - mountPath: /var/lib/www/html
              name: mypvc
      volumes:
        - name: mypvc
          persistentVolumeClaim:
            claimName: podpvc
            readOnly: false
Then I created the PVC and deployment:
persistentvolumeclaim/podpvc created
deployment.apps/nginx created
# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-7f545b55b8-slj2r 1/1 Running 0 59s
# kubectl delete -f nginx.yaml
deployment.apps "nginx" deleted
# kubectl get pods
No resources found.
# kubectl create -f nginx.yaml
deployment.apps/nginx created
root@ewr1-controller:~/csi-packet# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-7f545b55b8-mms5f 1/1 Running 0 41s
Second attempt:
# kubectl delete -f nginx.yaml
deployment.apps "nginx" deleted
# kubectl get pods
No resources found.
Now recreate the deployment:
# kubectl create -f nginx.yaml
deployment.apps/nginx created
# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-7f545b55b8-qwqmr 0/1 ContainerCreating 0 17m
It got stuck there:
# kubectl describe pods nginx-7f545b55b8-qwqmr
Name: nginx-7f545b55b8-qwqmr
Namespace: default
Priority: 0
Node: ewr1-m2.xlarge.x86-worker-5/10.99.142.13
Start Time: Mon, 25 Nov 2019 15:27:12 +0000
Labels: pod-template-hash=7f545b55b8
run=nginx
Annotations: <none>
Status: Pending
IP:
Controlled By: ReplicaSet/nginx-7f545b55b8
Containers:
nginx:
Container ID:
Image: nginx
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/lib/www/html from mypvc (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-8gj6m (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
mypvc:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: podpvc
ReadOnly: false
default-token-8gj6m:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-8gj6m
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned default/nginx-7f545b55b8-qwqmr to ewr1-m2.xlarge.x86-worker-5
Warning FailedAttachVolume 18m attachdetach-controller Multi-Attach error for volume "pvc-16b22d0e-2151-4f81-905b-a8163552d2eb" Volume is already exclusively attached to one node and can't be attached to another
Warning FailedMount 64s (x8 over 16m) kubelet, ewr1-m2.xlarge.x86-worker-5 Unable to mount volumes for pod "nginx-7f545b55b8-qwqmr_default(53a82355-4763-4318-b4bd-68c2ab7a8990)": timeout expired waiting for volumes to attach or mount for pod "default"/"nginx-7f545b55b8-qwqmr". list of unmounted volumes=[mypvc]. list of unattached volumes=[mypvc default-token-8gj6m]
Warning FailedAttachVolume 26s (x17 over 18m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-16b22d0e-2151-4f81-905b-a8163552d2eb" : rpc error: code = Unknown desc = error attempting to attach b1b3f63d-526a-4615-8a90-b109ecfff21f to f7c652f0-30d3-471b-89b7-5380c4d6bf27, POST https://api.packet.net/storage/b1b3f63d-526a-4615-8a90-b109ecfff21f/attachments: 422 Instance is already attached to this volume
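For what it's worth, the stale attachment shows up in the VolumeAttachment objects that the attach/detach controller keeps, so it can be confirmed without touching the node (the PV name comes from the output above; the attachment name is a placeholder):
# the attachment for pvc-16b22d0e-2151-4f81-905b-a8163552d2eb still reports ATTACHED=true
# and points at the node from the previous deployment
kubectl get volumeattachments
# describe the one that references that PV to see which node it is pinned to
kubectl describe volumeattachment <attachment-name-from-the-list>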
We are working on performance tuning and automating the whole process, which is why I had to recreate the deployment repeatedly.
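For completeness, a minimal sketch of that recreate loop (assuming nginx.yaml contains only the deployment above, so the podpvc claim is reused on every pass, and using the run=nginx label from the manifest to wait on the pod):
#!/bin/sh
# churn the deployment; the PVC is never deleted, so the volume must be
# detached from the old node and re-attached to the new one each time
for i in $(seq 1 5); do
  kubectl delete -f nginx.yaml
  kubectl wait --for=delete pod -l run=nginx --timeout=120s
  kubectl create -f nginx.yaml
  # when the bug hits, this times out with the pod stuck in ContainerCreating
  kubectl wait --for=condition=Ready pod -l run=nginx --timeout=180s || break
done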
Well, if you are using the official demos, I cannot claim there is anything off with your manifests. :-)
I can take another shot at recreating it. But I see you have clusters up. Can I connect to them and look around? If so, send me a DM on Packet Slack (I am deitcher on there). If not, I can keep trying to recreate. Let me know.
Ah, I got it. Now I can see it. Interesting, will figure this one out.
@deitch sorry for the late reply; I moved to a different project and didn't get a chance to check this out. We tore down the k8s cluster, but I will get back to this task soon, and I would love to hear if there is an update or fix for this issue. I also had an issue with detaching the volume and deleting the storage (I see there is a new issue opened for this). If I remember correctly, it also asks for manual verification to delete the storage from the UI, which is inconvenient for automating the whole deployment process.
No problem.
We have two updates going in for the issue. The first relates to internals of the CSI itself. That will go through as soon as our CI (actually CD) is fixed. We are having some issues with cross-building the arm64 images.
The second relates to how the Packet API and its backing storage release volumes after the host iSCSI is logged out. That is a bit thorny, but we will have something on that soon.
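If anyone wants to look at the node side in the meantime, a rough check on the worker that last held the pod would be something like this (the target IQN is a placeholder and differs per volume):
# list active iSCSI sessions; a session still open for the volume's target can mean
# the host never logged out, so the backing storage has not released the volume yet
iscsiadm -m session
# manual workaround, use with care: log out of the stale target by hand
iscsiadm -m node -T <volume-target-iqn> -u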
"If I remember correctly, it also asks for manual verification to delete the storage from the UI, which is inconvenient for automating the whole deployment process."
As far as I know, that is just for UI deletion. API-driven ones do not involve any manual process.
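In other words, inside a cluster the normal claim lifecycle should be enough: assuming the csi-packet-standard class keeps the default Delete reclaim policy, removing the claim drives the volume deletion through the API with no UI step, e.g.:
# deleting the claim lets the CSI provisioner remove the backing Packet volume via the API
kubectl delete pvc podpvc
# once the volume is released the bound PV should disappear as well
kubectl get pv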
Sounds good, looking forward to the updates.
There are two updates in flight. One, #77, helps with part of this; another, which will come right after #77 is in, will handle the delete issue.
Should be fixed in #79
After installing the CSI driver, I was able to create the volume claim, and it worked fine the first time. However, the problem happens when deleting the deployment and trying to mount the existing volume claim again.
The MySQL deployment was running fine; then I deleted the deployment and recreated it, and as shown above MySQL fails to mount the PVC this time.
I followed the README for the Packet CSI driver and created the PVC exactly as in the example provided.