equinixmetal-archive / csi-packet

Kubernetes CSI driver for Equinix Metal, formerly Packet
Apache License 2.0

Failed to Mount #41

Closed clanstyles closed 5 years ago

clanstyles commented 5 years ago

I'm not able to attach

  Warning  FailedMount             4m11s (x10 over 24m)  kubelet, ewr1-worker-1   Unable to mount volumes for pod "wordpress-mysql-67565bd57-tmn7k_default(0f2840ba-e6fc-4f63-8ffa-4c9b050e1203)": timeout expired waiting for volumes to attach or mount for pod "default"/"wordpress-mysql-67565bd57-tmn7k". list of unmounted volumes=[mysql-persistent-storage]. list of unattached volumes=[mysql-persistent-storage default-token-9lcrr]
  Warning  FailedMount             4m (x19 over 26m)     kubelet, ewr1-worker-1   MountVolume.MountDevice failed for volume "pvc-89315e1e-b2fb-4ce8-b31a-4ecd3e594c64" : rpc error: code = Unknown desc = isciadmin discover error, exit status 21

deitch commented 5 years ago

Thanks for opening this @clanstyles.

Two questions:

  1. Are you using the latest version of the images? We made some major improvements in the last week, but haven’t updated the deployment yml yet.
  2. If you are and are still having issues, is there a chance we can log in to the host and look at the log messages? I could share my ssh public key.

Thanks

clanstyles commented 5 years ago

@deitch I probably don't have the updated containers running. Which images should I update and to what versions?

clanstyles commented 5 years ago

docker.io/packethost/csi-packet quay.io/k8scsi/csi-attacher quay.io/k8scsi/csi-provisioner quay.io/k8scsi/csi-node-driver-registrar

I can't access docker.io/packethost/csi-packet to view what tags are there.

clanstyles commented 5 years ago

I took a stab in the dark and updated the csi-packet container to c72c627 based on the pushed version.

  Warning  FailedMount  93s    kubelet, ewr1-worker-1  MountVolume.MountDevice failed for volume "pvc-89315e1e-b2fb-4ce8-b31a-4ecd3e594c64" : rpc error: code = Unknown desc = metadata error, volume volume-volume not found in metadata
  Warning  FailedMount  38s    kubelet, ewr1-worker-1  Unable to mount volumes for pod "wordpress-mysql-67565bd57-4hm8m_default(b4058ecf-587a-4f14-848f-13a10d4669b3)": timeout expired waiting for volumes to attach or mount for pod "default"/"wordpress-mysql-67565bd57-4hm8m". list of unmounted volumes=[mysql-persistent-storage]. list of unattached volumes=[mysql-persistent-storage default-token-9lcrr]

deitch commented 5 years ago

Yes, that is the latest. I don’t know if this is an edge case or not, but we want to figure it out. We are about to cut the first properly versioned release, so your timing is excellent.

Can I get onto the cluster to see the logs? It would also help to get onto the host that is doing the mounting, as the problem involves both cluster and host. Happy to share my ssh public key.

If you’re on Packet community slack, I’m at “deitcher”, or email me at avi@packet.com

clanstyles commented 5 years ago

I'll hop on right now!

deitch commented 5 years ago

That was helpful. I know what is going on here. The problem is in the code that takes the VolumeName field from the NodeStageVolumeRequest.Context and treats it like a VolumeID, trying to convert it from an ID to a name, which leads to volume-volume instead of volume-ea270a48.

The irony is that we originally had it the other way, but changed it to pass the CSI sanity tests. If I fix it (which I did in my own test env), it works, but the sanity tests fail. We are clearly missing something obvious, but should have it sorted shortly.
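The mix-up described above can be sketched roughly as follows. This is an illustration only, not the driver's actual code: `volumeIDToName` is a hypothetical helper, the rule that a name is `volume-` plus the first dash-separated segment of the ID is an assumption inferred from the names in this thread, and the UUID is made up.

```go
package main

import (
	"fmt"
	"strings"
)

// volumeIDToName is a hypothetical helper. It assumes a volume's name is
// "volume-" followed by the first dash-separated segment of its ID.
func volumeIDToName(id string) string {
	first := strings.SplitN(id, "-", 2)[0]
	return "volume-" + first
}

func main() {
	// Made-up ID whose first segment matches the name mentioned in the thread.
	id := "ea270a48-0000-0000-0000-000000000000"

	// Converting a real ID gives the expected name.
	name := volumeIDToName(id)
	fmt.Println(name)

	// Feeding an already-converted name through the same conversion
	// keeps only "volume" as the first segment, which mangles the result.
	fmt.Println(volumeIDToName(name))
}
```

Under these assumptions the second call yields `volume-volume`, which matches the name reported in the metadata error.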

deitch commented 5 years ago

Aha, got it. I expect a PR very soon. Once it is in, will message here.

deitch commented 5 years ago

@clanstyles please see #42. Once that is merged in, it should automatically close this issue. When that happens, please do the following:

  1. Give it a few minutes for the new images to be pushed out to docker hub
  2. Update your csi manifest to use this latest tag and kubectl apply
  3. To be safe, delete the wordpress-mysql deployment and reapply it
  4. Check that the error is gone
  5. Let us know here either way. If it is not gone, reopen the issue.

clanstyles commented 5 years ago

Hey @deitch, I forked the repository, built the image, and pushed it to the test cluster.

  Warning  FailedMount             4m15s (x16 over 38m)  kubelet, ewr1-worker-0   Unable to mount volumes for pod "wordpress-mysql-67565bd57-56z8f_default(12ed0e25-7e69-41ae-b939-f62064edbfda)": timeout expired waiting for volumes to attach or mount for pod "default"/"wordpress-mysql-67565bd57-56z8f". list of unmounted volumes=[mysql-persistent-storage]. list of unattached volumes=[mysql-persistent-storage default-token-9lcrr]

deitch commented 5 years ago

Hi @clanstyles; this is definitely an underlying iscsi issue, not a csi issue. If I go to the host itself:

root@ewr1-worker-0:~# curl -s https://metadata.packet.net/metadata | jq '.volumes'
[
  {
    "ips": [
      "10.144.34.170",
      "10.144.51.208"
    ],
    "name": "volume-2f9f1b6e",
    "capacity": {
      "size": "20",
      "unit": "gb"
    },
    "iqn": "iqn.2013-05.com.daterainc:tc:01:sn:e673ab60e3c8eda3"
  }
]

So we see the two IPs, and then:

root@ewr1-worker-0:~# iscsiadm --mode discovery --type sendtargets --portal 10.144.51.208 --discover
iscsiadm: No portals found

Same with the other IP.
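For anyone repeating this check, here is a minimal Go sketch that turns the `.volumes` metadata shown above into one discovery command per portal IP. It only builds the argument lists and prints them; it does not run `iscsiadm`. The `volume` struct and `discoveryArgs` function are illustrative names, not part of the driver, and the flags are the ones used in the command above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// volume mirrors the fields of one entry in the ".volumes" metadata array.
type volume struct {
	Name string   `json:"name"`
	IQN  string   `json:"iqn"`
	IPs  []string `json:"ips"`
}

// discoveryArgs returns one iscsiadm argument list per portal IP
// of every volume found in the given JSON array.
func discoveryArgs(volumesJSON []byte) ([][]string, error) {
	var vols []volume
	if err := json.Unmarshal(volumesJSON, &vols); err != nil {
		return nil, err
	}
	var out [][]string
	for _, v := range vols {
		for _, ip := range v.IPs {
			out = append(out, []string{
				"--mode", "discovery", "--type", "sendtargets",
				"--portal", ip, "--discover",
			})
		}
	}
	return out, nil
}

// sample is the metadata output from this thread, compacted onto one line.
const sample = `[{"ips":["10.144.34.170","10.144.51.208"],"name":"volume-2f9f1b6e","capacity":{"size":"20","unit":"gb"},"iqn":"iqn.2013-05.com.daterainc:tc:01:sn:e673ab60e3c8eda3"}]`

func main() {
	args, err := discoveryArgs([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, a := range args {
		fmt.Println("iscsiadm", strings.Join(a, " "))
	}
}
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; on the host, each printed line can be run by hand to see which portals answer.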

Working it...

deitch commented 5 years ago

Update: the issue is that the initiator name in /etc/iscsi/initiatorname.iscsi didn't match the correct one provided by the metadata. We can adjust the config as part of the node-specific CSI startup in the DaemonSet, but then we have the challenge of how to restart iscsid after the file has changed.
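A rough sketch of the comparison step, assuming the file uses the usual `InitiatorName=<iqn>` line and that the expected IQN has already been read from the metadata (which metadata field holds it is not stated here, so it is passed in as a plain string). `currentInitiator` and `desiredContents` are hypothetical names, the IQNs are made up, and restarting iscsid is deliberately left out since that part is still open.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

const prefix = "InitiatorName="

// currentInitiator returns the value of the first InitiatorName= line
// in the given file contents, or "" if there is none.
func currentInitiator(contents string) string {
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(line, prefix) {
			return strings.TrimPrefix(line, prefix)
		}
	}
	return ""
}

// desiredContents returns the contents the file should have and whether
// they differ from what is on disk (i.e. whether iscsid would need to
// pick up a change).
func desiredContents(contents, want string) (string, bool) {
	if currentInitiator(contents) == want {
		return contents, false
	}
	return prefix + want + "\n", true
}

func main() {
	// Both values are made up for illustration.
	onDisk := "InitiatorName=iqn.1993-08.org.debian:01:abcdef\n"
	fromMetadata := "iqn.2019-01.example:host-initiator"

	updated, changed := desiredContents(onDisk, fromMetadata)
	fmt.Printf("changed=%v\n%s", changed, updated)
}
```

Returning a `changed` flag rather than always rewriting the file means a restart of iscsid would only be needed when the name actually differs.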

Still being investigated...

clanstyles commented 5 years ago

Hey, I wanted to re-open this. I pulled the container docker.io/packethost/csi-packet:1c039e3 and am now receiving:

  Warning  FailedMount             46s (x8 over 112s)   kubelet, ewr1-worker-0   MountVolume.MountDevice failed for volume "pvc-1e0cbc5e-f7f5-4298-8f6b-5e46319b33dd" : rpc error: code = Unknown desc = getMappedDevice error, stat /dev/mapper/volume-e95c04c0: no such file or directory

deitch commented 5 years ago

Yeah, different problem, still working it. Never fails, what passes the tests....

deitch commented 5 years ago

Do you mind pulling the latest tag @clanstyles and validating, please? I know we have an issue with multipath which, I believe, is the last major issue to be resolved, but I want to be positive that it is the only thing blocking you.