v1.8.2 cannot mount MULTI_NODE_MULTI_WRITE over NFS

democratic-csi / democratic-csi

csi storage for container orchestration systems

v1.8.2 cannot mount MULTI_NODE_MULTI_WRITE over NFS #287

Closed: marcusaram closed this issue 1 year ago

marcusaram commented 1 year ago

Hello,

After updating my cluster, I had a Pod that wouldn't come up because of issues with the democratic-csi driver. I then found out that the Helm chart uses the democratic-csi:latest image, so at first I couldn't figure out what went wrong, because no changes were made to the CSI drivers. The image version should be bound to Helm releases or something (or documented somewhere).

But by default, ReadWriteMany PVCs on ZFS over NFS don't work. I saw in the changelog and source that there is an option to set the access_modes, but it causes an issue when it is not set.

I just keep getting MountVolume.MountDevice failed for volume "pvc-32b30e59-1fac-4c49-9bc0-f58a1e7e626c" : rpc error: code = InvalidArgument desc = invalid capability: invalid access_mode, MULTI_NODE_MULTI_WRITER errors.
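For context, a claim along the following lines is the kind of request that ends up with the MULTI_NODE_MULTI_WRITER access mode named in the error, since Kubernetes translates ReadWriteMany into that CSI access mode. This is only a minimal sketch: the claim name, storage class name, and size are hypothetical placeholders, not values taken from this cluster.

```yaml
# Sketch of a ReadWriteMany claim. "shared-data", "freenas-nfs-csi" and
# the 1Gi size are hypothetical placeholders, not from this report.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany   # passed to the CSI driver as MULTI_NODE_MULTI_WRITER
  storageClassName: freenas-nfs-csi
  resources:
    requests:
      storage: 1Gi
```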

After downgrading to v1.8.1 it all works again. Is there some documentation source? Really missing that btw.

Keep up the good work!

travisghansen commented 1 year ago

Oops! Can you send over your cleansed config so I know what driver you’re using?

marcusaram commented 1 year ago

Hi Travis, we've been using the TrueNAS (FreeNAS) driver from the beginning (2 years already, if I'm correct) with almost the same config; the only change we made was switching datasetPermissionsUser from a username to the UID value. Here is our config:

driver: freenas-nfs
httpConnection:
  allowInsecure: true
  apiKey: xx
  host: 10.10.x.x
  port: 80
  protocol: http
  username: root
instance_id: oxar-nas
nfs:
  shareAlldirs: false
  shareAllowedHosts: []
  shareAllowedNetworks:
  - 10.4.0.0/24
  shareHost: 10.10.x.x
  shareMapallGroup: ""
  shareMapallUser: ""
  shareMaprootGroup: wheel
  shareMaprootUser: root
sshConnection:
  host: 10.10.x.x
  password: xx
  port: 22
  username: csi
zfs:
  cli:
    sudoEnabled: true
  datasetEnableQuotas: true
  datasetEnableReservation: false
  datasetParentName: pool1/k8s/vols
  datasetPermissionsGroup: 0
  datasetPermissionsMode: "0777"
  datasetPermissionsUser: 0
  detachedSnapshotsDatasetParentName: pool1/k8s/snaps

travisghansen commented 1 year ago

Yeah ok. I’ll get to the bottom of that and snap a new release.

travisghansen commented 1 year ago

Fix incoming: https://github.com/democratic-csi/democratic-csi/pull/288

Stupid mistake :(

heilerich commented 1 year ago

I ran into the same issue with the default latest tag from the helm chart. Setting the values for .node.driver.image and .controller.driver.image to the v1.8.1 image fixed the errors.
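The workaround above could look roughly like this in a Helm values file. It is a sketch under assumptions: the key paths follow the .node.driver.image and .controller.driver.image values named in the comment, while the image repository path and the form of the value (a single string) are assumptions that should be checked against the values.yaml of the chart version in use.

```yaml
# Sketch only: pin the driver image instead of relying on the "latest" tag.
# The repository path docker.io/democraticcsi/democratic-csi is an assumption;
# verify it (and whether "image" is a plain string) in your chart's values.yaml.
controller:
  driver:
    image: docker.io/democraticcsi/democratic-csi:v1.8.1
node:
  driver:
    image: docker.io/democraticcsi/democratic-csi:v1.8.1
```

Pinning a tag this way keeps a cluster update from silently pulling a newer driver image; the same two keys can later be pointed at a release that contains the fix.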

marcusaram commented 1 year ago

I see that v1.8.3 is already released. :partying_face: Thank you!

travisghansen commented 1 year ago

Yes, please try it out and let me know if the issue is gone. Thanks!

marcusaram commented 1 year ago

I'll test this on a test environment; since these are ephemeral, it will cost me a little more time. I'll update this specific cluster to v1.8.3 in our next maintenance window, which will be Tuesday, April 11th. I'll let you know if it's working as expected.

zanehala commented 1 year ago

Was hitting the same issue. Just pinned the node and controller driver image version to v1.8.3 and can confirm the issue is fixed.

marcusaram commented 1 year ago

Hi Travis, it took some time because we had some other maintenance, but we rolled out version 1.8.3 on our production cluster and can confirm that the issue is solved. Thank you for your efforts.