NetApp / trident

Storage orchestrator for containers
Apache License 2.0

24.02.0 Failed to verify multipath device serial. #941

Open grubjack opened 4 weeks ago

grubjack commented 4 weeks ago

Describe the bug
Can't mount a PV after upgrading the trident-operator.

Environment
Provide accurate information about the environment to help us reproduce the issue.

Additional context

$ kubectl events
...
20s (x5 over 98s)      Warning   FailedMount              Pod/test-65689b588-kspfv    MountVolume.MountDevice failed for volume "pvc-47c47d19-c076-4b12-84ab-b35cacad7774" : rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: multipath device 'dm-76' serial check failed

$ kubectl -n trident logs trident-node-linux-g7p5w
...
time="2024-10-24T10:21:52Z" level=error msg="GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: multipath device 'dm-76' serial check failed" logLayer=csi_frontend requestID=42c4607d-1845-4f64-974e-b7886594e323 requestSource=CSI
time="2024-10-24T10:22:32Z" level=error msg="Failed to verify multipath device serial." logLayer=csi_frontend lunSerialNumber="81LgM$V4waSp" lunSerialNumberHex=38314c674d24563477615370 multipathDevice=dm-76 multipathDeviceUUID="mpath-3600a098038314c674d2456347761534b\n" requestID=28be1fb9-aada-456f-b344-7d715190b9cd requestSource=CSI workflow="node_server=stage"
time="2024-10-24T10:22:36Z" level=error msg="Failed to verify multipath device serial." logLayer=csi_frontend lunSerialNumber="81LgM$V4waSp" lunSerialNumberHex=38314c674d24563477615370 multipathDevice=dm-76 multipathDeviceUUID="mpath-3600a098038314c674d2456347761534b\n" requestID=28be1fb9-aada-456f-b344-7d715190b9cd requestSource=CSI workflow="node_server=stage"
time="2024-10-24T10:22:42Z" level=error msg="Failed to verify multipath device serial." logLayer=csi_frontend lunSerialNumber="81LgM$V4waSp" lunSerialNumberHex=38314c674d24563477615370 multipathDevice=dm-76 multipathDeviceUUID="mpath-3600a098038314c674d2456347761534b\n" requestID=28be1fb9-aada-456f-b344-7d715190b9cd requestSource=CSI workflow="node_server=stage"
time="2024-10-24T10:22:42Z" level=error msg="GRPC error: rpc error: code = Internal desc = rpc error: code = Internal desc = failed to stage volume: multipath device 'dm-76' serial check failed" logLayer=csi_frontend requestID=28be1fb9-aada-456f-b344-7d715190b9cd requestSource=CSI
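A side note on the logged values: the serial check compares the hex-encoded LUN serial against the device-mapper UUID. NetApp ONTAP WWIDs typically consist of the NAA prefix `3600a0980` followed by the hex-encoded 12-character LUN serial (that prefix assumption is mine, not from the log), so the two values above can be compared by hand:

```shell
# Expected LUN serial as reported in the Trident log (lunSerialNumber)
serial='81LgM$V4waSp'
# dm-76's multipath UUID from the same log line, minus the "mpath-" prefix
uuid_hex='3600a098038314c674d2456347761534b'

# Hex-encode the serial the way Trident logs it (lunSerialNumberHex)
serial_hex=$(printf %s "$serial" | od -An -tx1 | tr -d ' \n')
echo "expected serial (hex): $serial_hex"          # 38314c674d24563477615370
echo "dm uuid payload:       ${uuid_hex#3600a0980}" # 38314c674d2456347761534b
```

The two payloads differ only in the last byte (0x70 = 'p' vs 0x4b = 'K'): dm-76 is carrying a LUN whose serial ends in 'K', not the serial Trident expects, which is consistent with a stale multipath map pointing at a different (or since-remapped) LUN.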

$ cat /etc/multipath.conf
defaults {
    find_multipaths no
    user_friendly_names yes
}

$ sudo multipath -ll | grep -A 5 dm-76
3600a098038314c674d2456347761534b dm-76 NETAPP,LUN C-Mode
size=8.0G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 15:0:0:20  sdcx 70:80   failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 16:0:0:20  sddq 71:128  failed faulty running

$ sudo cat /etc/iscsi/iscsid.conf
iscsid.startup = /bin/systemctl start iscsid.socket
node.startup = manual
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes
node.session.scan = manual

bryantidd commented 3 weeks ago

Seeing the same error on a fresh install. @grubjack Did you find a workaround?

grubjack commented 2 weeks ago

> Seeing the same error on a fresh install. @grubjack Did you find a workaround?

https://kb.netapp.com/Cloud/Astra/Trident/Failed_dm-devices_visible_after_Upgrade_of_Trident_from_pre-22.07_to_22.10_or_later
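For anyone following the KB link: the cleanup there amounts to flushing the stale device-mapper map by its WWID. The WWID can be recovered from the `multipathDeviceUUID` Trident logs (note the stray trailing newline in the logged value). The sketch below only derives and prints the flush command rather than running it; check with `multipath -ll` that the map has no remaining users before actually flushing:

```shell
# multipathDeviceUUID exactly as Trident logged it, trailing newline included
logged='mpath-3600a098038314c674d2456347761534b
'
# Drop whitespace and the "mpath-" prefix to get the bare WWID
wwid=$(printf %s "$logged" | tr -d '[:space:]')
wwid=${wwid#mpath-}

# Print (do not run) the flush command for the stale map
echo "multipath -f $wwid"
```

After flushing the stale map, reloading the maps (`multipath -r`) and retrying the pod mount should let Trident stage the volume against a freshly created device.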

sjpeeris commented 1 week ago

@grubjack Are you able to reproduce the issue with the latest version of Trident (v24.10.0)? Several bug fixes went into 24.10.0. If you can still reproduce the issue with 24.10.0, please open a NetApp Support case and our support team will investigate further.