Trident CSI with ontap-san backend does not login to subset of portals reported by SLM

kincl commented 4 years ago

Describe the bug We are running Trident with an ontap-san backend connecting to a ONTAP cluster with 12 iSCSI data LIFs (one per node as required) without CHAP auth. When we create a PVC and schedule a pod that mounts it on a node with no existing sessions the CSI driver on the node will create sessions to all portal IPs instead of just the data LIFs that SLM reports. When the pod completes and the PVC is unscheduled from the node the CSI driver will only logout of the data LIFs that SLM is reporting which leaves in our case 10 sessions to wrong data LIFs. The next time a PVC needs to mount the CSI driver will see the existing sessions to the wrong data LIFs and will proceed without logging in which results in the second PVC never mounting.

Environment Provide accurate information about the environment to help us reproduce the issue.

Trident version: 20.01.1
Trident installation flags used: -d -n trident
Container runtime: cri-o://1.16.5-1.dev.rhaos4.3.git91157c1.el7
Kubernetes version: v1.16.2
Kubernetes orchestrator: OpenShift v4.3.13
Kubernetes enabled feature gates: none beyond stock OCP
OS: RHEL 7.7
NetApp backend types: ONTAP
Other:

To Reproduce Steps to reproduce the behavior:

ONTAP cluster with SLM and iSCSI
Create PVC with ontap-san backend
Create pod that mounts PVC on host1 (succeeds)
Delete pod
Create same pod again on host1 (fails)
Create new PVC
Create new pod that mounts new PVC on host1 (fails)

Expected behavior A clear and concise description of what you expected to happen.

Remounting a PVC on a host with a new pod should succeed as well as mounting new PVCs on the same host.

More specifically, if we mount a iSCSI PVC we should only login to the portals identified by SLM and we should definitely logout of all sessions that we logged into.

Additional context Add any other context about the problem here.

I believe the issue is at utils/osutils.go#L142, we should be using bkportal instead of portalIps.

Below are logs from the first successful mount on a node with no existing sessions. Note that the GRPC request returns p1 and p2 which are the correct SLM reporting portals and is logged as targetPortals. If I am correct, if we were using CHAP then this would work since the CHAP code correctly uses the SLM reporting portals to connect.

Logs from the CSI driver pod on the first mount:

time="2020-04-22T20:47:48Z" level=debug msg="GRPC call: /csi.v1.Node/NodeStageVolume"
time="2020-04-22T20:47:48Z" level=debug msg="GRPC request: volume_id:\"pvc-c5d1e557-b4a9-4dc1-b22e-4bdf16b996d7\" publish_context:<key:\"filesystemType\" value:\"ext4\" > publish_context:<key:\"iscsiIgroup\" value:\"cluster1-k8s\" > publish_context:<key:\"iscsiInitiatorSecret\" value:\"\" > publish_context:<key:\"iscsiInterface\" value:\"\" > publish_context:<key:\"iscsiLunNumber\" value:\"17\" > publish_context:<key:\"iscsiTargetIqn\" value:\"iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\" > publish_context:<key:\"iscsiTargetPortalCount\" value:\"2\" > publish_context:<key:\"iscsiTargetSecret\" value:\"\" > publish_context:<key:\"iscsiUsername\" value:\"\" > publish_context:<key:\"mountOptions\" value:\"\" > publish_context:<key:\"p1\" value:\"172.30.253.2\" > publish_context:<key:\"p2\" value:\"172.30.253.3\" > publish_context:<key:\"protocol\" value:\"block\" > publish_context:<key:\"sharedTarget\" value:\"true\" > publish_context:<key:\"useCHAP\" value:\"false\" > staging_target_path:\"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c5d1e557-b4a9-4dc1-b22e-4bdf16b996d7/globalmount\" volume_capability:<mount:<fs_type:\"ext4\" > access_mode:<mode:SINGLE_NODE_WRITER > > volume_context:<key:\"backendUUID\" value:\"caf545e2-bd42-4b36-90f9-948c98a87dea\" > volume_context:<key:\"internalName\" value:\"trident_cluster1_block_pvc_c5d1e557_b4a9_4dc1_b22e_4bdf16b996d7\" > volume_context:<key:\"name\" value:\"pvc-c5d1e557-b4a9-4dc1-b22e-4bdf16b996d7\" > volume_context:<key:\"protocol\" value:\"block\" > volume_context:<key:\"storage.kubernetes.io/csiProvisionerIdentity\" value:\"1587583872788-8081-csi.trident.netapp.io\" > "
time="2020-04-22T20:47:48Z" level=debug msg="Attempting to acquire shared lock (NodeStageVolume)." lock=csi_node_server
time="2020-04-22T20:47:48Z" level=debug msg="Acquired shared lock (NodeStageVolume)." lock=csi_node_server
time="2020-04-22T20:47:48Z" level=debug msg=">>>> NodeStageVolume" Method=NodeStageVolume Type=CSI_Node
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.AttachISCSIVolume"
time="2020-04-22T20:47:48Z" level=debug msg="Attaching iSCSI volume." fstype=ext4 lunID=17 mountpoint= targetIQN="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31" targetPortals="[172.30.253.2 172.30.253.3]" volume=trident_cluster1_block_pvc_c5d1e557_b4a9_4dc1_b22e_4bdf16b996d7
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.ISCSISupported"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.execCommand." args="[-V]" command=iscsiadm
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.execCommand." command=iscsiadm error="<nil>" output="iscsiadm version 6.2.0.874-10"
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.ISCSISupported"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.iSCSISessionExistsToTargetIQN"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.getISCSISessionInfo"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.execCommand." args="[-m session]" command=iscsiadm
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.execCommand." command=iscsiadm error="exit status 21" output="iscsiadm: No active sessions."
time="2020-04-22T20:47:48Z" level=debug msg="No iSCSI session found."
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.getISCSISessionInfo"
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.iSCSISessionExistsToTargetIQN"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.EnsureISCSISession" hostDataIP=172.30.253.2
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.ISCSISupported"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.execCommand." args="[-V]" command=iscsiadm
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.execCommand." command=iscsiadm error="<nil>" output="iscsiadm version 6.2.0.874-10"
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.ISCSISupported"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.iSCSISessionExists"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.getISCSISessionInfo"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.execCommand." args="[-m session]" command=iscsiadm
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.execCommand." command=iscsiadm error="exit status 21" output="iscsiadm: No active sessions."
time="2020-04-22T20:47:48Z" level=debug msg="No iSCSI session found."
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.getISCSISessionInfo"
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.iSCSISessionExists"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.iSCSIDiscovery" portal=172.30.253.2
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.execCommand." args="[-m discovery -t sendtargets -p 172.30.253.2]" command=iscsiadm
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.execCommand." command=iscsiadm error="<nil>" output="172.30.253.2:3260,1067 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.13:3260,1094 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.12:3260,1093 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.11:3260,1092 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.10:3260,1091 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.9:3260,1090 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.8:3260,1089 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.7:3260,1088 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.6:3260,1087 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.5:3260,1086 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.4:3260,1085 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31\n172.30.253.3:3260,1084 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.2:3260,1067" PortalIP=172.30.253.2 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.13:3260,1094" PortalIP=172.30.253.13 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.12:3260,1093" PortalIP=172.30.253.12 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.11:3260,1092" PortalIP=172.30.253.11 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.10:3260,1091" PortalIP=172.30.253.10 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.9:3260,1090" PortalIP=172.30.253.9 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.8:3260,1089" PortalIP=172.30.253.8 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.7:3260,1088" PortalIP=172.30.253.7 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.6:3260,1087" PortalIP=172.30.253.6 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.5:3260,1086" PortalIP=172.30.253.5 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.4:3260,1085" PortalIP=172.30.253.4 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="Adding iSCSI discovery info." Portal="172.30.253.3:3260,1084" PortalIP=172.30.253.3 TargetName="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31"
time="2020-04-22T20:47:48Z" level=debug msg="<<<< osutils.iSCSIDiscovery"
time="2020-04-22T20:47:48Z" level=debug msg="Found matching iSCSI targets." Targets="[{172.30.253.2:3260,1067 172.30.253.2 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.13:3260,1094 172.30.253.13 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.12:3260,1093 172.30.253.12 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.11:3260,1092 172.30.253.11 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.10:3260,1091 172.30.253.10 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.9:3260,1090 172.30.253.9 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.8:3260,1089 172.30.253.8 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.7:3260,1088 172.30.253.7 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.6:3260,1087 172.30.253.6 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.5:3260,1086 172.30.253.5 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.4:3260,1085 172.30.253.4 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31} {172.30.253.3:3260,1084 172.30.253.3 iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31}]"
time="2020-04-22T20:47:48Z" level=debug msg=">>>> osutils.loginISCSITarget" IQN="iqn.1992-08.com.netapp:sn.2eaf0424b53311e9922700a098e1f7a7:vs.31" Portal=172.30.253.2

kincl commented 4 years ago

Please let me know what I can do to move this bug along, thanks!