NetApp / trident

Storage orchestrator for containers
Apache License 2.0

trident mounts /dev/dm-n instead of /dev/mapper/mpathn #770

Closed · domruf closed 2 years ago

domruf commented 2 years ago

Describe the bug I noticed that on my k8s nodes some volumes are mounted via paths like /dev/mapper/mpathd and some via paths like /dev/dm-2. I can't see any pattern in which cases /dev/mapper/mpathn is used and in which cases /dev/dm-n is used. I only have one storage class for SAN volumes at the moment.

I'm not really sure if this is actually a problem, or what the real difference between the two is, but both https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/mpath_devices#:~:text=Any%20devices%20of%20the%20form%20/dev/dm%2Dn%20are%20for%20internal%20use%20only%20and%20should%20never%20be%20used. and https://ubuntu.com/server/docs/device-mapper-multipathing-introduction#:~:text=Any%20devices%20of%20the%20form%20/dev/dm%2Dn%20are%20for%20internal%20use%20only%20and%20should%20never%20be%20used%20directly. say: "Any devices of the form /dev/dm-n are for internal use only and should never be used." So this made me a bit worried.

Can somebody tell me if this is something that must be fixed, or reassure me that it is fine this way?

To Reproduce Use multiple SAN volumes in pods and then look at the output of df on your nodes.

# df -h | grep /dev
udev                                                                                  7.8G     0  7.8G   0% /dev
/dev/mapper/vg01-root                                                                  98G   42G   52G  45% /
tmpfs                                                                                 7.9G   16K  7.9G   1% /dev/shm
/dev/sda2                                                                             974M  219M  688M  25% /boot
/dev/loop0                                                                             87M   87M     0 100% /snap/core/4917
/dev/loop1                                                                             88M   88M     0 100% /snap/core/5662
/dev/mapper/mpathd                                                                     49G   76K   47G   1% /var/lib/kubelet/pods/0768b458-5aeb-4c5e-b08c-527915adfb27/volumes/kubernetes.io~csi/pvc-4ce1178a-24a3-4e18-b035-5eb9cafd82f6/mount
/dev/dm-1                                                                              30G   20G  8.7G  69% /var/lib/kubelet/pods/4ed48ebd-0508-4ee2-8576-9ae78c0fcc2a/volumes/kubernetes.io~csi/pvc-cae522a8-43aa-4f14-aa61-e2f7d3ca6c4f/mount
/dev/dm-2                                                                              20G  4.2G   15G  23% /var/lib/kubelet/pods/3590e859-37a5-4b2e-a0f9-f343cf870abf/volumes/kubernetes.io~csi/pvc-74105c45-13ba-4ca9-bfea-1e6043780736/mount
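
For reference, the two kinds of paths can be tied together on the node itself (a quick sketch; mpathd is taken from the df output above, so substitute your own device names):

# readlink -f /dev/mapper/mpathd          # prints the dm-N node the mapper symlink resolves to
# lsblk -o NAME,KNAME,TYPE,MOUNTPOINT     # the KNAME column shows the dm-N kernel name behind each mount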

Expected behavior Only /dev/mapper/mpathn paths should be used.

domruf commented 2 years ago

Here is some additional information about the setup of the nodes. The /etc/multipath.conf is the same as described at https://docs.netapp.com/us-en/trident/trident-use/worker-node-prep.html#iscsi-volumes

# cat /etc/multipath.conf
defaults {
user_friendly_names yes
find_multipaths no
}
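
To rule out a stale configuration, the settings the daemon actually loaded can be checked with the standard multipath-tools commands (output omitted here):

# multipathd show config | grep -E 'user_friendly_names|find_multipaths'
# multipath -ll                           # lists each mpath device with its underlying SCSI paths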

AFAIK /etc/iscsi/iscsid.conf is still the way the package manager created it:

# cat /etc/iscsi/iscsid.conf | grep -v "^#" | grep -v "^$"
iscsid.startup = /bin/systemctl start iscsid.socket
node.startup = manual
node.leading_login = No
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
node.conn[0].iscsi.MaxXmitDataSegmentLength = 0
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.session.nr_sessions = 1
node.session.iscsi.FastAbort = Yes
node.session.scan = manual
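
For completeness, the sessions these settings produce can be inspected with the usual iscsiadm queries (output omitted here):

# iscsiadm -m session                     # one line per established iSCSI session
# iscsiadm -m session -P 3                # verbose view, including the attached SCSI devices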
rohit-arora-dev commented 2 years ago

@domruf

A multipath device is created as /dev/dm-N, whereas /dev/mapper/mpathX is a symlink whose target is that /dev/dm-N device.
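
You can see this on any node (a small sketch; the device name is an example):

# ls -l /dev/mapper/mpathd                # shows a symlink such as mpathd -> ../dm-3
# dmsetup ls                              # maps each device-mapper name to its major:minor pair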

The reason Linux vendors caution against using /dev/dm-N, especially in scripts, is that the 'N' is assigned dynamically and changes with the order in which devices are activated during reboot. One of the benefits of the /dev/mapper/mpathX devices is that their names stay the same across reboots. But in Trident's case, if a node reboots it is Trident's responsibility to re-establish the iSCSI sessions (that is why we recommend setting node.session.scan to manual) and then identify the dm-N device and mount it into the pod. From a user's perspective, it should not make any difference at all.
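
This is also why scripts that must survive reboots usually key on the stable udev aliases rather than on either path; for example (a sketch; the device name is an example):

# udevadm info --query=symlink /dev/dm-1  # lists every stable alias udev created for this dm node
# ls /dev/disk/by-id/dm-uuid-mpath-*      # WWID-based names that do not change across reboots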

Also, Trident does not use the multipath device for operations pertaining to logical volume creation. The operations it performs using the /dev/dm-N device during NodeStaging, NodePublishing, and NodeExpansion are:

  1. Formatting the LUN with the desired FS type.
  2. Repairing the volume.
  3. Mounting the device to a pod.
  4. Resizing the filesystem.

We do not see any issue with using the /dev/dm-N device when performing the above operations (sketched below).
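
For illustration only (this is not Trident's code; the device, filesystem type, and paths are examples), those four operations amount to roughly:

# mkfs.ext4 /dev/dm-1                     # 1. format the LUN with the desired FS type
# fsck.ext4 -p /dev/dm-1                  # 2. repair the volume
# mount /dev/dm-1 /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/<pvc>/mount   # 3. mount the device to a pod
# resize2fs /dev/dm-1                     # 4. resize the filesystem after a LUN expansion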

domruf commented 2 years ago

@ntap-arorar I still wonder why sometimes dm-N and sometimes mpathX is used. But if you are sure that using dm-N is no problem, I guess the issue can be closed.