Nexenta / nexentastor-csi-driver-block

Apache License 2.0

Fails when ISCSI LUN > 255 #14

Closed labilezhu closed 2 years ago

labilezhu commented 2 years ago

Symptom

When the iSCSI LUN (logical unit number) is greater than 255, MountVolume.MountDevice fails.

pod description:

kubectl describe pod data-search-engine-data-1

Events:
  Type     Reason       Age                       From     Message
  ----     ------       ----                      ----     -------
  Warning  FailedMount  3m16s (x5456 over 7d17h)  kubelet  MountVolume.MountDevice failed for volume "pvc-ns-14d4e2a4-e5b0-4002-adc7-xxx" : rpc error: code = Unknown desc = lstat /host/dev/disk/by-path/ip-10.0.xx.xx:3260-iscsi-iqn.xxx-07.com.nexenta:02:xxx-a09e-xx-f91a-xx-lun-392: no such file or directory

nexentastor-block-csi-node driver log:

Aug 18 09:33:07.738 [ERRO] [pool1-xxx-vpod1-wpool1-xxx] [NodeServer] [GetRealDeviceName()] Could not evaluate symlink: /dev/disk/by-path/ip-10.0.xx.xx:3260-iscsi-iqn.2xx5-07.com.nexenta:02:xxx-a09e-xx-f91a-xx-lun-392, err: lstat /host/dev/disk/by-path/ip-10.0.xx.xx:3260-iscsi-iqn.xxx-07.com.nexenta:02:xxx-a09e-xx-f91a-xx-lun-392: no such file or directory

I can see two symbolic link naming patterns in /dev/disk/by-path on the host:

# Decimal LUN pattern:
ip-10.0.xx.xx:3260-iscsi-iqn.2xx5-07.com.nexenta:02:xxx-a09e-xx-f91a-xx-lun-102  -> ../../sdgw

# Hex LUN pattern:
ip-10.0.xx.xx:3260-iscsi-iqn.2xx5-07.com.nexenta:02:xxx-a09e-xx-f91a-xx-lun-0x0188000000000000 -> ../../sdkz 

We know that:

hex(0x0188) = dec(392)

I can see that:

if LUN < 256:
    the LUN appears in the symlink name in decimal
else:
    the LUN appears in the symlink name in hex
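To check this observation against the symlinks on a host, the LUN suffix can be decoded back into an integer. The following is a minimal sketch, not part of the driver: `parseByPathLun` is a hypothetical helper, the symlink names in `main` are made-up placeholders, and the hex layout is assumed to be the one udev uses (low 16 bits first, then bits 16-31, then eight zero digits).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseByPathLun is a hypothetical helper (not part of the driver). It decodes
// the "lun-..." suffix of a /dev/disk/by-path iSCSI symlink name into the
// integer LUN, accepting both the decimal and the hex naming pattern.
func parseByPathLun(name string) (uint64, error) {
	const marker = "-lun-"
	idx := strings.LastIndex(name, marker)
	if idx < 0 {
		return 0, fmt.Errorf("no lun suffix in %q", name)
	}
	s := name[idx+len(marker):]

	// Decimal pattern, seen for LUN < 256.
	if !strings.HasPrefix(s, "0x") {
		return strconv.ParseUint(s, 10, 32)
	}

	// Hex pattern: 16 hex digits, assumed layout <low 16 bits><bits 16-31>00000000.
	digits := s[2:]
	if len(digits) != 16 || digits[8:] != "00000000" {
		return 0, fmt.Errorf("unexpected hex lun %q", s)
	}
	low, err := strconv.ParseUint(digits[0:4], 16, 16)
	if err != nil {
		return 0, err
	}
	high, err := strconv.ParseUint(digits[4:8], 16, 16)
	if err != nil {
		return 0, err
	}
	return high<<16 | low, nil
}

func main() {
	// Placeholder symlink names; the portal and IQN are invented for illustration.
	names := []string{
		"ip-10.0.0.1:3260-iscsi-iqn.2005-07.com.example:target-lun-102",
		"ip-10.0.0.1:3260-iscsi-iqn.2005-07.com.example:target-lun-0x0188000000000000",
	}
	for _, n := range names {
		lun, err := parseByPathLun(n)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(lun) // prints 102, then 392
	}
}
```

Using the last occurrence of `-lun-` keeps the helper from being confused by a target name that happens to contain the same substring.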

Root cause

This may be a bug in nexentastor-csi-driver-block, which always formats the LUN as decimal: https://github.com/Nexenta/nexentastor-csi-driver-block/blob/58d0124997759f47e82217d02fbf03e1198a4ed5/pkg/driver/nodeServer.go#L442

func (s *NodeServer) ConstructDevByPath(portal, iSCSITarget string, lunNumber int) (devByPath string) {
    return strings.Join([]string{
        "/dev/disk/by-path/ip", portal,
        "iscsi", iSCSITarget, "lun", strconv.Itoa(lunNumber)}, "-")
}

See the following related issues:

https://bugs.launchpad.net/cinder/+bug/1493350

https://github.com/kubernetes/kubernetes/issues/48639

https://github.com/kubernetes/kubernetes/issues/45024

The correct logic should look like systemd's udev path_id builtin:

https://github.com/systemd/systemd/blob/main/src/udev/udev-builtin-path_id.c#L58

/*
** Linux only supports 32 bit luns.
** See drivers/scsi/scsi_scan.c::scsilun_to_int() for more details.
*/
static int format_lun_number(sd_device *dev, char **path) {
        const char *sysnum;
        unsigned long lun;
        int r;

        r = sd_device_get_sysnum(dev, &sysnum);
        if (r < 0)
                return r;
        if (!sysnum)
                return -ENOENT;

        r = safe_atolu_full(sysnum, 10, &lun);
        if (r < 0)
                return r;
        if (lun < 256)
                /* address method 0, peripheral device addressing with bus id of zero */
                path_prepend(path, "lun-%lu", lun);
        else
                /* handle all other lun addressing methods by using a variant of the original lun format */
                path_prepend(path, "lun-0x%04lx%04lx00000000", lun & 0xffff, (lun >> 16) & 0xffff);

        return 0;
}
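For illustration, the same rule could be ported to Go roughly as follows. This is a sketch under stated assumptions, not the fix that was actually committed: `formatLun` and the lower-case `constructDevByPath` are hypothetical names, and the portal and target passed in `main` are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// formatLun is a hypothetical Go port of systemd's format_lun_number():
// decimal for LUN < 256, otherwise the 16-digit hex form with the low
// 16 bits first, then bits 16-31, then eight zero digits.
func formatLun(lun uint32) string {
	if lun < 256 {
		return fmt.Sprintf("lun-%d", lun)
	}
	return fmt.Sprintf("lun-0x%04x%04x00000000", lun&0xffff, (lun>>16)&0xffff)
}

// constructDevByPath is a hypothetical variant of the driver's
// ConstructDevByPath that delegates the LUN part to formatLun.
func constructDevByPath(portal, iSCSITarget string, lunNumber uint32) string {
	return strings.Join([]string{
		"/dev/disk/by-path/ip", portal,
		"iscsi", iSCSITarget, formatLun(lunNumber)}, "-")
}

func main() {
	fmt.Println(formatLun(102)) // lun-102
	fmt.Println(formatLun(392)) // lun-0x0188000000000000

	// Placeholder portal and target, invented for this example.
	fmt.Println(constructDevByPath("10.0.0.1:3260", "iqn.2005-07.com.example:target", 392))
}
```

With this rule, LUN 392 yields the `lun-0x0188000000000000` suffix that udev created on the host, instead of the `lun-392` path the driver looked for.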

Ref

https://bugs.launchpad.net/cinder/+bug/1493350

https://github.com/kubernetes/kubernetes/issues/48639

https://github.com/kubernetes/kubernetes/issues/45024

https://access.redhat.com/solutions/702413

Qeas commented 2 years ago

Hi @labilezhu. Thanks for your feedback. This does look like a bug, and I was able to reproduce it in our lab. I will include a fix in the next release. That said, we generally do not recommend using more than 256 LUNs per iSCSI target. If you have a large number of volumes, I would highly recommend using the automatic target-LUN control provided by the driver, by setting dynamicTargetLunAllocation: true in the driver config. You can also use numOfLunsPerTarget to control how many LUNs are associated with a target before a new target is created. All config options are listed here: https://github.com/Nexenta/nexentastor-csi-driver-block/blob/master/README.md#configuration-options

labilezhu commented 2 years ago

Thanks @Qeas. I will try dynamicTargetLunAllocation.

Qeas commented 2 years ago

Fixed in https://github.com/Nexenta/nexentastor-csi-driver-block/commit/9ab37845f087766b2a771b4c47fccb42e906dafc