IBM / ibm-block-csi-driver

The IBM block storage CSI driver enables container orchestrators, such as Kubernetes and OpenShift, to manage the lifecycle of persistent storage.
Apache License 2.0
33 stars 25 forks

Fails to attach storage with error message 'could not find host by using initiators' #442

Closed stevepoweribm closed 2 years ago

stevepoweribm commented 2 years ago

I am trying to configure OCP 4.9.11, running on vSphere 7.0.3.00100 and utilising NPIV for my compute nodes, against a FlashSystem 5030 array using the v1.8.0 CSI driver. I can successfully create a PVC that provisions a PV, and I can see that appear in the storage console.

When I try to attach the PVC to a pod (in this case configuring the openshift-image-registry component), it fails to complete the action. The VolumeAttachment record reports Status.Attach Error.Message: `rpc error: code = NotFound desc = Host for node: nvme_nqn: , fc_wwns : [''], iscsi_iqn : iqn.1994-05.com.redhat:ca8c232fba was not found, ensure all host ports are configured on storage`

Using `oc logs -f pod/ibm-block-csi-controller-0 ibm-block-csi-controller -n ibm-block-csi-driver` in my namespace, I can see the following errors:

```
2022-01-20 14:28:40,777 DEBUG   [140349823231744] [SVC:153;60050763808120A550000000000000B5] (array_mediator_svc.py:get_host_by_host_identifiers:705) - could not find host by using initiators: nvme_nqn: , fc_wwns : [''], iscsi_iqn : iqn.1994-05.com.redhat:ca8c232fba
2022-01-20 14:28:40,778 ERROR   [140349823231744] [SVC:153;60050763808120A550000000000000B5] (exception_handler.py:handle_exception:36) - Host for node: nvme_nqn: , fc_wwns : [''], iscsi_iqn : iqn.1994-05.com.redhat:ca8c232fba  was not found, ensure all host ports are configured on storage
Traceback (most recent call last):
  File "/driver/controller/controller_server/exception_handler.py", line 44, in handle_common_exceptions_with_response
    return controller_method(servicer, request, context)
  File "/driver/controller/controller_server/csi_controller_server.py", line 249, in ControllerPublishVolume
    lun, connectivity_type, array_initiators = array_mediator.map_volume_by_initiators(volume_id,
  File "/opt/app-root/lib64/python3.8/site-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/opt/app-root/lib64/python3.8/site-packages/retry/api.py", line 73, in retry_decorator
    return __retry_internal(partial(f, *args, **kwargs), exceptions, tries, delay, max_delay, backoff, jitter,
  File "/opt/app-root/lib64/python3.8/site-packages/retry/api.py", line 33, in __retry_internal
    return f()
  File "/driver/controller/array_action/array_mediator_abstract.py", line 19, in map_volume_by_initiators
    host_name, connectivity_types = self.get_host_by_host_identifiers(initiators)
  File "/driver/controller/array_action/array_mediator_svc.py", line 706, in get_host_by_host_identifiers
    raise array_errors.HostNotFoundError(initiators)
controller.array_action.errors.HostNotFoundError: Host for node: nvme_nqn: , fc_wwns : [''], iscsi_iqn : iqn.1994-05.com.redhat:ca8c232fba  was not found, ensure all host ports are configured on storage
```

Digging through your code, I can see that I'm dying on line 741 of `ibm-block-csi-driver/controller/array_action/array_mediator_svc.py`. If I understand the basics of the code correctly, you're issuing an `lshost` on the storage array, then an `lshost <id>` for each host, and searching through the result for nqn, WWPN, or iscsi_name values. If you find one, you set a connectivity type. In my case, the code fails to find these keys.
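To illustrate the lookup described above, here is a simplified sketch of that matching logic. This is not the driver's actual implementation; the function name, dict layout, and connectivity-type strings are illustrative assumptions. The key point it demonstrates is that when the node reports `fc_wwns : ['']`, the empty string is discarded and no FC match can ever succeed, regardless of what WWPNs the array shows.

```python
# Simplified sketch (NOT the driver's real code) of matching a node's
# reported initiators against hosts parsed from `lshost <id>` output.

def find_host_by_initiators(hosts, node_initiators):
    """hosts: list of dicts, each possibly holding 'WWPN', 'iscsi_name',
    or 'nqn' lists. node_initiators: dict with 'fc_wwns', 'iscsi_iqn',
    'nvme_nqn' as reported by the CSI node plugin."""
    for host in hosts:
        host_wwpns = {w.lower() for w in host.get("WWPN", [])}
        # Empty strings (a node with no visible FC HBA) are filtered out,
        # so fc_wwns : [''] can never match any host WWPN.
        node_wwpns = {w.lower() for w in node_initiators.get("fc_wwns", []) if w}
        if host_wwpns & node_wwpns:
            return host["name"], "fc"
        if node_initiators.get("iscsi_iqn") in host.get("iscsi_name", []):
            return host["name"], "iscsi"
        if node_initiators.get("nvme_nqn") in host.get("nqn", []):
            return host["name"], "nvme"
    return None, None  # no host matched: HostNotFoundError territory
```

With the `lshost` output shown below, the host's WWPNs are present on the array, but the node side contributes only an empty FC list and an iSCSI IQN that is not configured on any host, so the lookup returns no match.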

If I run an `lshost <num>` command, I get the following output:
```
id 22
name <ocpwork101_fqdn>
port_count 4
type generic
mask 1111111111111111111111111111111111111111111111111111111111111111
iogrp_count 2
status online
site_id
site_name
host_cluster_id
host_cluster_name
protocol scsi
status_policy redundant
status_site all
WWPN 281C000C29000047
node_logged_in_count 2
state active
WWPN 281C000C29000046
node_logged_in_count 2
state active
WWPN 281C000C29000045
node_logged_in_count 2
state active
WWPN 281C000C29000044
node_logged_in_count 2
state active
owner_id 0
owner_name Openshift
```

Questions

  1. Why is this failing? I don't understand why it can't see the WWPN values that are clearly shown above.
  2. What does CSI driver do when the VMs are running on vSphere? Is it going to attach the LUN to the VM using a vSphere RDM disk or something else?

Thanks for any help you can offer in resolving this issue. SteveP
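One hedged way to check the node side of this mismatch is to look at what the Linux guest itself can see: the `fc_wwns : ['']` in the error suggests the node plugin found no FC HBA inside the VM, even though the array sees NPIV WWPNs. The sysfs path below is the standard location Linux uses for FC host ports; on a VM with no (virtual) FC HBA it is simply empty.

```python
# Hedged illustration: list the FC WWPNs visible inside the guest OS.
# On a VM where NPIV is handled entirely at the ESXi layer, there is no
# guest-visible FC HBA, so /sys/class/fc_host has no entries -- which
# would explain the empty fc_wwns in the driver's error message.
import glob

def visible_fc_wwpns():
    wwpns = []
    for path in glob.glob("/sys/class/fc_host/host*/port_name"):
        with open(path) as f:
            raw = f.read().strip()  # typically formatted like "0x281c000c29000047"
            wwpns.append(raw[2:] if raw.startswith("0x") else raw)
    return wwpns

print(visible_fc_wwpns() or "no FC HBAs visible to the guest")
```

If this prints nothing, the array-side WWPNs belong to the ESXi NPIV ports rather than to anything the guest (and therefore the CSI node plugin) can report.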

stevepoweribm commented 2 years ago

After digging into how NPIV on VMware works AND talking to some experts in this space: VMware requires all LUNs to be presented using RDM disks. From within the VM itself, the guest is not aware that it has a Fibre Channel connection to the storage array, and therefore LUNs created dynamically by the CSI driver can't be mapped into the VM when required.

Closing defect.
