Open bsmith94 opened 4 years ago
This change would cause massive problems for much of OPA software, not just the psm2 library, and I do not believe Intel was consulted on this change. I'm not sure how changing device names to vary by each machine they are installed on makes them more "predictable".
This came up again while testing libpsm2 on debian/bullseye. Agreed that the use of the "predictable device name" instead of hfi1_0 would necessitate a lot of change in the other packages.
I have a udev rule that renames the device as hfi1_0, which resolves this issue. Should that rule be packaged with libpsm2? If so, I will submit a pull request.
@bsmith94 I'm consulting with my co-workers, and I think we all agree that a new udev rule is the preferred route, but I don't think the fix should be in libpsm2 because it would affect more than just psm users. The persistent naming change is also going to impact all the command line utilities, the fabric manager, etc. I'm also a bit concerned that a 60-* prefix might be an issue since the existing udev rules for psm are 40-psm.rules.
Finally, there's the issue of what impact a new udev rule would have on systems that don't have persistent renaming. It would be hard for us to add the change if it's going to negatively impact the majority of our users.
Right now I see a couple of approaches we could take:
Thoughts?
@bsmith94, sorry for the delay in response, we were discussing internally whether this is a fix we need to include in our release or to file an issue with the linux-rdma maintainers.
After much discussion, we decided that since rdma-core does the rename with a user-space tool called through a udev rule, the default udev rule provided by rdma-core needs to change so that it excludes hfi1 from the rename.
I have filed a patch upstream to the maintainers of linux-rdma to change the default rename behavior. I will update further if and when they accept the patch.
Thanks for the update.
Is there any further update to this issue? We are likely hitting the same problem on our Ubuntu 22.04 machines with the included OPA packages and libfabric.
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_getinfo():523<info>
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_init_prov_info():268<info> TAG60 instance included
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_init_prov_info():281<info> TAG64 instance included
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_init_lib():257<info> PSM2 header version = (2, 2)
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_init_lib():259<info> PSM2 library version = (2, 2)
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_init_lib():262<info> PSM2 multi-ep feature enabled.
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_update_hfi_info():427<info> hfi1 units: total -1, active 0; hfi1 contexts: total 0, free 0
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_update_hfi_info():439<info> Tx/Rx contexts: 0 in total, 0 available.
libfabric:3251:1715881079:ofi_rxd:psm2:core:psmx2_getinfo():536<info> no PSM2 device is active.
opainfo
opap134s0:1 PortGID:0xfe80000000000000:0011750901841a24
PortState: Active
LinkSpeed Act: 25Gb En: 25Gb
LinkWidth Act: 4 En: 4
LinkWidthDnGrd ActTx: 4 Rx: 4 En: 3,4
LCRC Act: 14-bit En: 14-bit,16-bit,48-bit Mgmt: True
LID: 0x00000004-0x00000004 SM LID: 0x00000004 SL: 0
QSFP Copper, 1m FCI Electronics P/N 10142057-2010LF Rev C
Xmit Data: 0 MB Pkts: 761
Recv Data: 0 MB Pkts: 908
Link Quality: 5 (Excellent)
opap59s0:1 PortGID:0xfe80000000000000:0011750901846a7c
PortState: Active
LinkSpeed Act: 25Gb En: 25Gb
LinkWidth Act: 4 En: 4
LinkWidthDnGrd ActTx: 4 Rx: 4 En: 3,4
LCRC Act: 14-bit En: 14-bit,16-bit,48-bit Mgmt: True
LID: 0x00000005-0x00000005 SM LID: 0x00000004 SL: 0
QSFP Copper, 1m FCI Electronics P/N 10142057-2010LF Rev C
Xmit Data: 0 MB Pkts: 175
Recv Data: 0 MB Pkts: 173
Link Quality: 5 (Excellent)
If I set export HFI_SYSFS_PATH=/sys/class/infiniband/opap59s0 in my environment, libfabric gets further but still fails to create an endpoint.
libfabric:3721:1715881865::psm2:core:psmx2_fabric():90<info>
libfabric:3721:1715881865::core:core:fi_fabric_():1504<info> Opened fabric: psm2
libfabric:3721:1715881865::psm2:domain:psmx2_domain_open():356<info>
libfabric:3721:1715881865::psm2:core:fi_param_get_():373<info> variable lock_level=<not set>
libfabric:3721:1715881865::psm2:core:psmx2_init_tag_layout():171<info> use tag60: tag_mask: 0FFFFFFFFFFFFFFF, data_mask: FFFFFFFF
libfabric:3721:1715881865::core:core:ofi_shm_map():173<warn> shm_open failed
libfabric:3721:1715881865::psm2:av:psmx2_av_open():1121<warn> failed to map shared AV: FI_NAMED_AV_-1
libfabric:3721:1715881865::psm2:core:psmx2_trx_ctxt_alloc():298<info> uuid: 00FF00FF-0000-0000-0000-00FF00FF00FF
libfabric:3721:1715881865::psm2:core:psmx2_trx_ctxt_alloc():303<info> ep_open_opts: unit=-1 port=0
pmrs-gpu-240-02.3721PSM2 no hfi units are active (err=23)
libfabric:3721:1715881865::psm2:core:psmx2_trx_ctxt_alloc():316<warn> psm2_ep_open returns 23, errno=2
This is an issue with the RDMA user library. There is a long, drawn-out argument on the mailing list about this. I will link it at the bottom, but to save you the time:
Create/edit /etc/udev/rules.d/rdma-persistent-naming-rules:
ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
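To illustrate what the `KERNEL!="hfi1*"` match in the rule above is meant to do, here is a minimal Python sketch. It only approximates udev's glob matching with `fnmatch`, and the helper name `rule_would_rename` and the sample device names are made up for illustration; it is not how udev itself evaluates rules.

```python
from fnmatch import fnmatchcase


def rule_would_rename(kernel_name: str) -> bool:
    """Approximate the KERNEL!="hfi1*" test: the rule applies (and
    rdma_rename runs) only when the kernel name does NOT match hfi1*."""
    return not fnmatchcase(kernel_name, "hfi1*")


# Sample kernel device names, chosen only for illustration.
for name in ("hfi1_0", "hfi1_1", "mlx5_0"):
    action = "renamed" if rule_would_rename(name) else "keeps kernel name"
    print(f"{name}: {action}")
```

Under this approximation, hfi1 devices are skipped by the rename while other RDMA devices still get the fallback naming.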
Also, to make you aware, running psm2 natively is a much better way to run. What you have now is libfabric linking in libpsm2 and acting as a shim on top of it. Ideally you could run with the native Omni-Path provider in libfabric, which is called "OPX". Let me know if you want help doing either of these.
Is the naming fix required for OPX or can we run without it?
OK I just tested and OPX works on the same node without any modifications. This is good to know.
I'm gonna direct that question to @charlesshereda or one of his crew.
I spoke too soon. I can create an OPX endpoint on my machine, but I can't actually communicate from the looks of it.
libfabric:3780:1715891670::opx:fabric:opx_sysfs_port_open():275<warn> Offending file name: /sys/class/infiniband/hfi1_1/ports/1/state
libfabric:3780:1715891670::opx:fabric:opx_hfi_get_port_active():463<warn> Failed to get logical link state for unit 1:1: No such file or directory
libfabric:3780:1715891670::opx:ep_data:fi_opx_init_hfi_lookup():299<warn> No LID found for HFI unit 1 of 2 units: ret = -2, No such file or directory.
I'm a little behind on everything but I'll either take a look at this or have someone else look next week.
Hi @raffenet, Have you tried using the udev rule @ddalessa suggested in his update?
Create/edit /etc/udev/rules.d/rdma-persistent-naming-rules:
ACTION=="add", SUBSYSTEM=="infiniband", KERNEL!="hfi1*", PROGRAM="rdma_rename %k NAME_FALLBACK"
OPX does not support HFI_SYSFS_PATH and is hardcoded to use /sys/class/infiniband/hfi1_x. Is there a requirement such that the udev rule will not work long term?
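For anyone who wants to check by hand whether the path a provider expects exists, the following Python sketch reads the port state file under a given device directory. The helper name `read_port_state` is hypothetical, and this is only a diagnostic aid under the assumption that the state file lives at `ports/<port>/state` as in the log above; it is not OPX or libpsm2 code.

```python
from pathlib import Path
from typing import Optional


def read_port_state(dev_dir: str, port: int = 1) -> Optional[str]:
    """Return the contents of <dev_dir>/ports/<port>/state, or None if
    the file does not exist (for example because the device was renamed)."""
    state_file = Path(dev_dir) / "ports" / str(port) / "state"
    try:
        return state_file.read_text().strip()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    # Compare the hard-coded style name with a renamed one from this thread.
    for dev in ("/sys/class/infiniband/hfi1_1",
                "/sys/class/infiniband/opap59s0"):
        print(dev, "->", read_port_state(dev))
```

On a host where the device has been renamed, the first lookup would be expected to return None, which matches the "No such file or directory" warnings in the log.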
libpsm2 looks for sysfs entries under the path /sys/class/infiniband/hfi1_x. With rdma-core v24.0, the device is renamed according to its device type, PCI bus and device, a la "predictable interface names". This is described at https://patchwork.kernel.org/cover/10870443/ .
On my host, the sysfs path for hfi1_0 is /sys/class/infiniband/opap129s. Thus, libpsm2 fails to find the hfi1_0 sysfs entry in hfi_sysfs_port_open.
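To find out what the device is currently called on a given host, one option is to scan /sys/class/infiniband and check which driver each entry is bound to. The Python sketch below does that by following the `device/driver` symlink; the function name `find_hfi1_devices` is invented for this example, and the approach assumes the usual sysfs layout where `device/driver` is a symlink whose target is named after the driver. It is not what libpsm2 does internally.

```python
import os
from typing import List


def find_hfi1_devices(class_dir: str = "/sys/class/infiniband") -> List[str]:
    """Return sysfs paths of RDMA devices bound to the hfi1 driver,
    regardless of what the device directory is named."""
    found: List[str] = []
    if not os.path.isdir(class_dir):
        return found
    for name in sorted(os.listdir(class_dir)):
        driver_link = os.path.join(class_dir, name, "device", "driver")
        # The symlink target's last path component is the driver name.
        if (os.path.islink(driver_link)
                and os.path.basename(os.readlink(driver_link)) == "hfi1"):
            found.append(os.path.join(class_dir, name))
    return found


if __name__ == "__main__":
    for path in find_hfi1_devices():
        print(path)
```

Each printed path is a candidate value for HFI_SYSFS_PATH on that host.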
The behavior can be observed by executing fi_info on a Debian sid/bullseye host with libfabric-bin and libpsm2-2 installed. The psm2 providers will not be listed in the output. Debug output indicates that no active psm2 device is found.
I have found two orthogonal workarounds for this problem:
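Set the HFI_SYSFS_PATH environment variable when running the command, e.g. HFI_SYSFS_PATH=/sys/class/infiniband/opap129s fi_info. The "129" portion of the HFI_SYSFS_PATH value needs to be set according to the PCI bus of the HFI card.
Add a udev rule that keeps the kernel-assigned device name:
ACTION=="add", SUBSYSTEM=="infiniband", PROGRAM="rdma_rename %k NAME_KERNEL"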
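As a rough illustration of how the number in the device name relates to the PCI bus, the Python sketch below converts a PCI address into a name of the form seen elsewhere in this thread (for example opap59s0). The pattern "opa" + "p<bus>" + "s<slot>" with decimal numbers is an assumption inferred from those names, and `pci_based_name` is a hypothetical helper, not the rdma_rename implementation; always confirm the real name by listing /sys/class/infiniband.

```python
def pci_based_name(pci_addr: str, prefix: str = "opa") -> str:
    """Build a PCI-location style device name from an address such as
    "0000:81:00.0" (domain:bus:slot.function, all hexadecimal).

    Assumption: the name is <prefix>p<bus>s<slot> with decimal numbers,
    and the function number is ignored in this sketch.
    """
    _domain, bus, slot_fn = pci_addr.split(":")
    slot, _function = slot_fn.split(".")
    return f"{prefix}p{int(bus, 16)}s{int(slot, 16)}"


if __name__ == "__main__":
    # PCI bus 0x81 is 129 in decimal.
    print(pci_based_name("0000:81:00.0"))
```

The PCI address of the card can be read from lspci output; the addresses used here are examples only.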
While there is a workaround, libpsm2 should address the new, default RDMA device naming scheme. opa_sysfs.c:sysfs_init() looks like the place to start.