allenbh / ntrdma-old


NTRDMA over PCIe NTB - ibv_rc_pingpong fail to find devices #1

Open johnnypei opened 8 years ago

johnnypei commented 8 years ago

Hello Allen,

I really appreciate your work on the NTRDMA driver. I went through all the instructions at https://github.com/allenbh/ntrdma/wiki to set up the NTRDMA software stack on my hardware platform (a 2U-2N Intel-based server with an NTB connection between the two nodes). Here is some platform information I collected afterwards:

Processor: Intel(R) Xeon(R) CPU E5-2658 v4@ 2.30GHz
[root@localhost ~]# lspci -s 00:03.0
00:03.0 Bridge: Intel Corporation Device 6f0d (rev 01)
[root@localhost ~]# uname -a
Linux localhost.localdomain 4.3.0+ #1 SMP Tue Jul 19 04:34:29 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@localhost ~]# cat /sys/class/infiniband_verbs/uverbs0/device/vendor
0x8086
[root@localhost ~]# cat /sys/class/infiniband_verbs/uverbs0/device/device
0x6f0d
[root@localhost ~]# cat /sys/class/infiniband/ntrdma_0/device/vendor
0x8086
[root@localhost ~]# cat /sys/class/infiniband/ntrdma_0/device/device
0x6f0d
[root@localhost ~]# lsmod | grep ntb
ntc_ntb_msi 24576 0
ntc_phys 16384 1 ntc_ntb_msi
ntc 16384 2 ntc_ntb_msi,ntrdma
ntb_hw_intel 40960 0
ntb 16384 2 ntc_ntb_msi,ntb_hw_intel
[root@localhost ~]# lsmod | grep rdma
rpcrdma 81920 0
rdma_ucm 24576 0
rdma_cm 45056 4 rpcrdma,ib_iser,rdma_ucm,ib_isert
ib_cm 45056 5 rdma_cm,ib_srp,ib_ucm,ib_srpt,ib_ipoib
iw_cm 45056 1 rdma_cm
ib_sa 36864 5 rdma_cm,ib_cm,ib_srp,rdma_ucm,ib_ipoib
ib_uverbs 53248 2 ib_ucm,rdma_ucm
ntrdma 94208 0
ntc 16384 2 ntc_ntb_msi,ntrdma
ib_core 110592 16 rdma_cm,ib_cm,ib_sa,iw_cm,rpcrdma,ib_mad,ib_srp,ib_ucm,ntrdma,ntc_phys,ib_iser,ib_srpt,ib_umad,ib_uverbs,ib_ipoib,ib_isert
ib_addr 16384 3 rdma_cm,ib_core,rdma_ucm
sunrpc 327680 8 nfsd,auth_rpcgss,lockd,rpcrdma,nfs_acl

[root@localhost ~]# ibv_rc_pingpong
No IB devices found

As you can see, even though the IB verbs device was created successfully, ibv_rc_pingpong cannot find it. Do you have any idea why?

allenbh commented 8 years ago

The correct modules seem to be loaded, except that the commands above would not have shown ioatdma. Can you show me which module parameters have been used?

head /sys/module/ioatdma/parameters/*
head /sys/module/ntb_hw_intel/parameters/*

Show that all the expected devices are actually being registered in kernel space.

ls /sys/bus/ntb/devices
ls /sys/bus/ntc/devices
head /sys/class/dma/*/in_use
ls /sys/class/infiniband
head /sys/class/infiniband_verbs/*/ibdev

Finally, let's double check that libibverbs and libntrdma are installed correctly.

ls /etc/libibverbs.d
ls /usr/lib64/libibverbs*
ls /usr/lib64/libntrdma*
which ibv_rc_pingpong
allenbh commented 8 years ago

Also, check the hardware configuration of the ntb device. I am interested to see the memory window sizes and addresses.

sudo head -n100 /sys/kernel/debug/ntb_hw_intel/*/info
allenbh commented 8 years ago

The fact that you see entries in /sys/class/infiniband_verbs while the ibv examples find no devices leads me to suspect the issue is with the user space libraries.

johnnypei commented 8 years ago

Hi Allen, Please check the command output below:

[root@localhost ~]# head /sys/module/ioatdma/parameters/*
==> /sys/module/ioatdma/parameters/ioat_dca_enabled <==
1
==> /sys/module/ioatdma/parameters/ioat_interrupt_style <==
msix
==> /sys/module/ioatdma/parameters/ioat_pending_level <==
4
==> /sys/module/ioatdma/parameters/ioat_ring_alloc_order <==
8
==> /sys/module/ioatdma/parameters/ioat_ring_max_alloc_order <==
16

[root@localhost ~]# head /sys/module/ntb_hw_intel/parameters/*
==> /sys/module/ntb_hw_intel/parameters/b2b_mw_idx <==
0
==> /sys/module/ntb_hw_intel/parameters/b2b_mw_share <==
1
==> /sys/module/ntb_hw_intel/parameters/no_msix <==
1
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_dsd_bar2_addr64 <==
11529215046068469760
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_dsd_bar4_addr32 <==
2684354560
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_dsd_bar4_addr64 <==
0
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_dsd_bar5_addr32 <==
3221225472
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_usd_bar2_addr64 <==
2305843009213693952
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_usd_bar4_addr32 <==
536870912
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_usd_bar4_addr64 <==
0
==> /sys/module/ntb_hw_intel/parameters/xeon_b2b_usd_bar5_addr32 <==
1073741824

[root@localhost ~]# ls /sys/bus/ntb/devices
0000:00:03.0
[root@localhost ~]# ls /sys/bus/ntc/devices
0000:00:03.0

[root@localhost ~]# head /sys/class/dma/*/in_use
==> /sys/class/dma/dma0chan0/in_use <==
1
==> /sys/class/dma/dma10chan0/in_use <==
0
==> /sys/class/dma/dma11chan0/in_use <==
0
==> /sys/class/dma/dma12chan0/in_use <==
0
==> /sys/class/dma/dma13chan0/in_use <==
0
==> /sys/class/dma/dma14chan0/in_use <==
0
==> /sys/class/dma/dma15chan0/in_use <==
0
==> /sys/class/dma/dma1chan0/in_use <==
0
==> /sys/class/dma/dma2chan0/in_use <==
0
==> /sys/class/dma/dma3chan0/in_use <==
0
==> /sys/class/dma/dma4chan0/in_use <==
0
==> /sys/class/dma/dma5chan0/in_use <==
0
==> /sys/class/dma/dma6chan0/in_use <==
0
==> /sys/class/dma/dma7chan0/in_use <==
0
==> /sys/class/dma/dma8chan0/in_use <==
0
==> /sys/class/dma/dma9chan0/in_use <==
0
[root@localhost ~]# ls /sys/class/infiniband
ntrdma_0
[root@localhost ~]# head /sys/class/infiniband_verbs/*/ibdev
ntrdma_0

[root@localhost ~]# ls /etc/libibverbs.d
ipath.driver
[root@localhost ~]# ls /usr/lib64/libibverbs*
/usr/lib64/libibverbs.so /usr/lib64/libibverbs.so.1 /usr/lib64/libibverbs.so.1.0.0
[root@localhost ~]# ls /usr/lib64/libntrdma*
ls: cannot access /usr/lib64/libntrdma*: No such file or directory
[root@localhost ~]# which ibv_rc_pingpong
/usr/bin/ibv_rc_pingpong
[root@localhost ~]#

[root@localhost ~]# sudo head -n100 /sys/kernel/debug/ntb_hw_intel/*/info
NTB Device Information:
Connection Topology - NTB_TOPO_B2B_DSD
B2B MW Idx - 0
B2B Offset - 0x200000
BAR4 Split - yes
NTB CTL - 0x0000
LNK STA - 0xf083
Link Status - Up
Link Speed - PCI-E Gen 3
Link Width - x8
Memory Window Count - 3
Scratchpad Count - 16
Doorbell Count - 15
Doorbell Vector Count - 1
Doorbell Vector Shift - 16
Doorbell Valid Mask - 0x7fff
Doorbell Link Mask - 0x8000
Doorbell Mask Cached - 0x7fff
Doorbell Mask - 0x7fff
Doorbell Bell - 0x0

NTB Incoming XLAT:
XLAT23 - 0x0000000000000000
XLAT4 - 0x0000
XLAT5 - 0x0000
LMT23 - 0x2000000000000000
LMT4 - 0x20000000
LMT5 - 0x0000

NTB Outgoing B2B XLAT:
B2B XLAT23 - 0xa000000000000000
B2B XLAT4 - 0xa0000000
B2B XLAT5 - 0xc0000000
B2B LMT23 - 0x0000000000000000
B2B LMT4 - 0x0000
B2B LMT5 - 0x0000

NTB Secondary BAR:
SBAR01 - 0x200000000000000c
SBAR23 - 0x200000000000000c
SBAR4 - 0x20000000
SBAR5 - 0x40000000

XEON NTB Statistics:
Upstream Memory Miss - 554

XEON NTB Hardware Errors:
DEVSTS - 0x0000
LNKSTS - 0xf083
UNCERRSTS - 0x100000
CORERRSTS - 0x11c1

Does that mean libntrdma wasn't installed correctly under /usr/lib64?
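A side note for anyone comparing the dumps above: the ntb_hw_intel address parameters are printed in decimal, while the debugfs info file prints hexadecimal. Converting makes them easier to line up; for example, the xeon_b2b_dsd_bar2_addr64 value is numerically the same as the 0xa000000000000000 shown for B2B XLAT23. A minimal sketch, assuming a bash-like shell whose printf accepts unsigned 64-bit values; the helper name to_hex is made up for illustration:

```shell
# Print each decimal argument as a 0x-prefixed hexadecimal value.
# Assumes the shell's printf handles unsigned 64-bit integers (bash does).
to_hex() {
    local n
    for n in "$@"; do
        printf '0x%x\n' "$n"
    done
}

# Values copied from the module parameter dump in this thread.
to_hex 11529215046068469760 2684354560 3221225472
```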

allenbh commented 8 years ago

Yes, this is indicative of the libntrdma user space driver not being installed (or not being in the correct location). Please refer to quick start: user space library and dependencies.

[root@localhost ~]# ls /etc/libibverbs.d
ipath.driver

[root@localhost ~]# ls /usr/lib64/libibverbs*
/usr/lib64/libibverbs.so /usr/lib64/libibverbs.so.1 /usr/lib64/libibverbs.so.1.0.0

[root@localhost ~]# ls /usr/lib64/libntrdma*
ls: cannot access /usr/lib64/libntrdma*: No such file or directory

[root@localhost ~]# which ibv_rc_pingpong
/usr/bin/ibv_rc_pingpong
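The check behind this conclusion can be scripted. libibverbs reads the *.driver files in its configuration directory, and each file names a user space driver on a line of the form "driver <name>". The sketch below only tests whether such a line exists. The helper name has_verbs_driver is made up, and the assumption that libntrdma registers itself under the name "ntrdma" is mine, not something confirmed in this thread.

```shell
# Succeed if some *.driver file in directory $1 registers driver $2.
# Only looks for a "driver <name>" line; it does not verify that the
# matching shared library can actually be loaded.
has_verbs_driver() {
    local dir=$1 name=$2
    grep -qs "^driver[[:space:]]\{1,\}$name[[:space:]]*\$" "$dir"/*.driver
}
```

Example use, with the driver name being the assumption noted above: has_verbs_driver /etc/libibverbs.d ntrdma || echo "ntrdma is not registered with libibverbs"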

I think you may also have trouble with memory windows in the current configuration. The ntc_ntb_msi driver will use the last available ntb memory window. I think the default configuration is often only a few MiB large, and is anyway limited to 32 bits in split BAR configuration. The output you provided reminds me that BARxxSZ registers would be useful to add to the ntb info in debugfs.

BAR4 Split - yes

Reconfigure the ntb in your BIOS settings: disable SPLIT BAR to make BAR45 a single 64-bit memory aperture instead of two 32-bit apertures.

Also, look for a setting to configure the BAR size (named PBAR45SZ, or something like that), and increase the number of address bits to accommodate the total physical memory installed on your system. Something like 39 address bits, or 512 GiB, should be plenty.

Don't forget to change the BIOS configuration on both sides.
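To put numbers on the sizing advice: a BAR configured with n address bits spans 2^n bytes, which is how 39 bits comes out to 512 GiB. A minimal sketch of that arithmetic follows. The helper names bar_bits_to_gib and min_bar_bits are made up for illustration and are not part of any NTB tooling, and the 30-bit (1 GiB) starting point of the search is an arbitrary choice.

```shell
# Size in GiB of a BAR with the given number of address bits.
# Prints 0 for anything below 30 bits (less than 1 GiB).
bar_bits_to_gib() {
    local bits=$1
    echo $(( (1 << bits) >> 30 ))
}

# Smallest number of address bits (starting from 30) whose aperture
# is at least as large as the given amount of RAM in GiB.
min_bar_bits() {
    local ram_gib=$1 bits=30
    while [ $(( 1 << (bits - 30) )) -lt "$ram_gib" ]; do
        bits=$(( bits + 1 ))
    done
    echo "$bits"
}

bar_bits_to_gib 39
min_bar_bits 32
```

The exact value the BIOS setting expects (a bit count versus an encoded size) depends on the platform, so treat the result as the address-bit count only.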


johnnypei commented 8 years ago

Hi Allen,

I installed libntrdma and reconfigured the NTB devices with split BAR disabled in the BIOS. I also increased PBAR45SZ and PBAR23SZ to 30 bits (physical RAM size is 32GB); please see my system's configuration in the attached file. I checked the debugging info and can see that the NTB link is up:

NTB Device Information:
Connection Topology - NTB_TOPO_B2B_USD
B2B MW Idx - 0
B2B Offset - 0x20000000
BAR4 Split - no
NTB CTL - 0x0000
LNK STA - 0xb083
Link Status - Up
Link Speed - PCI-E Gen 3
Link Width - x8
Memory Window Count - 2
Scratchpad Count - 16
Doorbell Count - 15
Doorbell Vector Count - 1
Doorbell Vector Shift - 16
Doorbell Valid Mask - 0x7fff
Doorbell Link Mask - 0x8000
Doorbell Mask Cached - 0x7fff
Doorbell Mask - 0x7fff
Doorbell Bell - 0x0

However, when I run IB pingpong, it hangs at this line forever without doing anything (from both sides):

[root@localhost src]# ibv_rc_pingpong
ntrdma_driver_init("/sys/class/infiniband_verbs/uverbs0", 1)

Besides, the Ethernet interfaces (eth0) cannot ping each other. Could anything be wrong with the NTB configuration?

sys_info_log.txt

allenbh commented 8 years ago

The configuration looks good now, and now the libntrdma library is being loaded, and ibv_rc_pingpong is finding the devices. The issue about not finding devices is resolved, but let's keep going until we get the rest working entirely.

However, when I run IB pingpong, it hangs at this line forever without doing anything (from both sides):

[root@localhost src]# ibv_rc_pingpong
ntrdma_driver_init("/sys/class/infiniband_verbs/uverbs0", 1)

Did you run ibv_rc_pingpong as the server on both sides? One side needs to be run as the client, as in ibv_rc_pingpong <ip-addr-of-peer>.

Besides, Ethernet interfaces (eth0) cannot be pinged from each other.

I assume you configured ip addresses. Let's now check the initialization state of the ntrdma driver. We are looking for foo_ready = 1 and foo_enable = 1 and eth_link = 1.

head -n100 /sys/kernel/debug/ntrdma/*/info
allenbh commented 8 years ago

Also, if it is enabled, try disabling VT-d in processor virtualization settings, in BIOS. If that alone makes a difference, I would like to know.

johnnypei commented 8 years ago

OK. Here is the ntrdma debugging info: ntrdma_db.txt. I didn't see foo_ready and foo_enable, but I can see eth_link = 0, which I guess may indicate an issue. VT-d was already disabled in the BIOS, and the IP addresses of the two nodes were set to 192.168.1.1 (server) and 192.168.1.2 (client).

ibv_rc_pingpong-1


allenbh commented 8 years ago

The ntrdma driver has not negotiated a connection with its peer, so none of its services are ready.

vbell_enable 0
cmd_ready 0
res_enable 0
eth_enable 1
eth_ready 0
eth_link 0

If the ip address is associated with the ntrdma net device, and the ntrdma net device is not ready, then ibv_rc_pingpong client will not be able to connect to ibv_rc_pingpong server using that address. Eventually, the client should say something like "connection timed out."
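A quick way to see which services are still down is to filter the info file for any ready, enable, or link flag that is not 1. This is only a sketch: it assumes the file consists of "name value" pairs like the excerpt above, and the helper name ntrdma_flags_down is made up for illustration.

```shell
# Read "name value" pairs on stdin and print the name of every
# *_ready / *_enable flag, plus eth_link, whose value is not 1.
# Prints nothing when all such flags are up.
ntrdma_flags_down() {
    awk '$1 ~ /_(ready|enable)$/ || $1 == "eth_link" {
             if ($2 != 1) print $1
         }'
}
```

It can be fed from the debugfs file, for example: head -n100 /sys/kernel/debug/ntrdma/*/info | ntrdma_flags_down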

I think the next thing to try is to unload the ntc_ntb_msi driver, and instead load ntb_tool. With ntb_tool, we can check the basic functionality of the ntb. Can we write the scratchpad on the peer, and then can the peer read what was written?

rmmod ntc_ntb_msi
modprobe ntb_tool

DBG_DIR=/sys/kernel/debug/ntb_tool/0000:00:03.0

server-a# cat $DBG_DIR/spad

server-b# echo '0 0x01010101 1 0x7f7f7f7f' > $DBG_DIR/peer_spad

server-a# cat $DBG_DIR/spad
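If you would rather script the comparison than eyeball it, something like the following could work. It is a sketch that assumes the spad file prints one "index value" pair per line; the helper name spad_matches is made up. Values are compared numerically, so a written 0x01010101 matches a read-back 0x1010101.

```shell
# Succeed if scratchpad $1 holds value $2 in the spad dump on stdin.
# Both values go through shell arithmetic, so leading zeros in the
# hexadecimal notation do not matter.
spad_matches() {
    local idx=$1 want=$(( $2 )) i v
    while read -r i v; do
        if [ "$i" = "$idx" ] && [ $(( v )) -eq "$want" ]; then
            return 0
        fi
    done
    return 1
}
```

On server-a, after the write from server-b: spad_matches 0 0x01010101 < $DBG_DIR/spad && echo "spad 0 ok"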
johnnypei commented 8 years ago

Hi Allen, It looks like the scratchpad reads and writes were successful. Please see the output on server A before and after the write on server B:

[root@localhost src]# cat $DBG_DIR/spad
0 0x0
1 0x0
2 0x0
3 0x0
4 0x0
5 0x0
6 0x0
7 0x0
8 0x0
9 0x0
10 0x0
11 0x0
12 0x0
13 0x0
14 0x0
15 0x0
[root@localhost src]# cat $DBG_DIR/spad
0 0x1010101
1 0x7f7f7f7f
2 0x0
3 0x0
4 0x0
5 0x0
6 0x0
7 0x0
8 0x0
9 0x0
10 0x0
11 0x0
12 0x0
13 0x0
14 0x0
15 0x0

The only strange thing I noticed is that after executing the 'echo' command on server B, its console hung:

echo '0 0x01010101 1 0x7f7f7f7f' > $DBG_DIR/peer_spad
^C

^C

allenbh commented 8 years ago

Sorry about that hang in ntb_tool. It will be fixed upstream in Linux 4.8. The patch is here if you are interested: jonmason/ntb@bfc43eb3790a6f4f38adb6769f220df942024b63. But the output already shows that the scratchpads are working. To get past the ntb_tool hang, for now just reboot.

Try loading the ntrdma driver again, but this time with debugging enabled.

sysctl -w kernel.printk=8
modprobe ntc_ntb_msi dyndbg=+pm
modprobe ntrdma dyndbg=+pm
# a few seconds after loading modules on both sides
dmesg
johnnypei commented 8 years ago

The patch works just fine, no more hang. Here are the logs I got after loading the modules: ntrdma_load_a.log.txt ntrdma_load_b.log.txt And the configurations on both sides: sys_info_a.log.txt sys_info_b.log.txt

allenbh commented 8 years ago

The signature of the failure on your system is this:

[  523.859023] ntc_ntb_msi: ntrdma 0000:00:03.0: peer msg 4
...
[  523.859101] ntc_ntb_msi: ntrdma 0000:00:03.0: ping send msg 5
...
[  524.117385] ntc_ntb_msi: ntrdma 0000:00:03.0: peer msg 0
[  524.117423] ntc_ntb_msi: ntrdma 0000:00:03.0: link error

or this:

[  524.892481] ntc_ntb_msi: ntrdma 0000:00:03.0: peer msg 5
...
[  524.892527] ntc_ntb_msi: ntrdma 0000:00:03.0: ping send msg 6
...
[  525.150853] ntc_ntb_msi: ntrdma 0000:00:03.0: peer msg 0
[  525.150890] ntc_ntb_msi: ntrdma 0000:00:03.0: link error

In the transition from state 4 to 5, and for all states higher than 5, the ntc_ntb_msi driver expects to see the next peer state in memory instead of the ntb scratchpad. Instead, it is reading the state zero from memory. I think there is a problem writing to peer memory across the ntb, and ntc_ntb_msi is detecting the failure and attempting to reset the link.

I think this failure indicates a hardware configuration issue.
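The same signature can be searched for mechanically in a saved log. The sketch below flags every point where the reported peer state falls back to 0 after having reached 4 or higher, which matches both excerpts above. The threshold of 4 is my reading of the explanation rather than something taken from the driver source, and the helper name find_peer_resets is made up.

```shell
# Read kernel log text on stdin and report each drop of the peer
# state to 0 from a state of 4 or higher ("peer msg N" lines only).
find_peer_resets() {
    awk '/peer msg [0-9]+/ {
             state = $NF + 0
             if (state == 0 && last >= 4)
                 print "peer state " last " -> 0"
             last = state
         }'
}
```

For example: dmesg | find_peer_resets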

There are changes to ntb_tool and a new test script pending to be merged in Linux 4.8, in the same branch as the earlier fix to ntb_tool. Can you either build and install the kernel out of the jonmason/ntb ntb-next branch or, if you are comfortable doing so, cherry-pick those patches into your local ntrdma version of the kernel? Make sure all the NTB modules are selected, build and install, and then run the script in tools/testing/selftests/ntb/ntb_test.sh. One of the tests in the script verifies the operation of NTB memory windows.


Here are descriptive names for the states:

#define NTC_NTB_LINK_QUIESCE            0
#define NTC_NTB_LINK_RESET_REQ          1
#define NTC_NTB_LINK_RESET_ACK          2
#define NTC_NTB_LINK_INIT_SPAD          3
#define NTC_NTB_LINK_INIT_INFO          4
#define NTC_NTB_LINK_PING_READY         5
#define NTC_NTB_LINK_PING_COMMIT        6
#define NTC_NTB_LINK_HELLO              7
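When reading "peer msg N" and "ping send msg N" lines, it can help to translate the numbers back to these names. A small lookup sketch built from the defines above, with the NTC_NTB_LINK_ prefix dropped; the helper name ntc_link_state_name is made up:

```shell
# Map a numeric ntc_ntb_msi link state to its descriptive name.
ntc_link_state_name() {
    case "$1" in
        0) echo QUIESCE ;;
        1) echo RESET_REQ ;;
        2) echo RESET_ACK ;;
        3) echo INIT_SPAD ;;
        4) echo INIT_INFO ;;
        5) echo PING_READY ;;
        6) echo PING_COMMIT ;;
        7) echo HELLO ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

ntc_link_state_name 5
```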

This is what the normal state transition should look like. Notice that in the transition from 6 to 7, ntc_ntb_msi has initialized to where it can provide full service to the upper layer, and ntc_ntb_msi starts calling to the upper layer driver ntrdma for its initialization phases.

[847462.683890] ntc_ntb_msi:ntc_ntb_ping_send: ntrdma 0000:00:03.0: ping send msg 5
[847462.692152] ntc_ntb_msi:ntc_ntb_ping_commit: ntrdma 0000:00:03.0: link ping commit
[847462.700705] ntc_ntb_msi:ntc_ntb_ping_send: ntrdma 0000:00:03.0: ping send msg 6
[847463.092508] ntc_ntb_msi:ntc_ntb_ping_poll_cb: ntrdma 0000:80:03.0: peer msg 7
[847463.100601] ntc_ntb_msi:ntc_ntb_ping_start: ntrdma 0000:80:03.0: ping start
[847463.108485] ntc_ntb_msi:ntc_ntb_link_work: ntrdma 0000:80:03.0: link work state 6 event 7
[847463.117725] ntc_ntb_msi:ntc_ntb_hello: ntrdma 0000:80:03.0: link hello phase 0
[847463.125898] ntc:ntc_ctx_hello: ntrdma 0000:80:03.0: hello phase 0
[847463.132804] ntrdma:ntrdma_dev_hello: ntrdma 0000:80:03.0: hello phase 0
[847463.140295] ntc_ntb_msi:ntc_ntb_hello: ntrdma 0000:80:03.0: successful hello callback
[847463.149144] ntc_ntb_msi:ntc_ntb_hello: ntrdma 0000:80:03.0: prepare for next phase
[847463.157701] ntc_ntb_msi:ntc_ntb_prep_hello: ntrdma 0000:80:03.0: link prep hello
[847463.166062] ntc_ntb_msi:ntc_ntb_ping_send: ntrdma 0000:80:03.0: ping send msg 7
allenbh commented 8 years ago

Hi @johnnypei. I want to be sure you can still find this issue, since the URL has changed. I renamed allenbh/ntrdma to allenbh/ntrdma-old (this issue is still open under ntrdma-old). Now, allenbh/ntrdma is a fork of ntrdma/ntrdma. I also moved the wiki over to ntrdma/ntrdma/wiki.

Over the weekend I merged with the latest release version of Linux (v4.7). I also created a branch for you, called with-ntb-next-for-v4.8, with ntb patches that are pending for the next release (v4.8), so you don't have to do all that cherry-picking I suggested in the previous comment.

Thanks for motivating me to get this work done. The ntrdma driver was four releases behind upstream Linux, and it was well past time for me to merge downstream.

Let me know the results of the ntb selftest on your hardware.

johnnypei commented 8 years ago

Hi Allen, Thanks for your effort on it. I'm gonna try the source code soon and let you know the result.

allenbh commented 8 years ago

@johnnypei, In addition to the ntb self test results, may I ask what hardware you have, such as vendor and any cable or PCIe riser or signal redriver card you may have installed?