NTB Virtual Ethernet - No Link After Reboot

msmith626 commented 5 years ago

Hi,

I'm testing the switchtec-kernel driver (release_4.13_to_4.14 branch, latest commit in that branch, along with commits 68cee85 and 815977d from the master branch) on Linux 4.14.91. I have a two-node CiB and use the NTB virtual Ethernet device for inter-node communication.

I've noticed that if both nodes (two nodes in one box) are under high I/O load and one of them is rebooted, the NTB virtual Ethernet device never obtains link once that node boots back into the operating system.

On "node B" (the standing node), which was under high I/O load when the opposing node was rebooted gracefully, these kernel messages appeared:

    --snip--
    [ 739.970461] cls 0000:3b:00.1: qp 0: Link Cleanup
    [ 905.993444] switchtec switchtec0: reinitialize shared memory window
    [ 906.098024] switchtec switchtec0: ntb link forced down
    [ 906.098048] cls 0000:3b:00.1: qp 0: Link Cleanup
    [ 1124.884917] clocksource: timekeeping watchdog on CPU11: Marking clocksource 'tsc' as unstable because the skew is too large:
    [ 1124.884926] clocksource: 'hpet' wd_now: 49bcb158 wd_last: 49029556 mask: ffffffff
    [ 1124.884928] clocksource: 'tsc' cs_now: ab65762babb0 cs_last: ab653ac5abae mask: ffffffffffffffff
    [ 1124.884937] tsc: Marking TSC unstable due to clocksource watchdog
    [ 1125.108850] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
    [ 1125.108852] sched_clock: Marking unstable (1125108870996, -40386)<-(1125510673848, -401843263)
    [ 1125.111836] clocksource: Switched to clocksource hpet
    --snip--

On "node A" (the node that was rebooted gracefully), I saw these messages as it booted and loaded the Switchtec driver and the NTB virtual Ethernet device module:

    --snip--
    [ 56.784568] cls: loading out-of-tree module taints kernel.
    [ 56.788574] Athena Host Driver Version 0.1.4
    [ 56.791669] switchtec switchtec0: Management device registered.
    [ 56.794631] switchtec switchtec1: Management device registered.
    [ 56.794728] cls: loaded.
    [ 56.900119] switchtec switchtec0: Using crosslink configuration
    [ 57.316603] switchtec switchtec0: ntb link up
    [ 57.316639] switchtec switchtec0: NTB device registered
    [ 57.322839] cls 0000:3b:00.1: NTB Transport QP 0 created
    [ 57.323300] cls 0000:3b:00.1: eth0 created
    --snip--

When that node continued the boot process with init, it attempted to bring up the NTB virtual Ethernet device interface (eth0):

    ifconfig eth0 up

And we see this in the kernel logs:

    --snip--
    [ 57.481302] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
    --snip--

The same can be seen on the standing node: no more traffic passes on this Ethernet device. I tried an "ifconfig eth0 down" followed by an "ifconfig eth0 up" on each node, but that never re-established the link.

I also attempted unloading the "ntb_netdev" module on both, then reloading it.

Finally, I unloaded the "ntb_hw_switchtec" module and reloaded it on both nodes; after that the Ethernet interface was able to come up and function properly. Note: the I/O load had been stopped when I attempted this, so neither system was under I/O load at that point.

Any help would be greatly appreciated.

Thanks,

Marc

ghost commented 5 years ago

Hi @msmith626

Thanks for the testing and the update. Node B's NTB link does not come up. Can you add some logging to the driver?

Thanks, Joey

msmith626 commented 5 years ago

Thank you Joey; I haven't enabled any monitoring lines yet in the driver, but I started with enabling dynamic debug and watched the behavior for a normal link down/up on reboot, and when the link fails to come up.

I didn't notice any difference on the node that boots up (after just rebooting) on initialization, but there is a clear difference on the "standing" node (the one that didn't reboot)...

Normal behavior, link comes back up as expected:

    [ 1643.941478] switchtec switchtec0: message: 0 00000004
    [ 1643.941864] switchtec switchtec0: reinitialize shared memory window
    [ 1643.944478] switchtec switchtec0: message: 0 00000001
    [ 1644.046095] switchtec switchtec0: ntb link forced down
    [ 1644.046499] switchtec switchtec0: ntb link up
    [ 1644.046820] cls 0000:3b:00.1: qp 0: Link Cleanup
    [ 1644.459220] switchtec switchtec0: doorbell
    [ 1644.465082] cls 0000:3b:00.1: qp 0: Link Up

Abnormal behavior, the link does NOT come back up:

    [ 990.799398] switchtec switchtec0: message: 0 00000004
    [ 990.874732] switchtec switchtec0: reinitialize shared memory window
    [ 991.174544] switchtec switchtec0: ntb link forced down
    [ 991.174836] cls 0000:3b:00.1: qp 0: Link Cleanup

In the abnormal case, where the link doesn't come up, the standing node never logs the "switchtec0: message: 0 00000001" line (MSG_LINK_UP).
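To make that comparison repeatable across many reboot cycles, the check can be scripted against captured dmesg text. The following is only a minimal user-space sketch, not part of the driver: the function name `link_up_missing_after_reinit` is hypothetical, and the two marker strings are simply copied from the log excerpts in this thread, so they would need adjusting if the driver's debug output differs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Marker strings copied from the dmesg excerpts in this thread. */
#define REINIT_MARKER  "reinitialize shared memory window"
#define LINK_UP_MARKER "message: 0 00000001" /* MSG_LINK_UP, per the thread */

/* Return a pointer to the last occurrence of needle in hay, or NULL. */
static const char *find_last(const char *hay, const char *needle)
{
    const char *last = NULL;
    const char *p = hay;

    while ((p = strstr(p, needle)) != NULL) {
        last = p;
        p += strlen(needle);
    }
    return last;
}

/*
 * Hypothetical helper: returns true when the log shows a shared memory
 * window reinitialization that is NOT followed by a MSG_LINK_UP line,
 * which is the pattern seen on the standing node in the failing case.
 * Returns false when no reinitialization appears in the log at all.
 */
bool link_up_missing_after_reinit(const char *log)
{
    const char *reinit = find_last(log, REINIT_MARKER);

    if (reinit == NULL)
        return false;

    return strstr(reinit, LINK_UP_MARKER) == NULL;
}
```

Passing the standing node's captured log as a single string would return true for the failing excerpt above and false for the normal one. It only shows that the message was not logged on the standing node; it cannot tell whether the message was never sent or was sent and lost.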

I'm not sure if the message isn't being transmitted at all from the node that comes up, or if the message isn't being received by the standing node. I suppose I could add some monitoring lines to the driver code to find out, but the code that sends that message seems pretty straightforward:

```
static int switchtec_ntb_link_enable(struct ntb_dev *ntb,
                                     enum ntb_speed max_speed,
                                     enum ntb_width max_width)
{
    struct switchtec_ntb *sndev = ntb_sndev(ntb);

    dev_dbg(&sndev->stdev->dev, "enabling link\n");

    sndev->self_shared->link_sta = 1;
    switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);

    switchtec_ntb_link_status_update(sndev);

    return 0;
}
```

Any tips on where to place monitoring/debug lines in the driver code? Or suggestions on other areas to look at?

Thanks for your time.

--Marc

wesleywesley commented 5 years ago

@msmith626

A question: what is "two nodes in one box"?

From your description, it seems the rebooted node did send MSG_LINK_UP, but the standing node did not receive the message.

Could you share the configuration and setup topology so that we can reproduce the case?

Regards, Wesley

msmith626 commented 5 years ago

@wesleywesley

"Two nodes in one box" is a CiB (cluster-in-a-box): https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/platforms/ultrastar-serv24-ha-storage-server/data-sheet-ultrastar-serv24-ha-storage-server.pdf

Both nodes are connected internally via PCIe to the mid-plane (PCIe switch). The system uses dual-port NVMe drives as shared storage devices between both servers.

We are using Linux 4.14.91 with the switchtec-kernel driver (release_4.13_to_4.14 branch, latest commit in that branch, along with commits 68cee85 and 815977d from the master branch). We use the "ntb_netdev" driver to provide a virtual Ethernet interface on both nodes for internal IP communication (e.g., Linux cluster stack messages).

On boot of each node, the Switchtec and NTB virtual Ethernet drivers are loaded, and a static IPv4 address is set on each NTB interface (eth0).

I'm not sure if that is detailed enough; please let me know what else is needed if not.

--Marc

Microsemi / switchtec-kernel

NTB Virtual Ethernet - No Link After Reboot #61