Mellanox / rshim-user-space

Linux based user-space RSHIM driver for the Mellanox BlueField SoC

Revert semantics of --reverse-nc: now the default nc mode is local as server and remote as client #133

Closed pgeng-nv closed 7 months ago

pgeng-nv commented 7 months ago

This is to address issue https://redmine.mellanox.com/issues/3859084 (bfb-install script fails with timeout on remote mode installation).

Background info for the issue:

This is a known issue when running “bfb-install” in netcat mode with default options against the current BMC software.

The default netcat modes (“nc” and “ncpipe”) for “bfb-install” require:

  1. The remote host (which could be the BMC or an x86 host) has a netcat version that supports TCP server mode. The current BMC software has a BusyBox-based “nc” which doesn’t meet this requirement.
  2. The remote host needs to open a netcat TCP server port (default 9527). The current BMC software doesn’t allow this either, for cyber security reasons.

To address these issues, we later added the “--reverse-nc” option to set up the netcat server on the local host instead. For updating the BMC with “bfb-install”, this option must be used.

Then the question is “why don’t we just remove the default netcat mode (let’s call it “forward nc”)”?

The answer is that with this “bfb-install” update, we introduced not only a speed optimization for BMC BFB update but also support for general “Remote RSHIM” update, which means the remote RSHIM “server” could be not only the BMC but also the PCIe x86 host. For the latter case, the forward nc mode works well, and there could be cases where the reverse mode doesn’t work due to host or network limitations.
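To make the difference between the two modes concrete, here is a minimal Python sketch of which side listens in each case. It is not taken from “bfb-install”: the function name `transfer` and the `local_listens` parameter are hypothetical, both roles run on loopback in one process, and an ephemeral port stands in for the default 9527. In both modes the payload flows from the local role to the remote role; only the side that opens the TCP server port changes.

```python
import socket
import threading


def transfer(payload: bytes, local_listens: bool) -> bytes:
    """Push payload from the 'local' role to the 'remote' role over loopback.

    local_listens=False: remote role is the TCP server (forward nc layout).
    local_listens=True:  local role is the TCP server (reverse nc layout).
    Hypothetical helper for illustration only.
    """
    received = bytearray()

    def local_role(conn: socket.socket) -> None:
        # The local side always sends the image, then signals end of data.
        conn.sendall(payload)
        conn.shutdown(socket.SHUT_WR)

    def remote_role(conn: socket.socket) -> None:
        # The remote side always receives until the sender closes its half.
        while chunk := conn.recv(65536):
            received.extend(chunk)

    if local_listens:
        listener_role, connector_role = local_role, remote_role
    else:
        listener_role, connector_role = remote_role, local_role

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))  # ephemeral port instead of 9527
        srv.listen(1)
        port = srv.getsockname()[1]

        def connector() -> None:
            with socket.create_connection(("127.0.0.1", port)) as conn:
                connector_role(conn)

        worker = threading.Thread(target=connector)
        worker.start()
        conn, _ = srv.accept()
        with conn:
            listener_role(conn)
        worker.join()

    return bytes(received)
```

Calling `transfer(data, local_listens=False)` and `transfer(data, local_listens=True)` delivers the same bytes; the reverse layout simply avoids requiring a listening port, or a server-capable netcat, on the remote side.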

Simply put, this is not a bug but a known limitation on the BMC side. It can be worked around with the “--reverse-nc” option.

Suggested actions:

  1. Document the requirement of the “--reverse-nc” option for the BMC more explicitly, and
  2. Provide a more intuitive error message for this failure, and
  3. If needed, make “--reverse-nc” the default mode if the major remote RSHIM use case is PC-to-BMC update, or
  4. If needed, remove the “forward nc” mode altogether if we don’t want to support remote RSHIM for the PCIe x86 host.

This commit implements suggestion 3 above. It toggles the semantics of "reverse": the original "reverse nc" mode becomes the default mode, and passing "--reverse-nc" now selects the previous "normal" (forward) mode.
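The toggled semantics can be summarized as a small truth table. The sketch below is illustrative only: `nc_server_side` and its parameters are hypothetical names, not identifiers from the “bfb-install” script.

```python
def nc_server_side(reverse_nc: bool, toggled: bool = True) -> str:
    """Return which side runs the netcat TCP server.

    reverse_nc: whether "--reverse-nc" was passed.
    toggled:    True for the semantics after this change, False for before.
    Hypothetical helper for illustration only.
    """
    if toggled:
        # After this change: default is local server, the flag flips it.
        return "remote" if reverse_nc else "local"
    # Before this change: default is remote server, the flag flips it.
    return "local" if reverse_nc else "remote"


for toggled in (False, True):
    for reverse_nc in (False, True):
        label = "after" if toggled else "before"
        print(label, "--reverse-nc" if reverse_nc else "(default)",
              "->", nc_server_side(reverse_nc, toggled), "is server")
```

Under this reading, a BMC update that previously required "--reverse-nc" works with the default options after the change, while a setup that relied on the old default has to pass "--reverse-nc" explicitly.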