OpenFabrics / sunfish_library_reference

The core Sunfish implementation
BSD 3-Clause "New" or "Revised" License

Handling boundary components #25

Open christian-pinto opened 1 month ago

christian-pinto commented 1 month ago

@rherrell @cayton There are a number of cases where components are at the boundaries of multiple fabrics, or between switches within the same fabric. These components are what we refer to as "Boundary Components".

Consider the case of a ComputerSystem (a server) connected to a CXL fabric via a switch, as shown in the picture below. The CXL switch will be under the control of a dedicated Sunfish Hardware Agent, which we will call the CXL Agent for the sake of this example. The host, on the other hand, will be managed by its own BMC and/or another Sunfish Hardware Agent, which we will call the BMC for the sake of this example.

[Image: a ComputerSystem connected to a CXL fabric via a switch]

At system startup, the CXL Agent will advertise its own fabric, including its single switch and the downstream devices. For the upstream ports, however, the agent can make few assumptions about what is or will be connected to them, because: 1) there might be no entity connected to a given port; 2) an entity is connected but powered off, so no link is detected; 3) the CXL Agent might not be able to identify the object on the other side of the link (i.e., its port). Similarly, for the ComputerSystem, the BMC does not know what the host port is connected to, because: 1) a single host might not have full visibility of the entire fabric it is connected to; 2) the host might initially be powered off, so no link is detected on the port.

The only initial assumption that can be made is that the port is the component sitting at the boundary of a physical connection, and is therefore the one to use for resolving the physical connection of boundary components at runtime.

One potential approach for resolving these conflicts is the following flow:

  1. Both parties sharing a boundary component register with Sunfish and report their resources as usual.
  2. Ports that are either not connected, or whose connection is not known when the agent registers, are marked as unresolved. This could be done by extending the Sunfish_RM field we use in the Oem property of each object to identify the agent it belongs to. We could add something like the snippet below.
    "Sunfish_RM": {
      "@odata.type": "#SunfishExtensions.v1_0_0.ResourceExtensions",
      "Status": {
        "State": "unresolved"
      }
    }
  3. Each agent populates the PortID field (see the Redfish schema guide) with a unique, fabric-specific identifier. Examples are the MAC address for Ethernet, CXL IDs for CXL fabrics, etc. The RemotePortID field is left empty.
  4. When Sunfish scans resources from a system or agent, it caches the unresolved resources in a dedicated data structure for later processing.
  5. Whenever the state of a port changes (e.g., both parties have booted and the host can "read" the unique ID of the port on the other side of the link, such as the switch port), the agent updates its own version of the port object by populating RemotePortID with the newly discovered port ID. The agent then sends an event to Sunfish to signal the updated object.
  6. On every update event, Sunfish checks whether the object at hand is one of the unresolved ones. This can be done by using the RemotePortID of the updated port as the key into the data structure holding all unresolved items. If there's a hit, Sunfish updates both ports with the complete information and marks the objects as resolved.
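The flow in steps 1-6 can be sketched in Python as below. This is a minimal, non-authoritative sketch: the `Port` dataclass, the `UnresolvedPortRegistry` class and its `register`/`on_port_updated` methods are hypothetical names (not part of Sunfish), and ports are modeled as plain objects rather than full Redfish JSON. The registry is keyed by each unresolved port's PortID, so the RemotePortID carried by an update event (step 5) can be looked up directly (step 6).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Port:
    """Simplified stand-in for a Redfish Port object (hypothetical model)."""
    uri: str
    port_id: str                          # fabric-specific unique ID (step 3)
    remote_port_id: Optional[str] = None  # left empty at registration (step 3)
    state: str = "unresolved"             # mirrors Oem.Sunfish_RM.Status.State


class UnresolvedPortRegistry:
    """Hypothetical cache of unresolved ports, indexed by PortID (steps 4-6)."""

    def __init__(self) -> None:
        self._by_port_id: Dict[str, Port] = {}

    def register(self, port: Port) -> None:
        """Step 4: cache ports whose connection is not known yet."""
        if port.state == "unresolved":
            self._by_port_id[port.port_id] = port

    def on_port_updated(self, port: Port) -> bool:
        """Step 6: handle an update event; return True if a link got resolved."""
        if port.remote_port_id is None:
            return False
        peer = self._by_port_id.get(port.remote_port_id)
        if peer is None or peer is port:
            return False
        # Hit: give the peer the reverse ID and mark both ends resolved.
        peer.remote_port_id = port.port_id
        peer.state = port.state = "resolved"
        self._by_port_id.pop(peer.port_id, None)
        self._by_port_id.pop(port.port_id, None)
        return True


if __name__ == "__main__":
    registry = UnresolvedPortRegistry()
    switch_port = Port("/hypothetical/switch/ports/3", "cxl-sw1-p3")
    host_port = Port("/hypothetical/host/ports/0", "cxl-host-p0")
    registry.register(switch_port)
    registry.register(host_port)
    host_port.remote_port_id = "cxl-sw1-p3"     # step 5: host learns peer ID
    print(registry.on_port_updated(host_port))  # True
    print(switch_port.remote_port_id)           # cxl-host-p0
```

A dictionary keyed by PortID keeps the per-event check at a single lookup, which matters if Sunfish receives many unrelated update events.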

One drawback I see with this approach is that, by using PortID to carry the unique identifier, we lose the possibility of physically identifying the port, which is most probably the reason the PortID field exists.

From the Redfish spec:

[Image: excerpt from the Redfish specification]

christian-pinto commented 1 month ago

Related info on Redfish ports: https://redfish.dmtf.org/schemas/v1/Port.v1_7_0.yaml

rherrell commented 1 month ago

The original problem statement is pretty clear. The discussion about a possible solution is incomplete.

Border components are components attached to boundary links, and are by definition managed by different first-level hardware managers. Boundary links are links that cross between two different manager domains. For example, the ComputerSystem and the Switch in the figure under discussion have different managers, but these two components are connected by a link of the CXL fabric. The link between them is a boundary link, and the two components are thus boundary components. It is the boundary link that the two managers' Redfish models of their managed domains have in common. If the ComputerSystem manager supplies a placeholder Redfish model of the CXL Switch and its Port, these would be duplicates of the CXL Switch and Port models supplied by the Switch manager.

However, matching up the two Redfish models as the exact components at the appropriate end of a boundary link requires Sunfish (or whatever entity is doing the merger of the two Redfish boundary component models) to know two things about both boundary components, so as to unambiguously locate the boundary link:

1) The physical port number (or port ID) of the boundary link, as known by the boundary component hardware.
2) A globally unique component ID of the boundary component hardware sourcing the port connected to the boundary link.

Thus, the local component needs to be informed of the Remote Component's Port ID plus the Remote Component's globally unique ID.

The discussion in the original issue does not deal with this required second ID. Any proposal for placing the remote component's IDs in the local component's Port object needs to address it. We could simply add this second ID, which is already available in CXL fabric link ports in PBR mode.
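To illustrate why the second ID matters, here is a small Python sketch, assuming each end of a boundary link is identified by the pair (globally unique component ID, port ID). The names `EndpointKey`, `BoundaryPort`, and `match_boundary_links` are hypothetical and not part of Sunfish or Redfish. Two switches may both expose a port numbered 3; only the composite key lets a host port be matched to the right one.

```python
from typing import Dict, List, NamedTuple, Optional, Set, Tuple


class EndpointKey(NamedTuple):
    """One end of a boundary link (hypothetical representation)."""
    component_id: str  # globally unique ID of the component sourcing the port
    port_id: str       # physical port number as known by that hardware


class BoundaryPort(NamedTuple):
    uri: str
    local: EndpointKey             # this port's own identity
    remote: Optional[EndpointKey]  # partner's identity, if learned over the link


def match_boundary_links(ports: List[BoundaryPort]) -> List[Tuple[str, str]]:
    """Return (uri, uri) pairs for ports whose remote key names another port."""
    by_local: Dict[EndpointKey, BoundaryPort] = {p.local: p for p in ports}
    links: List[Tuple[str, str]] = []
    seen: Set[Tuple[str, str]] = set()
    for p in ports:
        if p.remote is None:
            continue
        peer = by_local.get(p.remote)
        if peer is None or peer is p:
            continue
        a, b = sorted((p.uri, peer.uri))
        if (a, b) not in seen:  # both sides may report each other
            seen.add((a, b))
            links.append((a, b))
    return links


if __name__ == "__main__":
    host = BoundaryPort("host/p0", EndpointKey("host-1", "0"),
                        EndpointKey("switch-B", "3"))
    a3 = BoundaryPort("swA/p3", EndpointKey("switch-A", "3"), None)
    b3 = BoundaryPort("swB/p3", EndpointKey("switch-B", "3"), None)
    print(match_boundary_links([host, a3, b3]))  # [('host/p0', 'swB/p3')]
```

With port IDs alone, a host port reporting remote port "3" would be ambiguous between the two switches. Matching succeeds if either side reports its partner, which fits the case where only the host has learned the remote IDs.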

However, there are other issues to resolve when aggregating Redfish inventory from multiple Hardware Managers or multiple Agents which, I think, call for boundary components on the remote side of a boundary link to be modeled with placeholder Redfish objects, rather than just having a pair of conclusive IDs installed in the local Port properties of the boundary link.

The latter will work, but actual placeholder components appear to make the merging problem easier for the upper layers. I will have a slide deck that illustrates the boundary component placeholder mechanism ready very soon.
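The placeholder mechanism is not detailed yet in this thread (the slide deck is still pending), so the following is only one possible interpretation, sketched in Python under that assumption: the ComputerSystem manager publishes a placeholder Port for the remote switch port, keyed by the same (component ID, port ID) pair, and when the authoritative model from the Switch manager appears, the aggregator folds the placeholder into it and repoints links. `PortModel`, `Aggregator`, and all URIs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key: (globally unique component ID, port ID).
Key = Tuple[str, str]


@dataclass
class PortModel:
    """Simplified stand-in for a Redfish Port (hypothetical model)."""
    uri: str
    key: Key
    placeholder: bool = False  # True if it stands in for a far-side component
    links: List[str] = field(default_factory=list)  # URIs of connected ports


class Aggregator:
    """Hypothetical aggregator that folds placeholders into real models."""

    def __init__(self) -> None:
        self.ports: Dict[str, PortModel] = {}  # URI -> model
        self.by_key: Dict[Key, str] = {}       # key -> URI representing it

    def add(self, port: PortModel) -> str:
        """Add a model and return the URI that now represents its key."""
        known_uri = self.by_key.get(port.key)
        if known_uri is None:
            self._store(port)
            return port.uri
        known = self.ports[known_uri]
        if known.placeholder and not port.placeholder:
            # Authoritative model arrived: it absorbs the placeholder.
            self._store(port)
            self._merge(old=known, new=port)
            return port.uri
        # Otherwise the new object duplicates one we already hold.
        self._merge(old=port, new=known)
        return known.uri

    def _store(self, port: PortModel) -> None:
        self.ports[port.uri] = port
        self.by_key[port.key] = port.uri

    def _merge(self, old: PortModel, new: PortModel) -> None:
        """Move old's links onto new and repoint references from old to new."""
        for uri in old.links:
            if uri not in new.links:
                new.links.append(uri)
        if old.uri != new.uri:
            self.ports.pop(old.uri, None)
        for p in self.ports.values():
            p.links = [new.uri if u == old.uri else u for u in p.links]
```

In this sketch the merged result is the same regardless of which manager reports first, which is one way placeholders could make the merging problem easier for upper layers.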