CodeConstruct / mctp

MCTP userspace tools
GNU General Public License v2.0

Support DiscoveryNotify Handling #53

Open santoshpuranik opened 1 week ago

santoshpuranik commented 1 week ago

I am starting this thread to discuss how best to handle the DiscoveryNotify message, as defined in DSP0236, within mctpd when mctpd is the bus owner on the interface that receives the DiscoveryNotify.

Per the MCTP spec, an endpoint may send DiscoveryNotify either as a request or as a datagram, and uses it to announce the hotplug of the device to the bus owner (BO). When the message comes from a non-bridge endpoint, the BO may use it as a trigger to perform EID assignment and discovery (Get MCTP types/UUID) on the new endpoint.

Although the spec does not mention it explicitly, a bridge endpoint may also send a DiscoveryNotify to the BO to notify it of routing table changes (such as a new endpoint being added to, or removed from, the network downstream of the bridge). In reaction to a DiscoveryNotify sent by a bridge, the BO should re-query the routing table from the bridge.

Does the above flow make sense from the perspective of a bus owner? Is this something that can be implemented by mctpd?

amboar commented 1 week ago

> Not explicitly mentioned, but a bridge endpoint may send a DiscoveryNotify to the BO to notify the BO of routing table changes (such as a new endpoint getting added/removed downstream to the bridge). In reaction to the discovery notify sent by a bridge, the BO should re-query the routing table from the bridge.

What section(s) of DSP0236 do you feel substantiate this?

santoshpuranik commented 1 week ago

> Not explicitly mentioned, but a bridge endpoint may send a DiscoveryNotify to the BO to notify the BO of routing table changes (such as a new endpoint getting added/removed downstream to the bridge). In reaction to the discovery notify sent by a bridge, the BO should re-query the routing table from the bridge.

> What section(s) of DSP0236 do you feel substantiate this?

They don't, explicitly. This was something suggested by the editor of DSP0236, and we have bridges that implement this command to indicate endpoints downstream of the bridge falling off the network. I can ask for it to be added to the spec itself.

I am not 100% sure that the control daemon should directly re-query the routing table of its own accord upon receiving the DiscoveryNotify message, but do you think it is reasonable for it to emit the event as a signal on the bus owner D-Bus interface?

An external entity such as the reactor could use this signal as a trigger to relearn the endpoint?

amboar commented 1 week ago

So, I think it's important to keep in mind that while a given node in the network will have enough information to route a packet to a given EID, there's no signalling mechanism for that EID being present in the network beyond its immediate bus owner. Essentially, when a BO (let's call it A) allocates a range of EIDs to a local bridge device (B), it adds the routes to its local route table, but it has no visibility below the bridge as to whether or how they're assigned to some device below the bridge (C, say).

Put another way, there's no strict global consistency for the route model of the network topology: Each node may have a different shape to its route table, based on what it knows from its perspective in the network.

There's some more exploration of this in a discussion with the Ampere folk on the OpenBMC Discord.

Further, MCTP is defined to drop packets with no notification to the sender on error. From DSP0236 v1.3.3, 8.7:

  • Un-routable EID: An MCTP bridge receives an EID that the bridge is not able to route (for example, because the bridge did not have a routing table entry for the given endpoint).

Taken together, I think that trying to back-propagate route table changes from below a bridge to the bridge's own BO, e.g. by repurposing Discovery Notify, is an extra complication that's (currently) outside the spec. Rather, A always routes packets for C to B based on the route table entry that exists due to the EID pool allocation it (A) gave to B, and B drops the packets for C if it knows that no allocation has been made for C.

I think the only flow that's necessary upon receiving Discovery Notify is for the BO to issue Endpoint Discovery in response (possibly preceded by Prepare for Endpoint Discovery, though I don't yet see why that would be necessary).