alexlovelltroy commented 7 months ago

Managing Unique Names in ochami Microservices

Context

In the Ochami system (Open Composable Heterogeneous Adaptive Management Infrastructure), there's a need for a flexible and extensible system to identify and manage various components. Components in an Ochami system represent hardware pieces and can range from Nodes to replaceable server parts, doors, and cooling units. This identification system needs to accommodate the system's extensibility and diverse component types.

Decision

We propose the following structure for managing component identities in Ochami:

SMD as the Central Authority: The Service Management Database (SMD) will be the authoritative inventory and state management database for all components. It will use a private UUID identifier for each component internally.

Multiple Identifier Systems:

XNAME: A globally unique string encoding component type and physical location, used across the system for consistent identification.
IP Address and MAC Address: For network-related components, IP addresses and MAC addresses will be used. SMD will maintain the mapping of these addresses to components.
Flexible Component Registration: The system must support the addition of a component for which not all identifiers (XNAME, MAC, IP address) are available initially. Such components can be registered in the system by an inventory discovery service and will be made available for use by other microservices once the necessary identifiers are added.
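A minimal sketch of how partial registration with multiple identifiers could behave. Everything here (`Component`, `Registry`, the method names) is hypothetical and is not SMD's actual schema or API. The RFD does not say which identifiers count as "necessary", so this sketch assumes all three (XNAME, MAC, IP) must be present before a component is visible to other services.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

KINDS = ("xname", "mac", "ip")


@dataclass
class Component:
    """One inventory record (illustrative only, not SMD's schema)."""
    xname: Optional[str] = None
    mac: Optional[str] = None
    ip: Optional[str] = None
    # Private identifier generated when the record is created.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def is_complete(self) -> bool:
        # Assumption: all three identifiers are required before use.
        return None not in (self.xname, self.mac, self.ip)


class Registry:
    """In-memory stand-in for the central identifier mapping."""

    def __init__(self) -> None:
        self._by_uid: Dict[str, Component] = {}
        self._index: Dict[Tuple[str, str], str] = {}

    def register(self, **identifiers: str) -> Component:
        """Create a record from whatever identifiers are known so far."""
        comp = Component()
        self._by_uid[comp.uid] = comp
        try:
            self.update(comp.uid, **identifiers)
        except ValueError:
            del self._by_uid[comp.uid]  # do not keep a rejected record
            raise
        return comp

    def update(self, uid: str, **identifiers: str) -> None:
        """Add or change identifiers on an existing record."""
        comp = self._by_uid[uid]
        # Validate everything first so a failure changes nothing.
        for kind, value in identifiers.items():
            if kind not in KINDS:
                raise ValueError(f"unknown identifier kind: {kind}")
            owner = self._index.get((kind, value))
            if owner is not None and owner != uid:
                raise ValueError(f"{kind} {value} already registered")
        for kind, value in identifiers.items():
            old = getattr(comp, kind)
            if old is not None:
                self._index.pop((kind, old), None)
            setattr(comp, kind, value)
            self._index[(kind, value)] = uid

    def lookup(self, kind: str, value: str,
               include_partial: bool = False) -> Optional[Component]:
        """Resolve any identifier; hide incomplete records by default."""
        uid = self._index.get((kind, value))
        if uid is None:
            return None
        comp = self._by_uid[uid]
        return comp if (include_partial or comp.is_complete()) else None
```

In this sketch a discovery service would call `register(mac=...)` as soon as it sees a device, and ordinary consumers would only see the record from `lookup` after a later `update` supplies the remaining identifiers.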

Consequences

Consistency and Flexibility: Standardizing on a central system (SMD) with multiple identifiers ensures consistency while providing flexibility to accommodate different component types and needs.
Scalability: Assigning SMD the responsibility of maintaining name mappings allows the system to scale effectively.
Complexity Management: The use of a central database to manage component identifiers simplifies the overall architecture, despite the inherent complexity of multiple naming systems.
Partial Registrations: Allowing components to be registered with partial information enhances the system's adaptability and ease of integration.
Dependency on SMD: The critical role of SMD in managing component identities underscores the need for its high availability and reliability.
xnames are SMD-specific: The SMD microservice must be responsible for the canonical representation of an xname and must be the arbiter of its structure and validity. Any extension of the ochami xname standard must happen through the governance of SMD.
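To illustrate what "arbiter of structure as well as validity" could mean in practice, here is a sketch of a parser for node xnames only. It assumes the CSM-style pattern `x<cabinet>c<chassis>s<slot>b<bmc>n<node>` (for example `x1000c0s0b0n0`) and assumes that leading zeros are dropped in the canonical form. Both are assumptions for illustration; the authoritative rules would be whatever SMD defines, and other component types are not covered.

```python
import re
from typing import NamedTuple, Optional

# Assumed node pattern: x<cabinet>c<chassis>s<slot>b<bmc>n<node>.
_NODE_XNAME = re.compile(r"x([0-9]+)c([0-9]+)s([0-9]+)b([0-9]+)n([0-9]+)")


class NodeXname(NamedTuple):
    cabinet: int
    chassis: int
    slot: int
    bmc: int
    node: int


def parse_node_xname(text: str) -> Optional[NodeXname]:
    """Return the parsed fields, or None if text is not a node xname."""
    match = _NODE_XNAME.fullmatch(text)
    if match is None:
        return None
    return NodeXname(*(int(group) for group in match.groups()))


def canonical(xname: NodeXname) -> str:
    """Render the fields back to a string without leading zeros."""
    return (f"x{xname.cabinet}c{xname.chassis}s{xname.slot}"
            f"b{xname.bmc}n{xname.node}")
```

Keeping parsing and canonical rendering in one place is the point of the sketch: if every service parsed xnames independently, two spellings of the same location could end up treated as different components.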

Open Questions

Should SMD expose the internal UUID for consumers?

Security Considerations

None

External References

weallcock commented 6 months ago

I am not familiar with SMD and I am new to Ochami. Can someone point me to documentation on SMD? I ask because, generally, a centralized database != scalability, so I am curious how that statement is justified.

weallcock commented 6 months ago

What assumptions are being made about IP and MAC addresses? For instance, will the Slingshot Algorithmic MAC Addresses (AMA) cause any problems since they can change?

weallcock commented 6 months ago

As to the question about exposing the UUID, I would argue yes unless there is a good reason (security, scalability) not to. You can't envision every possible scenario, so just make it available if people want it. If it is EVER the only identifier listed, say in an SMD log message, then this becomes pretty much required. That's my take anyway.

alexlovelltroy commented 6 months ago

Thanks for the comments. SMD is a microservice that was built and released as part of HPE's system management suite for large HPC systems. It was open-sourced and now is a part of OpenCHAMI. At the moment, OpenCHAMI's version is a fork of the HPE version, but we hope to reconcile the changes ASAP.

You raise a good point about a centralized database. Generally speaking, a central database will become a single point of failure for a data-driven system. In CSM, all databases are clustered and deployed to be resilient against the failure of any physical node. I think we'll head that way with LANL's ochami deployment recipe(s) as well. Having said that, the SMD data doesn't change very frequently, and even the state of thousands of nodes can be cached locally for most use cases. Rather than making the datastore more performant and accessible, I'd like to see us gain reliability through good contracts and development practices that encourage the use of robust client-side caches.
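A rough sketch of the kind of client-side cache described above, under stated assumptions: `fetch` stands in for whatever call a client would make to SMD (no real SMD endpoint or client library is implied), entries expire after a fixed TTL, and a stale entry is served if the refresh fails. The class name and parameters are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CachedInventory:
    """Read-through cache that tolerates a temporarily unavailable backend."""

    def __init__(self, fetch: Callable[[str], Any],
                 ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical call to the central service
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]         # still fresh, no backend call
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]     # backend down: serve the stale copy
            raise                    # nothing cached, surface the error
        self._entries[key] = (now, value)
        return value
```

Serving stale data on failure only makes sense because the inventory changes rarely; a client that needs current state would have to bypass the cache.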

I'm not worried about the IP and MAC changes at the moment. Those are mutable in the system and unlikely to change frequently. With alternate pseudo-unique ids to work with that are appropriate for each use case, we shouldn't need to worry. There are edge cases around MAC changes on the management network that introduce "ghost" nodes, but those shouldn't be too difficult to work around if/when they arise.

Thanks for the feedback on UUIDs in logs as an argument for exposing them. That's an excellent point!

OpenCHAMI / roadmap

[RFD] Managing Unique Names in ochami microservices #10
